Data Engineer by João Paulo Albuquerque

Data Modelling Analyst

Data Engineer

AWS

Python

SQL

Developed data pipelines using Azure Databricks within a medallion architecture in a Data Lakehouse, orchestrated by Databricks Workflows, mainly using PySpark and Spark SQL, to improve the efficiency of data processing and analysis for a brokerage services company.
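As a rough illustration of the medallion idea (raw "bronze" data refined into cleaned "silver" and aggregated "gold" layers), here is a minimal sketch in plain Python. It is a stand-in for the PySpark and Spark SQL jobs described above, not the project's code; the record fields, sample values, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw trade events as they might land in the bronze layer
# (everything is still a string, duplicates and bad values included).
bronze = [
    {"trade_id": "1", "ticker": "abc", "amount": "100.50"},
    {"trade_id": "2", "ticker": "ABC", "amount": "not-a-number"},
    {"trade_id": "3", "ticker": "xyz", "amount": "20"},
    {"trade_id": "1", "ticker": "abc", "amount": "100.50"},  # duplicate
]


def to_silver(rows):
    """Bronze -> silver: drop duplicates and invalid amounts, normalize types."""
    seen = set()
    silver = []
    for row in rows:
        if row["trade_id"] in seen:
            continue  # skip a trade that was already accepted
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        seen.add(row["trade_id"])
        silver.append(
            {
                "trade_id": int(row["trade_id"]),
                "ticker": row["ticker"].upper(),
                "amount": amount,
            }
        )
    return silver


def to_gold(rows):
    """Silver -> gold: aggregate the total traded amount per ticker."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["ticker"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

In a Lakehouse each layer would typically be persisted as its own table, so downstream consumers read from gold while the raw bronze data stays available for reprocessing.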

Led the design and development of data pipelines supporting OpenFinance, integrating sources such as SQL Server, Oracle DB, NoSQL databases, and APIs, orchestrated with Airflow. The project achieved 100% regulatory compliance, contributed to a more transparent financial ecosystem, came in 30% under the initial budget, and enhanced data accessibility for over one million customers.
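Orchestration of this kind comes down to running tasks in dependency order: every source is extracted before the merged dataset is validated and published. The sketch below shows that ordering with Python's standard-library `graphlib` as a stand-in for an Airflow DAG. The task names and the dependency graph are hypothetical and are not taken from the actual project.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_sql_server": set(),
    "extract_oracle": set(),
    "extract_api": set(),
    "merge_sources": {"extract_sql_server", "extract_oracle", "extract_api"},
    "validate": {"merge_sources"},
    "publish": {"validate"},
}


def run_order(deps):
    """Return one valid execution order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
```

The three extract tasks have no dependencies on each other, so a real scheduler would be free to run them in parallel; only their position before `merge_sources` is guaranteed.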

Participated in migrating Alteryx ETL pipelines to Python, SQL, and Spark. Responsible for creating internal libraries and APIs, querying the Data Lake via Athena, orchestrating jobs in Airflow, and storing results as Parquet files. This strategic shift saved R$ 120k, made the data lake more robust and mature, and reduced processing time: after the migration, the pipelines ran in 10 minutes.

Posted Jul 21, 2024

Developed robust data pipelines using Azure Databricks within a medallion architecture in a Data Lakehouse.
