Data Engineer

João Paulo Albuquerque

● Worked at one of the largest digital banks in Latin America, Inter Bank, embedded with the Anti-Money Laundering (AML) team to address regulatory demands from Brazil's regulatory agency and from the USA, where the bank has commenced operations.
● Created and optimized data pipelines on AWS using PySpark to transform large data volumes, with individual tables exceeding a terabyte in size; the jobs ran on EMR clusters to handle the heavy processing requirements.
● Developed pipelines to transform data from the ingestion layer to the golden layer, making it available on a Trino cluster for querying and analysis by business teams and data analysts.
● Implemented baseline quality checks by extending internal frameworks built primarily on Great Expectations, defining minimum requirements for data ingested into the lake and ensuring its quality and integrity.
● Contributed to the refactoring of legacy data pipeline code written in Python, utilizing Spark (PySpark) as the processing engine. These tasks were executed on a Kubernetes cluster and orchestrated by Airflow, enhancing efficiency and maintainability.
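The ingestion-to-golden-layer transforms described above can be sketched with a minimal PySpark job. Table and column names here (`transaction_id`, `tx_date`, `amount`) are hypothetical stand-ins, not the bank's actual schema; the point is the typical shape of such a job: deduplicate, filter out invalid rows, and aggregate into an analyst-facing table.

```python
from pyspark.sql import SparkSession, functions as F

# Local session for illustration; in production this would run on an EMR cluster.
spark = SparkSession.builder.master("local[1]").appName("bronze-to-gold").getOrCreate()

# Hypothetical ingestion-layer data, including a duplicate and an invalid row.
bronze = spark.createDataFrame(
    [("tx1", "2024-07-01", 120.0),
     ("tx2", "2024-07-01", -5.0),
     ("tx1", "2024-07-01", 120.0)],
    ["transaction_id", "tx_date", "amount"],
)

# Typical cleanup on the way to the golden layer: deduplicate on the key,
# drop rows with invalid amounts, then aggregate per day for analysts.
gold = (
    bronze.dropDuplicates(["transaction_id"])
          .filter(F.col("amount") > 0)
          .groupBy("tx_date")
          .agg(F.sum("amount").alias("total_amount"),
               F.count("*").alias("tx_count"))
)
```

The resulting `gold` DataFrame is what a job like this would write out for querying via Trino.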
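The "minimum requirements" idea behind the quality checks can be illustrated with a small, dependency-free sketch of the kind of rules a Great Expectations suite encodes (required columns, non-null keys, value ranges). The function name and rule set are hypothetical, not the internal framework's API.

```python
def check_minimum_requirements(rows, required_columns, non_null, ranges):
    """Return a list of human-readable violations; an empty list means the batch passes.

    rows: list of dicts (one per record)
    required_columns: set of column names every record must have
    non_null: columns that must not be None
    ranges: {column: (lo, hi)} inclusive bounds for numeric columns
    """
    violations = []
    for i, row in enumerate(rows):
        missing = required_columns - row.keys()
        if missing:
            violations.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        for col in non_null:
            if row[col] is None:
                violations.append(f"row {i}: null in required column '{col}'")
        for col, (lo, hi) in ranges.items():
            value = row.get(col)
            if value is not None and not (lo <= value <= hi):
                violations.append(f"row {i}: '{col}'={value} outside [{lo}, {hi}]")
    return violations
```

A framework like the one described would run checks of this shape against every batch before admitting it to the lake, rejecting or quarantining batches that return violations.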
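The Airflow orchestration of the refactored jobs could look like the minimal sketch below. The DAG id, task name, and schedule are hypothetical, and the callable is a placeholder where a real deployment would submit the PySpark job to the Kubernetes cluster (e.g. via a pod operator); the `schedule` argument assumes Airflow 2.4+.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_transform():
    # Placeholder: in production this step would submit the refactored
    # PySpark job to the Kubernetes cluster rather than run locally.
    print("running transform")


with DAG(
    dag_id="aml_daily_pipeline",   # hypothetical name
    start_date=datetime(2024, 7, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    transform = PythonOperator(task_id="transform", python_callable=run_transform)
```

Keeping each pipeline step as a small, independently testable task is what makes this kind of refactor pay off in maintainability.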

Posted Jul 21, 2024
