Streamline ETL Processes with Metadata-Driven ApexFlow Framework
ApexFlow — Metadata-Driven ETL Framework

This project features a zero-code onboarding architecture that decouples ETL logic from data structures. By utilizing a configuration-first approach, new data sources can be integrated into the production pipeline solely through metadata updates, requiring no manual code changes or DAG modifications.
The Workflow

Dynamic Ingestion: A Python engine queries an etl_source_config table in Redshift to identify active tasks, then uses Boto3 to pull CSV/JSON files from the specific S3 paths defined in the config.
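A minimal sketch of how this ingestion step might look. Only the `etl_source_config` table and the use of Boto3 come from the project description; the column names, S3 key layout, and function names below are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class SourceConfig:
    """One row of etl_source_config (column names are illustrative)."""
    source_name: str
    s3_bucket: str
    s3_prefix: str
    file_format: str   # "csv" or "json"
    is_active: bool

def active_sources(rows):
    """Keep only sources flagged active, as the engine does after
    querying etl_source_config in Redshift."""
    return [r for r in rows if r.is_active]

def s3_key(cfg: SourceConfig, run_date: str) -> str:
    """Compose the object key from config fields (key layout is hypothetical)."""
    return f"{cfg.s3_prefix}/{run_date}/{cfg.source_name}.{cfg.file_format}"

def fetch_object(cfg: SourceConfig, run_date: str) -> bytes:
    """Pull the raw file from S3; boto3 is imported lazily so the
    pure helpers above run without AWS dependencies."""
    import boto3
    s3 = boto3.client("s3")
    resp = s3.get_object(Bucket=cfg.s3_bucket, Key=s3_key(cfg, run_date))
    return resp["Body"].read()
```

Because every path and format lives in the config row, onboarding a new source is one row insert rather than a new script.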
Automated Processing: The engine dynamically validates and standardizes data based on the metadata schema before loading it into Redshift Staging via psycopg2.
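The validation-and-load step could be sketched as follows. The schema representation (a dict of column to Python type) and the bulk-insert helper are assumptions; only the metadata-driven validation and the psycopg2 load into Redshift staging come from the description above:

```python
def standardize(records, schema):
    """Validate and coerce each record against the metadata schema
    (dict of column -> Python type), dropping unknown columns and
    raising on missing required fields."""
    out = []
    for rec in records:
        clean = {}
        for col, typ in schema.items():
            if col not in rec:
                raise ValueError(f"missing column: {col}")
            clean[col] = typ(rec[col])
        out.append(clean)
    return out

def load_staging(conn, table, rows):
    """Bulk-insert standardized rows into the staging table via
    psycopg2 (table and column names come from the same metadata).
    psycopg2 is imported lazily so standardize() runs without it."""
    from psycopg2.extras import execute_values
    cols = list(rows[0])
    sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES %s"
    with conn.cursor() as cur:
        execute_values(cur, sql, [tuple(r[c] for c in cols) for r in rows])
    conn.commit()
```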
Advanced Modeling: Data flows from Staging to a DWH layer (handling SCD Type 2 history via Stored Procedures) and finally into Datamarts for BI consumption.
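In the actual pipeline the SCD Type 2 history is handled by stored procedures; purely as an illustration, the same merge logic can be expressed in plain Python. The `valid_from`/`valid_to`/`is_current` column names are assumed conventions, not taken from the source:

```python
from datetime import date

def scd2_merge(dim_rows, incoming, key, tracked, today=None):
    """Illustrative SCD Type 2 merge: close the current version of any
    row whose tracked attributes changed, then append a new open version.
    dim_rows is a list of dicts carrying valid_from/valid_to/is_current."""
    today = today or date.today().isoformat()
    current = {r[key]: r for r in dim_rows if r["is_current"]}
    for rec in incoming:
        cur = current.get(rec[key])
        if cur and all(cur[c] == rec[c] for c in tracked):
            continue                      # unchanged: keep current version
        if cur:                           # changed: close the old version
            cur["valid_to"] = today
            cur["is_current"] = False
        new = dict(rec)
        new.update(valid_from=today, valid_to=None, is_current=True)
        dim_rows.append(new)
    return dim_rows
```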
Orchestration: Managed by Apache Airflow, the pipeline handles daily scheduling, two automatic retries per task, and automated failure alerts.
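A hypothetical Airflow DAG fragment matching the stated policies (daily schedule, two retries per task, failure alerts); the DAG id, retry delay, alert address, and task body are placeholders:

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    "retries": 2,                          # 2x task retries
    "retry_delay": timedelta(minutes=5),   # assumed back-off interval
    "email_on_failure": True,              # automated failure alerts
    "email": ["data-team@example.com"],    # placeholder address
}

with DAG(
    dag_id="apexflow_etl",                 # name is illustrative
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                     # Airflow 2.4+ parameter
    default_args=default_args,
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=lambda: None)
```

Because the engine reads its task list from `etl_source_config` at run time, the DAG itself never needs to change when a source is added.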
Key Impact

Scalability: Drastically reduces "Time-to-Data" by allowing non-developers to onboard new endpoints via config rows.
Resilience: Centralized etl_run_log provides a full audit trail for every automated run.
Efficiency: Eliminates redundant script creation, ensuring a single, hardened codebase manages all data movement.
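The source only states that etl_run_log provides a full audit trail; one plausible shape for a run-log record, with illustrative field names, might be:

```python
from datetime import datetime, timezone

def make_run_log_entry(source_name, status, rows_loaded, error=None):
    """Build one etl_run_log record (field names are illustrative;
    the project description only says the table records every run)."""
    return {
        "source_name": source_name,
        "run_ts": datetime.now(timezone.utc).isoformat(),
        "status": status,              # e.g. "SUCCESS" / "FAILED"
        "rows_loaded": rows_loaded,
        "error_message": error,
    }
```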