Consultant for Data Engineering
Contact for pricing
About this service
What's included
Project Scope and Locking
A concise document that locks the scope of the project, outlining the details, requirements, audience, and objective. It also records the initial process that the client and I agree on, and it serves as a reference once the project is under way.
Data Gathering
In this step the client and I work closely to establish where the data comes from, how many sources are involved, and the frequency and volume of the incoming data. It might also involve some preprocessing before moving to the next step.
Designing the pipeline (ETL/ELT)
A scalable, robust pipeline design that extracts, processes, and loads the data. The type of pipeline (ETL or ELT) will be decided after reviewing the requirements. The pipeline will automate the collection and required transformation of the data.
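As a rough illustration of the extract, transform, and load stages, here is a minimal sketch using only the Python standard library. The sample CSV, the `orders` table, and the function names are all hypothetical placeholders; an actual pipeline would be shaped by the agreed requirements and tooling.

```python
import csv
import io
import sqlite3

# Hypothetical sample input; a real source would be a file, API, or database.
SAMPLE_CSV = "order_id,amount\n1,10.50\n2,abc\n3,4.25\n"


def extract(text):
    """Extract: read CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast fields to their types, skipping rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["order_id"]), float(row["amount"])))
        except (KeyError, ValueError):
            continue  # malformed row; a real pipeline would log or quarantine it
    return clean


def load(rows, conn):
    """Load: write rows into a table, replacing rows with the same key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run all three stages and return the number of rows in the target table."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Calling `run_pipeline(SAMPLE_CSV, sqlite3.connect(":memory:"))` loads the two well-formed rows and skips the malformed one. In an ELT design the transform stage would instead run inside the warehouse after loading.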
Data Warehouse/Lake
Designing a structured data warehouse or lake. The choice between the two depends mostly on the data needs and sometimes on the client's preference. It will be built with technologies such as Google BigQuery, Amazon Redshift, or similar tools, or set up as a custom data warehouse/lake.
Data Validation and Testing
This is a very important step that covers data validation checks, including data sanity checks, A/B testing, etc. It ensures the completeness, integrity, accuracy, and consistency of the data. This step might be repeated to ensure the data is correct.
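To give a sense of what a basic sanity check might look like, here is a small sketch that counts rows with missing required fields and duplicate keys. The function name, the row format (a list of dicts), and the column names in the usage note are illustrative assumptions, not a fixed part of the service.

```python
def validate(rows, required, key):
    """Return simple issue counts for a list of row dicts.

    rows:     list of dicts, one per record (assumed format)
    required: column names that must be present and non-empty
    key:      column expected to be unique across rows
    """
    # Completeness: a row fails if any required column is absent, None, or "".
    missing = sum(
        1 for r in rows if any(r.get(c) in (None, "") for c in required)
    )
    # Integrity: count key values that repeat an earlier row's key.
    keys = [r.get(key) for r in rows]
    duplicates = len(keys) - len(set(keys))
    return {
        "rows": len(rows),
        "missing_required": missing,
        "duplicate_keys": duplicates,
    }
```

For example, `validate(rows, required=["id", "email"], key="id")` returns the counts as a dictionary, which could be logged or used to stop a load when the counts exceed an agreed threshold.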
Automation and scheduling
This involves automating all the workflows: identifying dependent events and setting the right times for processing and ingestion, so that data is readily available at the required frequency (daily, weekly, monthly). It uses tools like Airflow, Amazon EventBridge, Linux, etc.