anuj66283/tia-etl

Anuj Bhattarai

Data Engineer
Apache Airflow
Docker
Terraform

TIA ETL Pipeline

An ETL pipeline that extracts flight data from Tribhuvan International Airport (TIA).

Architecture

Data is extracted from TIA
The raw data is loaded to S3
The data is transformed
The transformed data is loaded into RDS
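
The four stages above can be sketched in plain Python. This is only an illustration, not the project's actual code: a dict stands in for the S3 bucket, an in-memory SQLite database stands in for RDS, and the function names, row fields and sample flights are all hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone


def extract():
    """Stand-in for the TIA extraction step; returns raw rows (hypothetical fields)."""
    return [
        {"flight": "XX 101", "origin": "Delhi ", "status": "landed"},
        {"flight": "YY 202", "origin": "Pokhara", "status": "DELAYED"},
    ]


def load_raw(store, rows, now):
    """Write the raw rows under a timestamped key, as one would to an S3 bucket."""
    key = f"raw/{now:%Y-%m-%dT%H-%M}.json"
    store[key] = json.dumps(rows)
    return key


def transform(store, key):
    """Read the raw object back and normalise whitespace and casing."""
    rows = json.loads(store[key])
    return [
        (r["flight"].strip(), r["origin"].strip(), r["status"].strip().lower())
        for r in rows
    ]


def load_clean(conn, rows):
    """Insert cleaned rows into a relational table (SQLite standing in for RDS)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flights (flight TEXT, origin TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(store, conn, now=None):
    """Run extract, raw load, transform and final load once; return the raw key."""
    now = now or datetime.now(timezone.utc)
    key = load_raw(store, extract(), now)
    load_clean(conn, transform(store, key))
    return key


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline({}, conn))
    print(conn.execute("SELECT * FROM flights").fetchall())
```

Keeping the raw copy before transforming means a transform bug can be fixed and replayed without extracting the data again.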

Dashboard

How to use?

Set the values inside .env and terraform/variables.tf
cd terraform
terraform init
terraform apply
The EC2 IP and RDS endpoint are now printed in the terminal
Run terraform output private_key and copy the RSA key into tia_key.pem in the project root directory
cd ..
chmod 400 tia_key.pem
Connect to the EC2 instance using ssh -i tia_key.pem {user_name}@{ec2_ip}
Run sudo snap install docker to install Docker on the EC2 instance
Note that the username and password used by Airflow and the local Postgres server are hardcoded in docker-compose.yaml
Copy all files and folders except terraform to the EC2 instance
docker-compose build
docker-compose up
The Docker containers are now running; data is extracted every 30 minutes and stored in RDS.

To stop

docker-compose down
To remove all the services in AWS, run terraform destroy

Why this architecture?

I used Docker and EC2 in this pipeline to learn about them. The same pipeline could run on Lambda without Docker and Airflow. Using Lambda would be quick and cost-effective.
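
As a rough idea of what the Lambda variant might look like, here is a minimal handler sketch. It is not part of this repository: extract_and_load is a hypothetical placeholder for the extract, S3, transform and RDS steps, and the return shape is an assumption.

```python
def extract_and_load():
    """Hypothetical placeholder for the extract, S3, transform and RDS steps."""
    return 0  # number of rows loaded


def lambda_handler(event, context):
    # Lambda calls this with the trigger event and a runtime context object.
    rows_loaded = extract_and_load()
    return {"status": "ok", "rows_loaded": rows_loaded}
```

An EventBridge schedule such as rate(30 minutes) could trigger the handler in place of the Airflow schedule.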