
A GCS bucket (ghg-bucket), a Dataproc cluster (ghg-dataproc), and two BigQuery datasets (Staging and Analytics) are provisioned using Terraform.
… Staging dataset.
… Analytics dataset.
… Analytics dataset to enable interactive reporting and exploration of emissions trends, temperature changes, and economic correlations.

gcp_kv.yml: Sets up GCP environment variables, service credentials, and resource references (GCS bucket, Dataproc cluster, BigQuery datasets).
gcp_upload.yml: Uploads the raw emissions data into the GCS bucket.
gcp_spark_bq.yml: Submits a PySpark job to Dataproc that transforms the raw data and loads the output into the BigQuery Staging dataset.

stg_emissions.sql: Selects the relevant fields and calculates emissions per capita.
stg_country.sql: Extracts the latest GDP and population for each country.
fact_emissions.sql: Creates a partitioned and clustered emissions fact table.
dim_country_info.sql: Builds a dimension table with country-level information.
annual_emissions.sql: Aggregates total and per-capita emissions by year.
global_temp.sql: Analyzes temperature trends linked to emissions.
emissions_vs_econ.sql: Compares GDP with per-capita CO₂ emissions.

Posted Apr 28, 2025
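The logic of the two staging models (stg_emissions.sql and stg_country.sql) can be approximated in plain Python to make it concrete. This is a minimal sketch, not the project's SQL: the row fields (country, year, co2_mt, population, gdp) are hypothetical names, CO₂ is assumed to be reported in million tonnes, and "latest" is assumed to mean the most recent year in which both GDP and population are present.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class EmissionsRow:
    # Hypothetical raw-row shape; the real column names may differ.
    country: str
    year: int
    co2_mt: Optional[float]      # CO₂ emissions, assumed to be in million tonnes
    population: Optional[float]
    gdp: Optional[float]


def co2_per_capita(row: EmissionsRow) -> Optional[float]:
    """Tonnes of CO₂ per person, or None when emissions or population are missing/zero."""
    if row.co2_mt is None or not row.population:
        return None
    return row.co2_mt * 1_000_000 / row.population  # million tonnes -> tonnes


def latest_country_stats(rows: list[EmissionsRow]) -> dict[str, tuple[int, float, float]]:
    """Map each country to (year, gdp, population) from its most recent complete row."""
    latest: dict[str, tuple[int, float, float]] = {}
    for r in rows:
        if r.gdp is None or r.population is None:
            continue  # assumption: skip years where either figure is missing
        current = latest.get(r.country)
        if current is None or r.year > current[0]:
            latest[r.country] = (r.year, r.gdp, r.population)
    return latest
```

Dropping incomplete years before taking the maximum keeps a country with a missing recent GDP value from disappearing; whether the real model does this is not stated.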
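The yearly roll-up in annual_emissions.sql could be sketched as follows, assuming hypothetical input rows of (year, CO₂ in million tonnes, population). One design choice is worth flagging: per-capita emissions here are population-weighted (total CO₂ divided by total population), which differs from averaging each country's per-capita figure. Which definition the actual model uses is not stated.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Optional


def annual_emissions(
    rows: Iterable[tuple[int, Optional[float], Optional[float]]],
) -> dict[int, tuple[float, Optional[float]]]:
    """Aggregate (year, co2_mt, population) rows into year -> (total_mt, tonnes_per_capita).

    Assumption: per-capita is population-weighted and uses only rows that
    report both emissions and a non-zero population.
    """
    total_co2: defaultdict[int, float] = defaultdict(float)   # all reported emissions
    paired_co2: defaultdict[int, float] = defaultdict(float)  # emissions with known population
    paired_pop: defaultdict[int, float] = defaultdict(float)
    for year, co2_mt, population in rows:
        if co2_mt is None:
            continue
        total_co2[year] += co2_mt
        if population:
            paired_co2[year] += co2_mt
            paired_pop[year] += population
    result: dict[int, tuple[float, Optional[float]]] = {}
    for year in sorted(total_co2):
        pop = paired_pop.get(year, 0.0)
        per_capita = paired_co2[year] * 1_000_000 / pop if pop else None
        result[year] = (total_co2[year], per_capita)
    return result
```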
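One way to quantify the comparison in emissions_vs_econ.sql is a Pearson correlation between GDP per capita and CO₂ per capita across countries. This is an assumed approach for illustration only; the model may simply place the two measures side by side. The input mapping (country -> (gdp, population, co2_per_capita)) is hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def gdp_vs_co2(countries: dict[str, tuple[float, float, float]]) -> float:
    """Correlate GDP per capita with CO₂ per capita.

    countries: hypothetical mapping of name -> (gdp, population, co2_per_capita).
    """
    gdp_pc = [gdp / pop for gdp, pop, _ in countries.values()]
    co2_pc = [co2 for _, _, co2 in countries.values()]
    return pearson(gdp_pc, co2_pc)
```

A value near 1 would indicate that richer countries (per person) tend to emit more per person; correlation alone says nothing about causation.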
Developed a data pipeline for GHG emissions analysis using Terraform, Google Cloud Storage, Dataproc (PySpark), and BigQuery.