NYC Taxi & Limousine Commission Data Analytics

Devyansh Rajput

Data Visualizer
Data Analyst
Product Data Analyst
Google BigQuery
Looker
Python
Total Revenue: The total revenue generated by the taxi trips in the dataset is $1,639,072.09.
Average Trip Distance: The average distance traveled by the taxi trips in the dataset is 3.03 miles.
Average Fare Amount: The average fare amount charged by the taxi trips in the dataset is $13.25.
Average Tip Amount: The average tip amount given by the passengers to the taxi drivers in the dataset is $1.87.
Payment Type: The most common payment type used by the passengers in the dataset is Credit card (34.9%), followed by Dispute (25.2%), No charge (20.5%), and Cash (19.4%).
Rate Code: The most common rate code applied to the taxi trips in the dataset is Standard rate (75.5%), followed by JFK (11.8%), Negotiated fare (6.7%), Newark (3.8%), and Nassau or Westchester (2.2%).

Problem Statement

The New York City Taxi and Limousine Commission (TLC), created in 1971, is the agency responsible for licensing and regulating New York City's Medallion (Yellow) taxi cabs, for-hire vehicles (community-based liveries, black cars, and luxury limousines), commuter vans, and paratransit vehicles.
The problem is to analyze the data from the TLC website in terms of revenue, distance, fare, tip, rate codes, and payment types.
Creating a simple data visualization tool that can be accessed daily so one can make data-driven decisions for the company.

Data Extraction Using Python

Extracting data from NYC website as parquest files.
Converting the parquest files to pandas dataframe.
Checked garbage values or outliers in the data frame in every column and marked them.
Checked for blank or null values in every column and marked them as missing.

Data Cleaning and Trsanformation(ETL) using Python and Mage AI

Performed data cleaning and transformation to prepare the data for analysis.
Removed all records with blank or null values in all columns.
Created custom columns by combining columns and slicing information.
Performing the ETL process using the mage ai on the Google Compute Engine and loading data to BigQuery database.

Data Analysis Using SQL (BigQuery)

Performed analysis to extract insights from the transaction and date tables.
Joined the columns using joins and applied filters by rate codes, payment types, and fare amounts to select the relevant data.
Calculated the total fare amount, number of rides, and other metrics of interest using aggregation functions.

Data Visualization using Looker Studio

Performed data visualization using Looker Studio to create interactive and insightful reports and dashboards.
Created measures to define custom calculations and metrics for the data.
Created bar, pie, and map charts to visualize the data and compare categories and trends.
Create vendor id, payment type, rate code, Trip distance, passenger count, & total amount slicers and dropdowns to filter the data and enable dynamic analysis.
Partner With Devyansh
View Services

More Projects by Devyansh