Cyclistic Case Study: Annual Members vs Casual Riders

Winnie Wanjiku

0

Case Study Writer

Blog Writer

Technical Writer

Google Docs

Grammarly

Medium

Cyclistic case study is part of the capstone project for the Google Data Analytics Professional Certificate course which I completed in July 2022.
Table of Contents

Quick Overview

Here is a quick overview of what you will find in this post:
Background — A brief introduction of the Cyclistic bike-share company.
Data Analysis Phases — A breakdown of the data analysis phases i.e. Ask, Prepare, Process, Analyze, Share, Act.
Tools used — MySQL Workbench, MS PowerPoint, Tableau.
We can now get started. Below is a breakdown of how I conducted the entire process.

Background

The capstone project is based on fictional data depicting a bike-share company called Cyclistic. The bike-share company is based in Chicago. The company offers different types of bikes as well as flexible pricing options.
Bike riders are categorized into two broad categories:
Casual riders — Customers who purchase single-ride or full-day passes.
Cyclistic members — Customers who purchase annual memberships.

Data Analysis Phases

For this case study, I followed the following data analysis phases:
Ask -> Prepare -> Process -> Analyze -> Share -> Act

Ask Phase

The ask phase where I defined the business task and how insights obtained would be used to drive business decisions.
Business Task
This case study aims to identify differences in how annual members and casual riders use cyclistic bikes differently.
Insights
The marketing team will use insights from this study to:
Identify ways to increase revenue by targeting casual riders.
Find out whether it is ideal to convert casual riders to annual members.
Come up with strategies to influence casual riders to become annual members.

Prepare Phase

The data used in this analysis is derived from Cyclistic’s historical trip data
Cyclistic is a fictional company therefore the data used in this analysis is fictional. This data has been made available by the Google professional data analytics certificate team.
The data is organized in folders containing CSV files of the data classified by month and year.
Assumption
Since the data is fictional, we will assume that it is reliable, original, comprehensive, current, non-biased and was collected ethically.

Process Phase

The following tools were used:
MySQL workbench to analyze the data
Tableau to create visualizations
PowerPoint to create a presentation
To start the data processing, I did the following:
Extracted the CSV files from the folders provided.
Created a database in MySQL Workbench to hold the data.
Created a table template as well as 4 tables to hold data according to seasons.
Loaded the CSV files into the tables.
Added two calculated columns to the tables i.e. ride_length and day_of_week.
Data Cleaning
The next step of processing the data is data cleaning.
I cleaned the data by checking the following parameters: Completeness, Outliers, Accuracy, Relevance, and uniqueness.
1. Completeness
In this step, I checked whether there were blank or null values.
Observation
From this, I observed there was a large number of blank station names.
Assumption
The blank station values were a result of bike rides starting or stopping at random points and not at a specific station.
Solution
Since deleting these rows would skew the analysis, I opted to fill in the blanks with the placeholders random_start_point or random_stop_point. I also updated the blank station IDs with the placeholders 1111 for random_start_point and 2222 for random_end_point.
Nb. The placeholders were only placed for values where the latitude and longitude values were present.
UPDATE winter_table SET start_station_name = 'random_start_point', start_station_id = '1111', end_station_name = 'random_end_point', end_station_id = '2222'WHERE (start_station_name = '' AND start_station_id = '' AND end_station_name = '' AND end_station_id = '') ;UPDATE winter_table SET start_station_name = 'random_start_point', start_station_id = '1111' WHERE (start_station_name = '' AND start_station_id = '') ;
2. Outliers
I looked for outliers in the ride_length column.
Observation
There were negative durations as well as rides that took very long.
Assumption
Valid ride lengths should fall between 10 seconds and 2 hours (7200 seconds).
Solution
I deleted all rows that did not fall between 10 and 7200 seconds.
DELETE FROM winter_table WHERE ride_length not between 10 and 7200 ;
3. Accuracy
This is where I verified the correctness of the data.
Checked whether dates and day_of_week values were accurate. I did this by randomly picking dates and verifying them on the calendar.
Confirmed that data fell within the correct range — between December 2020 and November 2021. I chose this range to accurately capture the seasonal trend.
Confirmed that seasonal tables contained data from the correct months.
All the values checked were accurate.
4. Relevance
Data that did not contribute to the business outcome had to be removed.
Observation
The data included station names that seemed to be test stations, warehouses or temporary setups. These stations included extensions such as “divvy”, “temp”, “test”, or “check”.
Assumption
Rides captured to and from these stations were done for test purposes and are therefore not relevant for this analysis.
Solution
All records from these stations were deleted.
DELETE FROM winter_table WHERE start_station_name like '%divvy%' or start_station_name like '%temp%' or start_station_name like '%test%' or start_station_name like '%check%' or end_station_name like '%divvy%' or end_station_name like '%temp%' or end_station_name like '%test%' or end_station_name like '%check%' ;
5. Uniqueness
I verified whether the ride id values were unique and also checked if there were station id names that represented more than one station.
Observation
All ride ids were unique. However, there were station IDs that were shared by more than one station.
Solution
I deleted the records containing non-unique station IDs.

Analyze Phase

After cleaning the data, it was ready for analysis.
The clean dataset contained a total of 5,210,101 records where:
Members comprised 55.60% of total customers.
Casual riders comprised 44.40% of total customers.
In this step, I was able to analyze the data to gain different insights such as:
Average ride length
Maximum, minimum, and mode ride length
Trends in riding time based on hours, days, months, and seasons
Trends for different seasons
Trends in bike preference
All these parameters were analyzed with the aim to identify how casual riders and annual members used the bikes.

Share Phase

I used Tableau Public to create visualizations of the data.
The link can be found here on Tableau Public.
In addition, I also created a PowerPoint presentation which can be found here.

Act Phase

There are a few insights I obtained from analyzing the Cyclistic bike share data. I’ve summarized them in the table below.
These are the insights gathered from the existing data. However, to expand my findings, the following data could have been valuable:
Member ID’s to identify the number of unique customers.
Prices of existing payment plans.
Recommendations
From this information, it is clear that annual members and casual riders use Cyclistic services differently.
Therefore, instead of converting casual riders to annual members, it would be more effective to introduce additional membership options.
Top 3 recommendations
Introduce seasonal passes.
Introduce monthly or weekend passes.
Launch a digital campaign marketing these new membership options to casual riders.

Useful Links

I could not fit all information in this post. Here are links to additional information about this project.
Github — You can find the queries used here.
Tableau Public — The visualization for this data is on Tableau Public.
PowerPoint — You can find the PowerPoint presentation here.
Like this project
0

Posted Jan 5, 2025

Cyclistic case study is part of the capstone project for the Google Data Analytics Professional Certificate course which I completed in July 2022.

Likes

0

Views

0

Tags

Case Study Writer

Blog Writer

Technical Writer

Google Docs

Grammarly

Medium

12 Helpful Tips To Protect Yourself From Cyber Attacks
12 Helpful Tips To Protect Yourself From Cyber Attacks
My Experience Doing the Sprocket Ltd Case Study KPMG Internship
My Experience Doing the Sprocket Ltd Case Study KPMG Internship
Resources that helped me self-study and pass the CompTIA Networ…
Resources that helped me self-study and pass the CompTIA Networ…
How To Fix There Has Been A Critical Error On This Website
How To Fix There Has Been A Critical Error On This Website