Exploratory data analysis of the California traffic collision database from SWITRS (Statewide Integrated Traffic Records System). The goal of the analysis was to uncover insights and trends in the data.
Some insights from the analysis:
The database contains collision records from January 1, 2001 to June 3, 2021.
The year 2002 had the highest number of recorded collisions. The top three years by collision count are 2002, 2003, and 2004.
Los Angeles has the most collisions, with more than 2 million records, while Sierra has the fewest, with about 1,447 collisions recorded.
Overall, 73,007 people died in traffic collisions, while more than 5 million people sustained injuries of varying severity.
The number one PCF (Primary Collision Factor) violation category is speeding.
About 16.51% of all traffic collisions happened on a Friday, followed by Thursday with 14.77%. Sunday has the fewest collisions, accounting for about 11.7% of the total.
Among the parties directly involved in collisions, males make up about 61.27%, while females and non-binary people make up 38.7% and 0.02% respectively.
White people make up the largest share of parties involved in traffic collisions.
About 52,448 collisions involved school buses, accounting for about 0.56% of all collisions.
There are more victims in the 10-20 and 20-30 age groups than in any other age group.
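
Percentage figures like the day-of-week shares above come from counting records per category and dividing by the total. The snippet below is a minimal sketch of that calculation using only the Python standard library. It is not the project's actual code: the function name `weekday_shares` and the sample dates are hypothetical, chosen only to illustrate the arithmetic.

```python
from collections import Counter
from datetime import date

# Fixed names, indexed by date.weekday() (Monday == 0), to avoid locale issues.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_shares(collision_dates):
    """Return {weekday name: percent of all collisions}, largest share first."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in collision_dates)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {day: round(100 * n / total, 2) for day, n in counts.most_common()}


# Hypothetical sample dates (NOT taken from SWITRS): two Fridays,
# one Thursday, and one Sunday.
sample = [date(2021, 1, 1), date(2021, 1, 8), date(2021, 1, 7), date(2021, 1, 3)]
shares = weekday_shares(sample)
for day, pct in shares.items():
    print(f"{day}: {pct}%")
```

The same count-then-divide pattern applies to the other shares reported here, such as the breakdown by sex or the school bus proportion, with the grouping key swapped out.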
See the GitHub page for the project's full details and the code used: