Natasha Kabuka
… `pandas`. But first we'll need to create a DataFrame from scratch. Let's start by creating a Python object covered in Intermediate Python: a dictionary!

… `movie_dict` to a `pandas` DataFrame, we will first need to import the library under its usual alias. We'll also want to inspect our DataFrame to ensure it was created correctly. Let's perform these steps now.
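
A minimal sketch of these two steps, for orientation only. The numbers are made-up placeholders rather than our friend's figures, and the names `years`, `durations`, and `durations_df` are assumptions; only `movie_dict` comes from the text.

```python
import pandas as pd  # "pd" is the usual alias

# Placeholder values for illustration only (not real data)
years = [2011, 2012, 2013]
durations = [103, 101, 99]

# A dictionary mapping column names to lists of values
movie_dict = {"years": years, "durations": durations}

# Each key becomes a column; each list supplies that column's rows
durations_df = pd.DataFrame(movie_dict)

# Inspect the result to confirm it was created correctly
print(durations_df)
```

Printing the DataFrame is a quick check that both columns are present and that the rows line up.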

… `pandas` DataFrame, the most common way to work with tabular data in Python. Now back to the task at hand. We want to follow up on our friend's assertion that movie lengths have been decreasing over time. A great place to start will be a visualization of the data.

`matplotlib.pyplot` is one of the most common packages to do so.

… `matplotlib.pyplot` Figure object, which we have already provided in the cell below. You can continue to create your plot as you have learned in Intermediate Python.
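
One way such a plot might look, sketched with placeholder values (the `years` and `durations` lists and the title text are assumptions, not the project's data). The `Agg` backend line is only there so the sketch runs without a display; in a notebook it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

# Placeholder values for illustration only (not real data)
years = [2011, 2012, 2013]
durations = [103, 101, 99]

# Create the Figure object first, then draw a line plot on it
fig = plt.figure()
plt.plot(years, durations)

# A title makes the plot self-explanatory
plt.title("Movie durations by year (placeholder data)")
```

In an interactive session, `plt.show()` would display the figure.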

… `"datasets/netflix_data.csv"`. Let's create another DataFrame, this time with all of the data. Given the length of our friend's data, printing the whole DataFrame is probably not a good idea, so we will inspect it by printing only the first five rows.
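
A rough sketch of loading and inspecting the data. So that it runs on its own, it reads a tiny in-memory stand-in with invented rows instead of the real file; the name `netflix_df` and every row value are assumptions. Only the column names and the file path are taken from the text.

```python
import io
import pandas as pd

# Invented stand-in rows; the real data lives in "datasets/netflix_data.csv"
csv_text = """title,type,country,genre,release_year,duration
Example A,Movie,United States,Dramas,2015,110
Example B,TV Show,Japan,Anime Series,2018,2
Example C,Movie,India,Comedies,2012,135
Example D,Movie,United States,Documentaries,2019,48
Example E,Movie,France,Children,2020,52
Example F,TV Show,Brazil,Dramas,2017,1
"""

# In the project itself this would be:
#   netflix_df = pd.read_csv("datasets/netflix_data.csv")
netflix_df = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default
print(netflix_df.head())
```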

… `type`. Scanning the column, it's clear there are also TV shows in the dataset! Moreover, the `duration` column we planned to use seems to represent different values depending on whether the row is a movie or a show (perhaps the number of minutes versus the number of seasons)?

… `type` is `Movie`. While we're at it, we don't need information from all of the columns, so let's create a new DataFrame `netflix_movies` containing only `title`, `country`, `genre`, `release_year`, and `duration`.
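
The filtering and column selection could be sketched as follows. The rows are invented for illustration, and `netflix_df` and `netflix_df_movies_only` are assumed names; `netflix_movies` and the column names come from the text.

```python
import pandas as pd

# Invented rows for illustration only (not real data)
netflix_df = pd.DataFrame({
    "title": ["Example A", "Example B", "Example C"],
    "type": ["Movie", "TV Show", "Movie"],
    "country": ["United States", "Japan", "India"],
    "genre": ["Dramas", "Anime Series", "Comedies"],
    "release_year": [2015, 2018, 2012],
    "duration": [110, 2, 135],
})

# Keep only the rows whose type is "Movie" (boolean mask)
netflix_df_movies_only = netflix_df[netflix_df["type"] == "Movie"]

# Keep only the columns of interest by passing a list of column names
netflix_movies = netflix_df_movies_only[
    ["title", "country", "genre", "release_year", "duration"]
]

print(netflix_movies)
```

Filtering first and selecting columns second keeps `type` available for the mask before it is dropped.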

… `fig = plt.figure(figsize=(12,8))` to increase the size of the plot (to help you see the results), as well as to assist with testing. For more information on how to create or work with a `matplotlib` `figure`, refer to the documentation.

… `duration` under 60 minutes and look at the genres. This might give us some insight into what is dragging down the average.
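
A small sketch of that filter, again on invented rows. The name `short_movies` is an assumption, and `value_counts()` is just one convenient way to summarize the genres.

```python
import pandas as pd

# Invented rows for illustration only (not real data)
netflix_movies = pd.DataFrame({
    "title": ["Example A", "Example C", "Example D", "Example E"],
    "genre": ["Dramas", "Comedies", "Documentaries", "Children"],
    "release_year": [2015, 2012, 2019, 2020],
    "duration": [110, 135, 48, 52],
})

# Keep only the movies shorter than 60 minutes
short_movies = netflix_movies[netflix_movies["duration"] < 60]

# Count how often each genre appears among the short movies
print(short_movies["genre"].value_counts())
```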

… `genre` column. Much as we did in Intermediate Python, we can then pass this list to our plotting function in a later step to color all non-typical genres in a different color!

`matplotlib` has many named colors you can use when creating plots. For more information, you can refer to the `matplotlib` documentation on named colors.
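
The loop might look something like this. Which genres count as "non-typical" and which colors they get are assumptions made for the sketch, as are the sample genres.

```python
import pandas as pd

# Invented genre values for illustration only (not real data)
netflix_movies = pd.DataFrame(
    {"genre": ["Dramas", "Children", "Documentaries", "Comedies"]}
)

# Build one color per row, in row order
colors = []
for genre in netflix_movies["genre"]:
    if genre == "Children":          # assumed non-typical genre
        colors.append("red")
    elif genre == "Documentaries":   # assumed non-typical genre
        colors.append("blue")
    else:                            # everything else shares one color
        colors.append("black")

print(colors)
```

Because the list is built in row order, its length matches the DataFrame, so it can be handed straight to a plotting call that accepts one color per point.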

… `colors` list that we can pass to our scatter plot, which should allow us to visually inspect whether these genres might be responsible for the decline in the average duration of movies.

… `plt.style.use()`. The latter isn't taught in Intermediate Python, but can be a fun way to add some visual flair to a basic `matplotlib` plot. You can find more information on customizing the style of your plot in the `matplotlib` documentation.
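
Putting the pieces together, a colored scatter plot with a style sheet might be sketched like this. The data points, the label and title text, and the choice of the built-in `"fivethirtyeight"` style are all assumptions. As before, the `Agg` line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

# Invented values for illustration only (not real data)
release_year = [2012, 2015, 2019, 2020]
duration = [135, 110, 48, 52]
colors = ["black", "black", "blue", "red"]  # one color per point

# Apply a built-in style sheet before creating the figure
plt.style.use("fivethirtyeight")

# A larger figure makes the points easier to see
fig = plt.figure(figsize=(12, 8))

# c accepts a list with one named color per point
plt.scatter(release_year, duration, c=colors)

plt.title("Movie duration by year of release")
plt.xlabel("Release year")
plt.ylabel("Duration (min)")
```

`plt.style.available` lists the style names installed with your copy of `matplotlib`.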