Aleesha's Netflix Usage - Exploratory Analysis

My friend and flatmate Aleesha watches a lot of Netflix. When I say a lot, I mean almost all the time when she is at home. She has it on when she's cooking, when she's on her phone in bed, and even while she sleeps. You might think this means she watches a lot of new things, but you would be wrong. From my own casual observations, it's almost always the same few shows that she watches over and over again. Now I'm not here to judge her (I do that plenty), but I was curious as to exactly how much she has watched, and how much of it is the same stuff. When I found out Netflix had an option to request your data, I knew I had to take a look.

Let's use the pandas library for Python to do some exploratory analysis of her Netflix data, and if we think there is more to see we can do a deep dive with Power BI later.

The data Netflix provides is a set of folders containing various .csv files. The file we are interested in is "ViewingActivity.csv". Let's import this data and have a look (along with some other libraries we will use later).
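As a rough sketch of the import step, here is how the file could be loaded with pandas. The two sample rows, the profile name 'Sam', the 'Apple iPhone' device label, and the exact column names are my own stand-ins so the snippet runs without the real export; with the actual file you would pass the path to "ViewingActivity.csv" instead.

```python
import io
import pandas as pd

# Tiny stand-in for ViewingActivity.csv (assumed column names, made-up rows).
sample_csv = io.StringIO(
    "Profile Name,Start Time,Duration,Title,Device Type\n"
    "Leesh,2021-03-01 06:15:00,00:21:30,Brooklyn Nine-Nine: Season 1: Pilot,Apple iPhone\n"
    "Sam,2021-03-01 08:40:00,00:45:10,Some Film,Samsung 2016 Hawk-M DTV Smart TV\n"
)

# With the real export: df = pd.read_csv("ViewingActivity.csv")
df = pd.read_csv(sample_csv)

print(df.shape)   # (rows, columns)
print(df.head())  # first few rows
```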

38,000 rows seems like a lot. Let's have a look at the first few to see if they are all useful to this analysis.

We can see that the dataset contains viewing activity for all profiles on this Netflix account. We are only interested in data from Aleesha (Leesh), so let's keep only those rows.
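A minimal sketch of that filter, assuming the profile is stored in a column called 'Profile Name' (the other profile names and titles here are invented for illustration). It also works out what fraction of all rows belong to the profile we kept.

```python
import pandas as pd

# Made-up rows; only the 'Leesh' profile name comes from the real data.
df = pd.DataFrame({
    "Profile Name": ["Leesh", "Sam", "Leesh", "Alex"],
    "Title": ["Show A", "Show B", "Show C", "Show D"],
})

# Keep only Aleesha's rows; .copy() avoids chained-assignment warnings later.
leesh = df[df["Profile Name"] == "Leesh"].copy()

# Share of all interactions on the account that belong to her profile.
share = len(leesh) / len(df)
print(f"{len(leesh)} rows kept, {share:.0%} of the account")
```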

More realistic, but interesting that her usage makes up 45% of all interactions on that account!

There are a few other rows that we want to drop. This Netflix profile is logged into the lounge TV at the flat, where three different people use it. Fortunately, the 'Device Type' column allows us to isolate the TV in question ('Samsung 2016 Hawk-M DTV Smart TV'). That being said, we don't want to simply drop all rows that use that TV, as there could still be valuable data there.

There is hope, however, as I know from her usual schedule that she almost never watches from that TV later than 8pm NZST, which is when the vast majority of viewing by the other people happens. The 'Start Time' column should help us here, but it's not currently in a datetime format, and it uses the UTC timezone, not our local one. Let's convert this column to a workable format.
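One way this conversion could look is sketched below, with two made-up UTC timestamps. The zone name 'Pacific/Auckland' is my assumption for "our local one"; note that it follows daylight saving, so summer dates come out in NZDT rather than NZST.

```python
import pandas as pd

# Made-up UTC timestamps stored as plain strings, as they would be in a CSV.
df = pd.DataFrame({"Start Time": ["2021-07-01 07:30:00", "2021-07-01 09:15:00"]})

# Parse the strings as UTC, then shift them to New Zealand local time.
df["Start Time"] = (
    pd.to_datetime(df["Start Time"], utc=True)
    .dt.tz_convert("Pacific/Auckland")
)

print(df["Start Time"])
```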

Now that we have the time zone correct, we can remove all entries on that television that begin later than 8pm.
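A sketch of that rule on three invented rows: drop a row only when it is both on the lounge TV and starts late in the evening. I treat anything starting at 20:00 or later as "late", and the 'Apple iPhone' label is just a placeholder for any other device.

```python
import pandas as pd

TV = "Samsung 2016 Hawk-M DTV Smart TV"

# Made-up rows, already in local (Pacific/Auckland) time.
df = pd.DataFrame({
    "Start Time": pd.to_datetime(
        ["2021-07-01 18:30", "2021-07-01 21:10", "2021-07-01 22:45"]
    ).tz_localize("Pacific/Auckland"),
    "Device Type": [TV, TV, "Apple iPhone"],
})

on_tv = df["Device Type"] == TV
after_8pm = df["Start Time"].dt.hour >= 20  # 20:00 onwards counts as late

# Keep everything except late-evening viewing on the shared TV.
df = df[~(on_tv & after_8pm)]
print(df)
```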

While we are tidying columns, let's also convert the 'Duration' column to a more workable format (timedelta), and drop the columns we don't need at this point to make the table a bit easier to read.
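A small sketch of both steps, assuming durations arrive as "HH:MM:SS" strings. The columns dropped here ('Bookmark' and 'Country') are only examples I picked; which ones to drop depends on what you want to keep looking at.

```python
import pandas as pd

# Made-up rows with durations as strings.
df = pd.DataFrame({
    "Title": ["Show A", "Show B"],
    "Duration": ["00:21:30", "00:00:12"],
    "Bookmark": ["00:21:30", "00:00:12"],
    "Country": ["NZ (New Zealand)", "NZ (New Zealand)"],
})

# "HH:MM:SS" strings become timedeltas, which can be compared and summed.
df["Duration"] = pd.to_timedelta(df["Duration"])

# Drop columns we do not need for this analysis (example choice).
df = df.drop(columns=["Bookmark", "Country"])
print(df.dtypes)
```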

Now that we have the columns we care about, let's filter out entries that are too short to be valid. Entries under a minute are likely just trailers, previews, or autoplays on the home page, so let's use the 'Duration' column to drop everything shorter than 1 minute.
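With 'Duration' stored as a timedelta, this filter is a single comparison. A minimal sketch on made-up rows, where an entry of exactly one minute is kept:

```python
import pandas as pd

# Made-up rows; the middle one looks like an autoplayed preview.
df = pd.DataFrame({
    "Title": ["Show A", "Preview", "Show B"],
    "Duration": pd.to_timedelta(["00:21:30", "00:00:12", "00:01:00"]),
})

# Keep entries that lasted at least one minute.
df = df[df["Duration"] >= pd.Timedelta(minutes=1)]
print(df)
```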

From my observation in person, there are around 8 shows that she watches a considerable amount, with 3 main shows that she spends most of her time watching: Brooklyn Nine-Nine, The Big Bang Theory, and The Vampire Diaries. For the purposes of analyzing this data, we are going to add two new columns to the dataset, 'Show Name' and 'Big 3'.

'Show Name' will be a string that gives the title of the show for each episode watched. Only a handful of shows will get their own name: the ones that I can remember seeing her watch often. The rest we will just label 'Other'.

'Big 3' will be a boolean: shows identified as part of the Big 3 are assigned 'True', and the rest 'False'.
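Here is one way these two columns could be built. It assumes each entry's 'Title' starts with the show name (for example "Show: Season 1: Episode"), which is my guess at the format, and the helper `show_name` is hypothetical. Only the Big 3 are listed as tracked shows because they are the only ones named above; the other frequent shows would be added to the same list.

```python
import pandas as pd

BIG_3 = ["Brooklyn Nine-Nine", "The Big Bang Theory", "The Vampire Diaries"]
TRACKED_SHOWS = BIG_3 + []  # add the other frequently watched shows here

def show_name(title):
    """Return the tracked show a title belongs to, or 'Other'."""
    for show in TRACKED_SHOWS:
        if title.startswith(show):
            return show
    return "Other"

# Made-up rows using the assumed "Show: Season: Episode" title format.
df = pd.DataFrame({"Title": [
    "Brooklyn Nine-Nine: Season 1: Pilot",
    "The Big Bang Theory: Season 1: Pilot",
    "Some Film",
]})

df["Show Name"] = df["Title"].apply(show_name)
df["Big 3"] = df["Show Name"].isin(BIG_3)

print(df["Show Name"].value_counts())
```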

We can see from these results that The Big Bang Theory and Brooklyn Nine-Nine dominate, and really are the 'Big 2'. 'Other' remains the most numerous category, which is a little surprising to me but makes sense. Now let's compare the Big 3 with all other shows.

That being said, taken together the Big 3 make up just under half of all viewing instances on that account. However, this is not necessarily the same as their share of viewing hours, as there may be instances where she watched only briefly.
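To make the difference between viewing instances and viewing hours concrete, here is a small sketch on invented numbers (not her real data). It compares each group's share of rows with its share of total watch time.

```python
import pandas as pd

# Made-up rows: the Big 3 have half the views but short durations.
df = pd.DataFrame({
    "Big 3": [True, True, False, False],
    "Duration": pd.to_timedelta(["00:20:00", "00:10:00", "01:00:00", "00:30:00"]),
})

# Readable group labels instead of True/False.
group = df["Big 3"].map({True: "Big 3", False: "Other"})

summary = df.groupby(group)["Duration"].agg(views="count", watch_time="sum")
summary["share_of_views"] = summary["views"] / summary["views"].sum()
summary["share_of_time"] = summary["watch_time"] / summary["watch_time"].sum()

print(summary)
```

In this toy example the Big 3 account for half of the views but only a quarter of the watch time, which is exactly the kind of gap worth checking in the real data.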

There is enough here to justify a deeper analysis. We will now take this data to Power BI and produce a full report. Thank you for following along with me!