Unveiling cinematic insights: analyzing my Letterboxd data
Introduction
Letterboxd is a fantastic platform for film enthusiasts to track and share their movie-watching experiences.
Apart from being a hub for movie reviews and social interaction, it also provides a valuable dataset that can be harnessed to gain insightful statistics about your film preferences and habits. In this blog post, I will guide you through the process of extracting and analyzing your Letterboxd data, enabling you to uncover intriguing patterns and trends hidden within your cinematic journey.
Step 1: Exporting Your Letterboxd Data
To get started, you need to export your Letterboxd data. Follow these steps:
Log in to your Letterboxd account.
Click on your profile picture and select “Settings.”
Scroll down to the “Data Export” section.
Click on the “Request Data Export” button.
Wait for an email notification confirming that your data is ready for download.
Once you receive the email, download the data export file. It will likely be in a compressed format, such as ZIP. Unzip it to access the data as CSV files.
With the export unzipped, we load our data from two CSV files: watched.csv, containing details about the movies we’ve watched, and ratings.csv, containing our ratings for those movies. We then merge these datasets on their common columns: 'Date', 'Name', 'Year', and 'Letterboxd URI'.
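Here is a minimal sketch of that loading and merging step with pandas. The function name load_letterboxd, the left join, and the tiny in-memory sample (including its made-up URIs) are my own assumptions for illustration; with the real export you would pass the paths to watched.csv and ratings.csv instead.

```python
import io

import pandas as pd

# Columns shared by watched.csv and ratings.csv, as listed in the post
MERGE_KEYS = ["Date", "Name", "Year", "Letterboxd URI"]


def load_letterboxd(watched_csv, ratings_csv):
    """Load the two export files and attach ratings to the watched films."""
    watched = pd.read_csv(watched_csv, parse_dates=["Date"])
    ratings = pd.read_csv(ratings_csv, parse_dates=["Date"])
    # Assumption: a left join, so films watched but never rated are kept
    # (their Rating is NaN).
    return watched.merge(ratings, on=MERGE_KEYS, how="left")


# Made-up sample standing in for the real files
watched_sample = io.StringIO(
    "Date,Name,Year,Letterboxd URI\n"
    "2023-07-21,Barbie,2023,https://boxd.it/example1\n"
    "2023-07-22,Oppenheimer,2023,https://boxd.it/example2\n"
)
ratings_sample = io.StringIO(
    "Date,Name,Year,Letterboxd URI,Rating\n"
    "2023-07-21,Barbie,2023,https://boxd.it/example1,4.0\n"
)

films = load_letterboxd(watched_sample, ratings_sample)
print(films[["Name", "Rating"]])
```

One thing to keep in mind: because 'Date' is one of the merge keys, a rating only attaches when its date matches the watched date exactly. If that drops ratings you expected to see, merging on 'Name', 'Year', and 'Letterboxd URI' alone may be worth trying.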
Step 2: Enriching Genre, Director, and Runtime Information
We complement our dataset by fetching genre, director, and runtime information from the TMDb API (I relied on the tmdbsimple wrapper) using the fetch_movie_details function.
Replace YOUR_API_KEY with your actual TMDb API key, which you can obtain by signing up for a free TMDb account and generating an API key.
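The body of fetch_movie_details is not shown here, so what follows is only a sketch of how such a function could look with tmdbsimple. It assumes that the first search result is the right film, and the helper extract_details and the output keys 'Genres', 'Directors', and 'Runtime' are names I made up for this example.

```python
def extract_details(info, credits):
    """Pull genres, directors and runtime out of TMDb movie payloads."""
    return {
        "Genres": [g["name"] for g in info.get("genres", [])],
        "Directors": [
            person["name"]
            for person in credits.get("crew", [])
            if person.get("job") == "Director"
        ],
        "Runtime": info.get("runtime"),  # minutes, or None if missing
    }


def fetch_movie_details(title, year, api_key="YOUR_API_KEY"):
    """Search TMDb by title and year, then fetch details for the top hit."""
    # Imported here so the parsing helper above works without tmdbsimple
    import tmdbsimple as tmdb

    tmdb.API_KEY = api_key
    search = tmdb.Search()
    search.movie(query=title, year=year)
    if not search.results:
        # Nothing found: return empty details rather than raising
        return {"Genres": [], "Directors": [], "Runtime": None}
    # Assumption: the first search result is the film we logged
    movie = tmdb.Movies(search.results[0]["id"])
    return extract_details(movie.info(), movie.credits())
```

Calling fetch_movie_details(row['Name'], row['Year']) for each row of the merged data and storing the returned values as new columns gives the enriched dataset used in the next step. Each film costs a few HTTP requests, so it is worth saving the result to disk once it is built.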
Step 3: Key Analyses
Let’s now perform an Exploratory Data Analysis (EDA) to gain insights into my watching habits.
Along the way, I discovered that I logged 391 watched movies, of which I rated 118, with an average rating of 3.40.
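These three headline numbers can be computed in a few lines. This is a sketch that assumes the merged data has a 'Rating' column that is empty (NaN) for unrated films; summarize is a hypothetical helper name and the sample rows are invented.

```python
import pandas as pd


def summarize(films):
    """Count watched and rated films and compute the average rating."""
    rated = films["Rating"].dropna()  # unrated films have no Rating
    return {
        "watched": len(films),
        "rated": len(rated),
        "average_rating": round(rated.mean(), 2),
    }


# Invented sample: three films, two of them rated
sample = pd.DataFrame({"Name": ["A", "B", "C"], "Rating": [4.0, None, 3.0]})
summary = summarize(sample)
print(summary)
```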
Month seasonality
Let’s visualize the distribution of movies watched over time.
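One way to get the monthly counts behind such a chart, sketched with pandas. It assumes the 'Date' column holds the watch date; films_per_month is a hypothetical helper name and the sample dates are invented.

```python
import pandas as pd


def films_per_month(films):
    """Count films per calendar month, with zeros for empty months."""
    months = pd.to_datetime(films["Date"]).dt.to_period("M")
    counts = months.value_counts().sort_index()
    # Fill months without any logged film so the timeline has no holes
    full_range = pd.period_range(counts.index.min(), counts.index.max(), freq="M")
    return counts.reindex(full_range, fill_value=0)


# Invented sample: two films in April 2020, one in July 2023
sample = pd.DataFrame({"Date": ["2020-04-03", "2020-04-18", "2023-07-21"]})
monthly = films_per_month(sample)
print(monthly.head())
```

With matplotlib installed, monthly.plot(kind="bar") turns the result into a bar chart.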
It came as no surprise that during the COVID lockdowns in 2020 and 2021 I watched a looooot of movies. This July 2023 is different from the others, as I managed to go every day to the Dolce Vita sur Seine open-air cinema festival and was able to watch a lot of stuff. Moreover, the weather in Paris this summer is awful, so late sunsets at Paris Plage have been replaced by watching movies. Last but not least, it is Barbie month!
Day of week seasonality
Let’s visualize how the movies I watched are spread across the days of the week.
It looks like my preferred day of the week to watch movies is… Monday?! 40% of the movies I watched in these years were watched on Sundays and Mondays. Nice. I did not expect this.
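A sketch of the day-of-week breakdown, again assuming the 'Date' column holds the watch date. films_per_weekday is a hypothetical helper name and the sample dates are invented.

```python
import pandas as pd

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def films_per_weekday(films):
    """Count films per day of the week and each day's share of the total."""
    day_names = pd.to_datetime(films["Date"]).dt.day_name()
    # Reindex so days appear in calendar order, including days with no films
    counts = day_names.value_counts().reindex(DAYS, fill_value=0)
    return pd.DataFrame({"films": counts, "share": counts / counts.sum()})


# Invented sample: one Sunday, two Mondays, one Friday
sample = pd.DataFrame(
    {"Date": ["2023-07-16", "2023-07-17", "2023-07-24", "2023-07-21"]}
)
weekday = films_per_weekday(sample)
print(weekday)
```

Adding up the 'share' values for Sunday and Monday gives the kind of combined percentage quoted in this section.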
Preferred Genres
Let’s now look at the distribution of my watched genres.
My favourite genre is drama: fitting for a drama queen.
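Since a film can belong to several genres, each genre has to be counted separately. A sketch, assuming the enrichment step stored a list of genre names per film in a column I am calling 'Genres' (a hypothetical name); the sample rows are invented.

```python
import pandas as pd


def genre_counts(films, column="Genres"):
    """Count how many watched films fall under each genre."""
    # explode() gives each genre its own row; empty lists become NaN
    return films[column].explode().dropna().value_counts()


# Invented sample: the third film has no genre information
sample = pd.DataFrame(
    {
        "Name": ["A", "B", "C"],
        "Genres": [["Drama", "Romance"], ["Drama"], []],
    }
)
genres = genre_counts(sample)
print(genres)
```

Because of this, the counts add up to more than the number of films watched.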
Movie Ratings
Let’s visualize the distribution of my ratings.
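A sketch of the counts behind a ratings histogram. Letterboxd ratings go from 0.5 to 5 stars in half-star steps, so the result is reindexed on that full scale to make unused ratings show up as zero. rating_distribution is a hypothetical helper name and the sample ratings are invented.

```python
import pandas as pd

# Half-star scale: 0.5, 1.0, ..., 5.0
RATING_SCALE = [half_stars / 2 for half_stars in range(1, 11)]


def rating_distribution(films):
    """Count films per rating value, including ratings never given."""
    counts = films["Rating"].dropna().value_counts()
    return counts.reindex(RATING_SCALE, fill_value=0).astype(int)


# Invented sample: one unrated film and three rated ones
sample = pd.DataFrame({"Rating": [4.0, 4.0, 3.5, None]})
distribution = rating_distribution(sample)
print(distribution)
```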
I never gave a 0.5, the lowest possible rating. What a gentle spectator I am!
Average Rating and Film Count by Release Year
I’ll now calculate the number of films watched for each release year, and then I’ll plot both my average rating and the film count.
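A sketch of that aggregation, assuming 'Year' is the release year from the export and 'Rating' is empty for unrated films. by_release_year is a hypothetical helper name and the sample rows are invented.

```python
import pandas as pd


def by_release_year(films):
    """Film count and average rating for each release year."""
    return (
        films.groupby("Year")
        .agg(films=("Name", "count"), average_rating=("Rating", "mean"))
        .sort_index()
    )


# Invented sample: the 1975 film is watched but not rated
sample = pd.DataFrame(
    {
        "Name": ["A", "B", "C"],
        "Year": [1975, 2023, 2023],
        "Rating": [None, 4.0, 3.0],
    }
)
per_year = by_release_year(sample)
print(per_year)
```

Years with watched but unrated films get a count and an empty average. For the plot, one option is bars for the film count and a line for the average rating on a secondary axis.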
I should definitely fill the gap in movies released in the 70s.
Conclusion
This analysis offered a unique perspective on my movie-watching preferences, highlighting my preferred genres and the dominance of particular days of the week in my movie-watching routine. By leveraging data-driven insights, I can make more informed decisions when selecting movies and continue to enrich my cinematic experience. Too bad that no dataset can (yet) tell me how many times I fell asleep before being able to finish the movies I logged!