Capstone Project - Reelatable

Capstone Project

Discover Yourself Through Your Movies

May 2024

Team

Ankita Suresh Shanbhag

Saurabh Chachra

Hrishikesh Srinivas Nagaraju

Kinshuk Nigam

Faculty Advisor

Dr. Marti Hearst

Master of Information Management & Systems

School of Information

University of California, Berkeley

Table of Contents

Purpose
Self-Understanding
Lack of Systematic Methods to Improve Self-Understanding
Goal
Theoretical Framework
How Self-Understanding Works
Categorization
Significant Psychological Dimensions
Relationship between Movies and the Psychology of Movie Enthusiasts
Patterns (Structured Interviews)
Literature Review in Storytelling
Cheaper, Efficient, and Scalable Way to Uncover Patterns
Database of Psychological Characteristics in the Movies
User Experience Research
Understanding the User
Usability Testing - Phase 1
Efficacy of the Methodology
User Experience Design
Service Blueprint
Design Iterations
Market Analysis
Target User Group
Market
Competitive Landscape
Product Considerations
Forces of Progress
Jobs to be done (JTBD)
Product Features
Product Roadmap
Engineering
Machine Learning Service
Backend service
Frontend service
Deployment
Limitations & Future Work
Conclusion
Contributions
Ankita Shanbhag
Saurabh Chachra
Hrishikesh Srinivas Nagaraju
Kinshuk Nigam
References
Appendix - Application Flow
Appendix - Code
Appendix - Problem Statement, Vision and Value Proposition
Appendix - Product Management
Product Roadmap
Appendix - Engineering
MVP
Appendix - Economic and Business Analysis
Economies of scale
Supply side economies
Network effects
Switching costs
Potential Revenue Sources
Pricing Strategy Evaluation
Switching Costs and Lock-in Strategies
Final Pricing Strategy
Value Network
Impact on the Value Network
Regulations
Neutrality

Purpose

Self-Understanding

The Greek motto gnōthi sauton (know thyself, nosce te ipsum)

Socrates believed that all philosophical commandments could be reduced to one idea: ‘Know thyself.’

An extensive body of scholarly research underscores the significance of self-understanding for the psychological well-being and healthy functioning of individuals. Self-concept clarity is positively associated with self-esteem (Campbell, 1990). Findings from Lewandowski and Nardone (2012) suggest that individuals with higher self-concept clarity may be at an advantage in developing relationships. Self-concept clarity may also be beneficial in a variety of relationship situations and contexts (see Gurung et al., 2001). Self-awareness also contributes to better decision making and team performance (Dierdorff & Rubin, 2015).

Lack of Systematic Methods to Improve Self-Understanding

However, the quest for deeper self-understanding is hampered by a lack of systematic methods. Currently, the most accessible option is an online psychological assessment, such as a Big 5 personality test. These tools are quick, easy to access, affordable, and scalable. However, human psychology encompasses thousands of psychological dimensions, and it is not straightforward to identify which dimensions are most consequential for a given individual, so psychological tests fall short in comprehensiveness. While these online tools offer a starting point, they do not provide a thorough pathway to deeper self-understanding.

On the other hand, therapy offers a more comprehensive approach but is hindered by its high costs and

lack of accessibility, making it an impractical option for the majority of the world's population. This

dichotomy between the accessibility of online tests and the thoroughness of professional therapy

presents a significant barrier in the field of psychological self-assessment.

Goal

Design and build a product that will help a large number of users understand themselves better. The

product should be designed to be rapidly scalable with variable cost close to zero.

Theoretical Framework

How Self-Understanding Works

Self-understanding involves categorizing one's identity through various descriptors encapsulated by the

question, "Who am I?" This process helps individuals develop a greater understanding of self-concept by

examining personality traits, social roles, and existential affiliations (Schwartz et al. 2017). Therefore,

enhancing self-understanding could significantly benefit from methodologies that support and refine the

categorization of self into various descriptors under the "Who am I?" inquiry.

However, the vast variety of possible self-descriptors (e.g., idealist, optimist, compassionate, feminine, inquisitive) introduces two important questions:

1. Is it necessary for individuals to understand the scientific definitions of each descriptor to

effectively categorize themselves? This approach seems neither efficient nor desirable.

2. How can we effectively narrow down these descriptors to the psychological dimensions most

significant to different individuals? Identifying and focusing on key psychological traits that

resonate personally can streamline the self-understanding process, making it more accessible

and tailored to individual needs.

Categorization

Prototype Theory of Categorization

The bird category, from Aitchison (2012: 69)

According to Rosch (1978), people rely less on abstract definitions of categories than on a comparison of

the given object or experience with what they deem to be the object or experience best representing a

category ("prototype"). Hence, it could be argued that the process of self-categorization can be better

supported by providing prototypical examples of a category rather than its definition.

How can we provide prototypical members of the trait category that represent

them?

Mar and Oatley (2008) suggested that “The function of fiction is the abstraction and simulation of social experience”. Black and Barns (2015) found that film narratives, as well as written narratives, may facilitate the understanding of others’ minds. Further, even before story writers start writing a story, they etch out the psychological characteristics of a character in detail (McKee, 2005). Therefore, fictional characters might serve as excellent prototypical examples of various psychological characteristics.

Significant Psychological Dimensions

To answer the second question introduced earlier (“How can we effectively narrow down these descriptors to the psychological dimensions most significant to different individuals?”), we conducted three kinds of studies:

1. Semi-structured interviews with movie enthusiasts

2. Structured interviews with movie enthusiasts

3. Literature review in storytelling

Relationship between Movies and the Psychology of Movie Enthusiasts

The purpose of this study was to understand the relationship between movies and the psychology of movie enthusiasts. We conducted semi-structured interviews with 8 movie enthusiasts. We asked the participants a variety of questions, such as “How have movies helped you get through tough times or made sense of things happening in your life?” and “Can you recall a moment in a movie where you felt a personal connection or that it resonated with your own life experiences?”

We then coded the interview transcripts in MAXQDA, a qualitative data analysis tool, to reveal major themes and patterns.

However, one question that elicited the most interesting responses was “Can you tell me about a movie

that made an impact on you? It doesn't have to be a masterpiece, just any film that resonated with you

personally.” In their responses to this question, each of the participants invariably ended up talking

extensively about themselves: their childhood experiences, personality dispositions, family histories and

how the protagonists of these movies represent something deeply personal about them. We concluded

that this question can help us narrow down to the psychological dimensions that are most significant to

different individuals.

However, this raised another question: would we find the same characteristic in the other movies that made an impact on a participant? In other words, would a list of movies that made a deep impact on someone show a pattern?

Patterns (Structured Interviews)

The primary research question here was whether a discernible pattern could be identified from a list of movies that have made a deep impact on an individual. To investigate this, we recruited 9 movie enthusiasts. We first asked each participant to list such films. Participants were then asked to reflect on each movie listed and articulate why they believed it had left a significant impact on them. We consistently found psychological patterns in participants' movies.

For one participant, our analysis revealed that the theme of 'lost friendship' prominently figured in six

out of their top eleven films. Upon further inquiry into why this theme recurrently surfaced in their

favorite movies, the participant spontaneously articulated a personal narrative, revealing that the

challenge of forming and maintaining friendships has been a significant struggle throughout their life.

This method was applied consistently across all nine participants, allowing us to rapidly unearth

profound insights.

Literature Review in Storytelling

A movie character can have a large variety of psychological characteristics, such as personality traits, quirks, values and beliefs, and inner conflicts. The purpose of this study was to investigate which categories of psychological characteristics make characters and stories most relatable. We conducted a literature review of Story by Robert McKee, Save The Cat by Blake Snyder, and articles from the StudioBinder blog.

The findings highlighted four key categories of characteristics that are determined by writers before

writing a story, making their stories more engaging and relatable:

1. beliefs that guide characters’ choices throughout the narrative;

2. emotional needs or desires that drive their actions;

3. character flaws or weaknesses that hinder their ability to fulfill their needs/desires; and

4. character strengths that enable them to overcome their flaws and fulfill their desires.

These insights were instrumental in shaping the design of various prompt elements for GPT, enhancing

its ability to generate relatable and compelling content.

Cheaper, Efficient, and Scalable Way to Uncover Patterns

To make this process cheaper, more efficient, and scalable, we decided to work on three areas:

1. Automate the process of soliciting a list of movies. This could be achieved by developing an app.

2. Instead of asking users why a movie impacted them, build a database of psychological characteristics in the movies. Allowing users to pick characteristics that resonated with them, instead of asking them to reflect, would reduce cognitive load on the user.

3. Finally, automate the process of assessing the patterns.
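The third step can be illustrated with a minimal sketch. It assumes each favorite movie has already been tagged with the characteristics the user marked as resonating, and it treats a characteristic as a pattern when it recurs in at least half of the movies. The function name, the 0.5 threshold, and the sample titles and tags are all hypothetical placeholders, not the method or data used in the project.

```python
from collections import Counter


def find_patterns(movie_traits, min_share=0.5):
    """Return (trait, count) pairs for traits seen in at least `min_share` of movies."""
    if not movie_traits:
        return []
    counts = Counter()
    for traits in movie_traits.values():
        counts.update(set(traits))  # count each trait at most once per movie
    total = len(movie_traits)
    recurring = [(trait, n) for trait, n in counts.items() if n / total >= min_share]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(recurring, key=lambda item: (-item[1], item[0]))


# Hypothetical input: placeholder titles and tags, not real study data.
favorites = {
    "Movie A": ["lost friendship", "fear of failure"],
    "Movie B": ["lost friendship", "loyalty"],
    "Movie C": ["loyalty", "lost friendship"],
    "Movie D": ["fear of failure"],
}
patterns = find_patterns(favorites)
for trait, n in patterns:
    print(f"{trait}: {n} of {len(favorites)} movies")
```

A simple frequency count like this mirrors what the interviewers did by hand; a real system would likely also need to merge near-synonymous characteristics before counting.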

Database of Psychological Characteristics in the Movies

To build a database of psychological characteristics in movies, we considered two main options. The first

involved crowdsourcing: an engaging activity for movie enthusiasts in which users share their best guess

of character traits of their favorite characters, and then they see what their friends believe and how

much users agree with each other, fostering a community-driven data collection. This method, however,

presents several challenges such as ensuring data quality and consistency due to varied interpretations

among contributors, maintaining participant engagement, scaling the database management as

contributions grow, addressing potential biases and representation issues, upholding strict privacy and

ethical standards, and guaranteeing data verification to avoid fraud.

The second option pivoted towards leveraging Large Language Models like GPT. We noticed that GPT was

excellent at character analysis. We created a database that includes 500 distinct psychological

characteristics and validated the database’s accuracy with movie enthusiasts, who confirmed the depth

and relevance of the analyses produced by GPT. This technological approach streamlined the dataset

building process. You may view the faceted navigation interface of this database.

Ratatouille Character Analysis by ChatGPT-4

User Experience Research

We conducted a multi-part study with 5 movie enthusiasts to better understand our target user group

and test the efficacy and usability of our system.

Understanding the User

Post-Movie Engagement Behaviors

Since this product involves engaging with content related to a movie after a user has watched the movie,

we wanted to understand people’s post-movie engagement behaviors. We interviewed five movie

enthusiasts and we learned about the diverse range of activities that these users partake in. One

participant engages deeply by collecting movie merchandise, seeking related books, and viewing movie

edits on YouTube. Another enjoys listening to soundtracks and reading reviews from fellow movie-goers

on Letterboxd. A third delves into discussions about movie meanings on forums like Reddit and Quora.

These insights demonstrate the diverse and rich ways enthusiasts interact with films beyond just

watching them.

Proactive Effort Towards Self-Understanding

We also discussed their proactive efforts toward self-understanding. We were surprised to learn that many of our participants had been putting significant active effort into understanding themselves. One participant had been exploring fundamental aspects of their identity, like gender and ethnicity, and used daily conversations with family and inspirational content on Pinterest for reflection. Another participant, struggling with focus and professional conflicts, was using online tests, Google searches, and professional consultations. A third had been engaging in written and verbal self-reflection, discussing their own behavior with friends to deepen their understanding. Through these interviews we surfaced some of the ways in which people are actively trying to understand themselves.

Usability Testing - Phase 1

Choosing the list of most impactful movies

We wanted to understand how users are choosing their list of movies. One participant chose movies that

left them wanting more of similar stories. Another chose films based on how memorable the storylines

were and potential for rewatching. Another participant used the heuristic of how much they refer to that

movie.

Engagement

Understanding oneself is a complex process that demands continuous motivation. We gauged participant

engagement in the content by presenting them with a 14-page document detailing protagonists'

characteristics from their favorite movies. Although participants were told they could stop reading at any

time, they spent an average of six and a half minutes thoroughly engaged, evidenced by frequent

laughter and verbal reactions.

Efficacy of the Methodology

Do these characteristics reflect participants' own characteristics?

We examined participants' preferences for characters by presenting them with two lists of psychological

patterns—one derived from their own favorite films and another from a different participant's favorites.

We asked participants to choose between two hypothetical movies, each featuring a protagonist

embodying traits from one of the lists. Intriguingly, all participants consistently chose the movie with

traits from their own list, indicating a preference for familiar psychological patterns.

We further explored participants' emotional and cognitive responses to these traits. Without informing

them of the traits' origins, we asked them to reflect on each characteristic individually and share their

thoughts and feelings. The participants in general showed great enthusiasm in claiming that most of the patterns described something about themselves. Surprisingly, their enthusiasm was higher when claiming patterns in flaws than when claiming other, generally positive characteristics. One participant was notably surprised by how central the protagonist was to their movie choices.

User Experience Design

Service Blueprint

Based on the insights that we gained from our research, we designed the following Service Blueprint.

This provided the engineer with the expected interactions between the user and the app.

Design Iterations

Since our app presents a variety of information that updates with multiple sequential inputs by the user, we decided to design our app in the style of a dashboard.

Low-fidelity Design

The following prototype complements the Service Blueprint.

Low-fidelity Figma Prototype (Link)

Through the usability study of the low-fidelity prototype, we gained two major insights:

1. Patterns did not need to have movie posters.

2. The “Resonated” container needed to be next to Characteristics for easy connection (Gestalt).

First Iteration of the High-Fidelity Design (Link)

From usability testing of the first iteration of the high-fidelity prototype, we gained the following insights:

● There was too much information in a single view, and the participants experienced information overload.

● Users did not perceive the movie list as a list.

In the subsequent iterations, we changed the layout to tabbed navigation and moved the movie list under the search bar.

Version 4 of the Design Prototype (Link)

Market Analysis

Target User Group

Our ideal customer persona is people who want to understand themselves better.

We hypothesized, and validated through user research, that movies can be a fun and interesting way to

learn more about ourselves. By using movies as a tool, we can make the process of self-understanding

more engaging and enjoyable.

So, our ideal users are those who are interested in introspection and also love watching movies. They'll

be able to explore their thoughts and feelings through the stories and characters they see on screen,

making the journey of self-discovery both enlightening and entertaining.

Market

The Total Addressable Market is the entire global market of individuals who are interested in

self-understanding and personal growth. This includes anyone who might find value in introspection

through various media, particularly those who enjoy movies.

The global self-improvement market is estimated at $39.2 billion. Since the global streaming market has

a penetration of 18%, we can assume that our TAM is 18% of $39.2 billion or ~$7 billion.

Note that this is a conservative estimate as it is likely many users in the self-improvement market are

also video streaming consumers (as both correlate with wealth and income), and the penetration of

video streaming in this market may be higher, resulting in a higher TAM.
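The estimate above can be reproduced with a short worked calculation. This is only a sketch of the arithmetic: the two input figures are the ones quoted in the text, while the function and constant names are illustrative.

```python
SELF_IMPROVEMENT_MARKET_USD_BN = 39.2  # global self-improvement market, as quoted in the text
STREAMING_PENETRATION = 0.18           # global streaming penetration, as quoted in the text


def addressable_market(market_bn, penetration):
    """Scale a market size (in billions of USD) by a penetration rate between 0 and 1."""
    if not 0 <= penetration <= 1:
        raise ValueError("penetration must be between 0 and 1")
    return market_bn * penetration


tam_bn = addressable_market(SELF_IMPROVEMENT_MARKET_USD_BN, STREAMING_PENETRATION)
print(f"TAM ≈ ${tam_bn:.1f} billion")
```

Calling the same function with a higher penetration rate shows how the TAM would grow if streaming adoption among self-improvement consumers exceeds the global average.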

Competitive Landscape

While there are various competitors in the mental well-being and self-understanding space, and others that provide an engaging platform to increase user engagement, very few (e.g., Headspace and Calm) operate in both spaces.

Self-Understanding Only

| Competitor | Goal | Enables Self-understanding | User Effort | Explainable | Engaging |
|---|---|---|---|---|---|
| Talkspace | Online therapy and counseling services | ✓ | High | | |
| Reflectly | Mood journaling app for self-awareness | ✓ | High | | ✓ |
| Truity | Personality assessment and insights platform | ✓ | Low | | |

Entertainment Only

| Competitor | Goal | Enables Self-understanding | User Effort | Explainable | Engaging |
|---|---|---|---|---|---|
| Netflix | Streaming platform for movies and TV shows | | Low | | ✓ |
| Letterboxd | Social platform for movie enthusiasts | | Low | | ✓ |

Both

| Competitor | Goal | Enables Self-understanding | User Effort | Explainable | Engaging |
|---|---|---|---|---|---|
| Headspace | Meditation and mindfulness app | | Low | | ✓ |
| Calm | Relaxation and meditation app | | Low | | ✓ |

While tools like Headspace and Calm, as meditation and relaxation apps respectively, offer low user

effort and are engaging, they are not directly focused on self-understanding.

Overall, there's a clear need for a solution that effectively aids in self-understanding, is engaging,

requires low user effort, and is explainable—a balance not fully achieved by any single competitor in

the current landscape.

Product Considerations

Forces of Progress

Push Forces (Dissatisfaction with the Current State):

● Lack of Self-Knowledge: The user might feel a general sense of not understanding themselves

well. They might have questions about their motivations, values, or desires.

● Difficulty with Introspection: They might struggle to analyze their own thoughts and feelings on

their own.

● Unsatisfying Self-Discovery Methods: Traditional methods of self-exploration (e.g., journaling,

personality tests) might feel boring or ineffective.

Pull Forces (Desire for Improvement):

● Increased Self-Awareness: The user desires a deeper understanding of their inner world.

● Personal Growth: They want to learn and grow as a person.

● Improved Decision-Making: They hope understanding themselves better will lead to better life

choices.

● Greater Well-Being: They believe self-knowledge can contribute to a happier and more fulfilling

life.

● Engaging Self-Discovery: They enjoy learning through stories and visual media, making movies

an attractive tool for self-exploration.

Habit:

● Comfort with the Status Quo: The user might be comfortable with their current level of

self-understanding, even if it's not ideal. They might be hesitant to invest time or effort in a new

approach.

Anxiety:

● Fear of the Unknown: Delving into self-discovery can be confronting. The user might be anxious

about what they might learn about themselves.

● Analysis Paralysis: They might worry about "overthinking" things and getting stuck in analysis

instead of taking action.

Jobs to be done (JTBD)

Based on the forces of progress we identified for users interested in self-understanding, here are the

potential "jobs to be done" they might be trying to accomplish:

JTBD1: Uncover deeper truths about themselves in a fun and engaging way.

JTBD2: Make self-understanding more explainable.

JTBD3: Feel confident and supported throughout their journey of self-understanding.

Core Job:

When I find traditional methods of self-discovery boring, I want to use an engaging way to gain insights

into myself so that I can understand myself better.

Product Features

1. Objective/Goal:

1. We are building a web app with the primary goal of helping users improve their

self-understanding. The product will engage users in meaningful interactions around

their favorite movies while improving self-understanding as a by-product.

2. Features:

1. Users can read psychological analysis of their favorite movie characters which would be

novel and interesting information for the user. Often users only capture psychological

characteristics subconsciously while watching the movie. This leads to an emotional

impact that is strong but not often understood. The system makes it conscious,

explaining why the movie might have resonated with them.

2. Users can find out psychological patterns in their favorite movies. This helps them

understand themselves and makes them feel understood.

3. Users can find movies that are similar to a movie that they really liked. Currently, there is

no easy way to find a movie based on the type of characters.

4. The system explains exactly why a movie is being recommended.

Product Roadmap

The product roadmap outlines the development phases for a movie recommendation app designed to

help users understand themselves better. The roadmap is divided into four phases: Research, Product

Management, Design, and Development. During the Research phase, the team conducted user research

to understand user pain points and test different prototypes. Specific tasks include prompt engineering,

user group identification, usability testing, and market analysis. The Product Management phase

includes market analysis, user segmentation and JTBDs (Jobs To Be Done) analysis. During the Design

phase, the team created wireframes, low-fidelity prototypes, and high-fidelity prototypes.

The Development phase was not scheduled until after the Design phase. During this phase, the team

developed the backend, frontend, and machine learning model for the app. The process started with

building the data infrastructure. Once the database of movies was ready, backend and frontend

development proceeded in parallel. Finally, the whole team tested the complete app together.

Engineering

Reelatable is powered by a Machine Learning service that enables hybrid search using

Retrieval-Augmented Generation (RAG). This ML service creates vector embeddings of movies, their

plotlines, and certain character traits of the protagonist identified through user research, and

upserts them to a vector database.

The backend service created using Flask then runs queries against this vector database and augments

them with some local processing and querying to power endpoints that are then consumed by the

frontend service.

The frontend service is created using Flutter to enable multi-platform development. While this project

has been designed primarily for the web, a key strength of our choice of technology stack is that we have

also been able to build an Android app.

Additionally, many technology choices in this project are motivated by the following considerations -

● Keeping it modular, so we can easily swap between tools and technologies to respond to user

research in an agile manner

● Keeping it lightweight, from the perspectives of computation, cost and effort, since we are

bound by a tight timeline and a stringent budget

System Diagram

Machine Learning Service

Code: Colab

The Machine Learning service is authored in Python, using Google Colab. This service creates vector

embeddings to upsert to a Pinecone vector index. The movies (titles and some metadata) are collected

using The Movie Database API, and are processed into dense and sparse embeddings for each movie.

Dense Embeddings

● Source: Plotlines for each movie from Wikipedia:Database download, using WikiPlots Extractor

library

● Generation:

○ LM Input: Movie plotline

○ LM: Model with text embedding support - GTE-Base-EN v1.5

○ Process: The LM encodes the semantic content of the movie's plotline into vector

space

○ Output: Dense embeddings representing the entire plotline of the movie

Sparse Embeddings

● Source: Characteristics of the protagonist as generated by GPT 3.5 Turbo

● Generation:

○ LM Mode: Q&A mode.

○ Process: The LM extracts a set of tags related to psychological traits like strengths

(personality traits), desires and flaws. These attributes are then passed through the

GTE-Base-EN v1.5, and reduced in dimensionality using PCA (with a tunable parameter

for the number of dimensions set to 12)

○ Output: Sparse embeddings representing the extracted tags in a lower-dimensional

space

Storage

● Index: Pinecone vector index using dot product for vector similarity metric (to enable hybrid

search)

● Data: Dense and sparse embeddings

● Metadata:

○ Movie name

○ Additional relevant metadata (release year, ratings, image url etc.)

Another artifact that is passed from the ML service to the backend is a pickled PCA model that is fit to

the traits of all the movies.

Additional Considerations

Data Selection

We selected popular movies from TMDB, meaning movies that are widely known and watched.

We then used WikiExtractor to get the plots for these

movies, and after sanitizing and deduplicating the inputs, we reduced the list to around 2500 movies.

● Depth of Information: TMDB has a variety of metadata, and although we are not using most

fields today, this gives us the option to increase functionality and react to user feedback faster.

● Cost and Toil: TMDB API is free to use and well-supported, so it was preferred to other

alternatives. Even though Wikipedia is not as easy to work with because of the lack of a

dedicated API for this, existing community contributions were easy to piggyback on.

● Data Reliability: Wikipedia’s community contribution model makes it a more trusted source for

movie plotlines than most alternatives.

Model for Generating Attributes/Traits

Due to budget constraints, we initially adopted OpenAI's GPT-3.5 for generating a characteristics

database for the movies.

● Model Upgrade Considerations: Transitioning to GPT-4 was considered for its potential to

improve database quality and recommendation accuracy but was not implemented to maintain

financial viability.

● Computation Limitations: The project was constrained by the computation capacity of the free

version of Google Colab. Upgrading to Colab Pro to facilitate model fine-tuning was deemed

cost-prohibitive, so we made the decision to avoid it.

● Potential for Fine-Tuning: Developing a fine-tuned model could significantly enhance

recommendation precision. However, the required effort, resources, and experimentation to

achieve optimal results would substantially increase project costs and complexity.

Sparse Embedding Generation Strategy

● Embedding Strategy: The decision to bypass a simple one-hot encoding approach, which

couldn’t capture semantic relationships, led us to adopt a method that maintains the semantic

context of traits. We chose to use a language model capable of generating meaningful

embeddings.

● Language Model Selection: We selected a compact model (under 1GB memory usage) that was

fine-tuned for English and compatible with the SentenceTransformers library. Based on the

MTEB table from Hugging Face, which is considered a reliable benchmark for language models

for text embeddings, the GTE-Base-EN v1.5 by Alibaba NLP was chosen for its high rating.

● Dimensionality Reduction: Principal Component Analysis (PCA) was utilized to reduce the

dimensionality of the embeddings to 12. This decision was based on preliminary assessments of

the clustering quality of sparse embeddings from a sample set of movies. Although more

detailed methods like analyzing cumulative explained variance could have been used, they were

deemed beyond the scope of this project. The chosen dimensionality ensured a stable and

meaningful clustering without overly complicating the model or the process.
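For reference, the cumulative explained variance check mentioned above could be sketched as follows. This is not the method used in the project; the function name `components_for_variance`, the 0.9 default threshold, and the sample variances are illustrative assumptions.

```python
def components_for_variance(explained_variances, threshold=0.9):
    """Return the smallest number of leading components whose cumulative
    share of the total variance reaches the threshold.

    explained_variances: per-component variances (for example, the
    eigenvalues produced by a PCA fit). The threshold is a hypothetical
    tuning choice, not a value taken from the project.
    """
    total = sum(explained_variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    # Components are considered from most to least informative.
    ordered = sorted(explained_variances, reverse=True)
    cumulative = 0.0
    for count, variance in enumerate(ordered, start=1):
        cumulative += variance
        if cumulative / total >= threshold:
            return count
    return len(ordered)


# Made-up variances for four components.
sample_variances = [4.0, 3.0, 2.0, 1.0]
needed_for_half = components_for_variance(sample_variances, threshold=0.5)
needed_for_most = components_for_variance(sample_variances, threshold=0.8)
```

With a rule like this, the number of dimensions would follow from the data instead of being fixed at 12 up front.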

Backend service

Code: Github

The backend is created in Python using the Flask framework. The API has five major endpoints -

Link to Swagger

Get_all_movies

Endpoint: /all_movies/get_all_movies

Method: GET

Retrieves a list of all movies in the Pinecone database. This endpoint is computationally intensive, so

results are cached after the first call to improve performance.
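A minimal sketch of this "compute once, then reuse" behavior is shown below, assuming an in-process cache. The helper `fetch_all_movies_from_index` and the sample titles are hypothetical stand-ins for the real index scan; the actual backend may cache differently.

```python
from functools import lru_cache

# Tracks how often the expensive lookup really runs (illustration only).
stats = {"fetches": 0}


def fetch_all_movies_from_index():
    """Hypothetical stand-in for the slow scan of the vector index."""
    stats["fetches"] += 1
    return ("Movie A", "Movie B", "Movie C")  # placeholder titles


@lru_cache(maxsize=1)
def get_all_movies():
    """Return the full movie list, doing the slow work only on the first call."""
    return fetch_all_movies_from_index()


first_response = get_all_movies()   # slow path: runs the scan
second_response = get_all_movies()  # fast path: served from the cache
```

The trade-off of a cache like this is that newly upserted movies only appear after the cache is cleared or the server restarts.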

Get_movie_patterns

Endpoint: /patterns/get_movie_patterns

Method: POST

Identifies clustered patterns and representative traits among the traits the user selected from their

chosen movies.

Get_movie_recommendations

Endpoint: /recommendations/get_movie_recommendations

Method: POST

Provides movie recommendations using a hybrid RAG search to find proximal movies in the vector

database, based on a weighted ratio of dense and sparse embeddings (alpha), with a default of 0.5. The

weight ratio can be adjusted based on the performance and relevance of the recommendations, as

assessed through user research. For the scope of this project, we are using a ratio of 0.5

so that both dense and sparse embeddings are used in the search.
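The role of alpha can be illustrated with a small sketch. It assumes the convex weighting convention described in Pinecone's hybrid search guide (alpha scales the dense vector, 1 - alpha scales the sparse vector) and a dot-product score; the function names and toy vectors are illustrative, not taken from the project code.

```python
def weight_hybrid_query(dense, sparse, alpha=0.5):
    """Scale the two query vectors so alpha sets their relative weight.

    alpha = 1.0 uses only the dense vector, alpha = 0.0 only the sparse one.
    (Assumed convention; the project's exact weighting is not shown here.)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [alpha * value for value in dense]
    scaled_sparse = [(1.0 - alpha) * value for value in sparse]
    return scaled_dense, scaled_sparse


def dot(a, b):
    """Dot product, matching the similarity metric of the index."""
    return sum(x * y for x, y in zip(a, b))


def hybrid_score(query_dense, query_sparse, movie_dense, movie_sparse, alpha=0.5):
    """Combined similarity between a weighted query and one stored movie."""
    dense_q, sparse_q = weight_hybrid_query(query_dense, query_sparse, alpha)
    return dot(dense_q, movie_dense) + dot(sparse_q, movie_sparse)


# Toy two-dimensional vectors, for illustration only.
balanced = hybrid_score([1.0, 0.0], [0.0, 2.0], [0.5, 0.5], [1.0, 1.0], alpha=0.5)
dense_only = hybrid_score([1.0, 0.0], [0.0, 2.0], [0.5, 0.5], [1.0, 1.0], alpha=1.0)
```

Because the score is a plain sum of two dot products, this weighting only works with a dot-product index, which is consistent with the metric chosen in the Storage section.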

Search_by_traits

Endpoint: /recommendations/search_by_traits

Method: POST

Allows users to search for movies based on specific traits.

Get_movie_metadata

Endpoint: /metadata/get_movie_metadata

Method: GET

Retrieves detailed metadata for a specific movie based on its title.

Retrieval-Augmented Generation (RAG) Hybrid Search

● The Pinecone database retrieves the closest movie vectors based on the query vector, which is a

combination of sparse and dense embeddings. Sparse embeddings are generated on the

backend server using the same model as the one in the ML service, and the pickled pre-fit PCA

model is used for dimensionality reduction.
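The hand-off of the pre-fit PCA model can be sketched as below. As a simplifying assumption, a plain dict holding a mean vector and component rows stands in for the real fitted model object, and the three-dimensional toy numbers are made up; only the pickle round trip and the center-then-project step are the point.

```python
import pickle

# Stand-in for the artifact exported by the ML service. The real artifact is
# a fitted PCA model; a dict is used here so the sketch has no dependencies.
fitted_model = {
    "mean": [1.0, 2.0, 3.0],
    "components": [
        [1.0, 0.0, 0.0],  # first principal direction (toy values)
        [0.0, 1.0, 0.0],  # second principal direction (toy values)
    ],
}

serialized = pickle.dumps(fitted_model)  # done once on the ML service side
loaded_model = pickle.loads(serialized)  # done on the backend at startup


def project(embedding, model):
    """Apply a pre-fit PCA: subtract the training mean, then project the
    centered vector onto each stored component."""
    centered = [x - m for x, m in zip(embedding, model["mean"])]
    return [
        sum(weight * value for weight, value in zip(component, centered))
        for component in model["components"]
    ]


reduced = project([2.0, 4.0, 10.0], loaded_model)
```

Reusing the same fitted model on the backend keeps query vectors in the same reduced space as the vectors already stored in the index; refitting at query time would produce incompatible axes.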

Additional Considerations

Clustering strategy

To retrieve representative traits, we are creating K-means clusters and identifying the trait that is closest

to the centroid of the largest cluster.

● Number of clusters: We are using silhouette score to identify the optimal number of clusters. To

prevent too many clusters from being formed, we are setting a maximum based on the number

of traits selected.

● Representative trait selection: We are considering the largest cluster as the best indicator of the

user’s selected traits. If multiple clusters are tied for size, the cluster that is closest to the

centroid of the global population is considered. Within this cluster, the trait that is closest to the

centroid of the cluster is deemed the most ‘representative’ trait. These choices need to be

further vetted through user research.
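The selection rule described above can be sketched as follows, assuming cluster labels have already been produced by K-means. The function name, the Euclidean distance, and passing the global centroid in as an argument are assumptions made for illustration.

```python
import math
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(points)
    return [sum(p[i] for p in points) / count for i in range(len(points[0]))]


def representative_trait(traits, vectors, labels, global_centroid):
    """Pick the trait closest to the centroid of the largest cluster.

    traits: trait names; vectors: their embeddings; labels: cluster id per
    trait; global_centroid: centroid of the wider population, used only to
    break ties between equally large clusters.
    """
    members = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)

    largest_size = max(len(indices) for indices in members.values())
    tied = [indices for indices in members.values() if len(indices) == largest_size]

    # Tie-break: prefer the cluster whose centroid is nearest the global one.
    chosen = min(
        tied,
        key=lambda indices: math.dist(
            centroid([vectors[i] for i in indices]), global_centroid
        ),
    )
    chosen_centroid = centroid([vectors[i] for i in chosen])
    winner = min(chosen, key=lambda i: math.dist(vectors[i], chosen_centroid))
    return traits[winner]


# Toy example with made-up two-dimensional embeddings.
example = representative_trait(
    ["brave", "courageous", "fearless", "greedy", "selfish"],
    [[1.0, 0.0], [1.2, 0.1], [0.9, -0.1], [-1.0, 1.0], [-1.1, 0.9]],
    [0, 0, 0, 1, 1],
    global_centroid=[0.0, 0.0],
)
```

In this toy input the first three traits form the largest cluster, and "brave" lies nearest to its centroid.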

Frontend service

Code: Github

The frontend of the project was developed using Flutter due to its user-friendly nature, extensive online

resources, and seamless integration with the Flask backend, enabling the creation of multi-platform

applications. For this project, we built both a web app and an Android app using Flutter.

Functionality Overview

● Home Tab: Upon initialization, the app fetches the latest list of all movies from the backend,

populating the search bar dropdown to facilitate easy movie selection. It displays movie

characteristics metadata fetched from the Pinecone database as users select movies. Users can

select characteristics that resonate with them, which are stored for future reference.

● Patterns Tab: This tab triggers an API call to retrieve clustered patterns and representative traits

based on the selected movies. These patterns, representing clusters of characteristics shared

among chosen movies, are displayed and updated dynamically as more movies are added.

● Recommendations Tab: Offers personalized movie recommendations based on the selected

movies and traits. It displays movie posters, and users can click on a poster to read an overview

of the movie, aiding in their decision-making process.

Output to User

● Movie Characteristics: Displays a list of characteristics for movies chosen by the user.

● Patterns: Shows a list of movie characteristics that are similar between movies chosen by the

user.

● Recommendations Based on Movies: Provides a list of movies closely matching the user's

preferences based on the movies that resonated with them.

● Recommendations Based on Resonated Characteristics: Offers a list of movies closely matching

the user's preferences based on the characteristics that resonated with them.

This structure allows for a coherent, user-friendly interface that facilitates easy navigation and

interaction across various features and functionalities of the application.

Additional Considerations

Caching

The first get_all_movies call is very time-intensive, so we added a load animation while the user is

waiting for the response from the server. We implemented caching to accelerate recurring use.

● Local storage: all_movies is stored as a blob locally. While this may reduce frontend performance

in ways that are hard to predict, it significantly reduces the latency of repeated movie name

retrieval.

Web-first

While Flutter is inherently responsive across platforms, some platform-specific work is often required for

the fit and finish.

● Optimized for web: The app is functionally correct and performant on Android, but the layouts

are designed for web in the interest of conserving time and effort.

Downstreaming design changes

The Minimum Viable Product (MVP) implementation lagged behind the design phase, as the decision

was made not to impede the progress of the design and user research efforts while the MVP was being

developed. This, however, reduced the lead time to incorporate design updates into the front-end, and

we opted for functionality over UI quality.

Deployment

The deployment strategy for the web application involves a combination of Docker and Google

Kubernetes Engine (GKE) to ensure a robust, scalable, and manageable rollout of services. This strategy is

designed to optimize deployment processes and ensure high availability and scalability while not

exceeding the project budget.

Docker

Objective: Containerize the frontend and backend services to ensure environment consistency and

streamline deployment processes.

Action:

● Backend Service: The Flask application along with its machine learning components is packaged

into a Docker container. This encapsulation includes all necessary dependencies, ensuring that

the service operates uniformly regardless of the deployment environment.

● Frontend Service: The Flutter application is built into static files and served via a lightweight

Docker container using Nginx, optimizing delivery and performance.

Google Kubernetes Engine (GKE)

Objective: Leverage managed Kubernetes services for deploying and scaling the application with high

availability.

Action:

● Cluster Setup: Deploy the application on GKE using standard clusters configured with

high-memory CPUs to handle memory-intensive operations efficiently.

● Service Management: Kubernetes services are defined to manage network traffic to both

frontend and backend components, with load balancing to ensure even distribution of client

requests.

Android APK build

Objective: Enable one-off Android app building and installation without a dedicated release process

Action:

● Configuration of signing in Gradle: Generate a keystore and add keystore information to

build.gradle.

● Installation: Build the release apk and install to an Android Virtual Device booted up from within

Android Studio.

Additional Considerations

GKE Cluster Setup

The GKE Clusters are set up using standard deployment, and are configured to use high-memory CPUs.

● OOM Errors: Clusters deployed with Autopilot and no customization during setup often

encountered OOM (out of memory) errors and were not suitable for on-server model

operations.

● Computation: The calls to the backend are extremely slow (in some cases, an order of

magnitude slower than what’s observed with a local server). However, we still opted against

increasing computational capacity and/or deploying some of the parallelizable workloads onto

GPUs or TPUs, because they can easily exceed our budget if we are not being very observant.

The cost we incur is a serious degradation in performance and very high latency, but the flows

are still functional. Given more time, we would have liked to evaluate the performance

bottlenecks and explore strategies to optimize the API response times. This could involve

assessing the feasibility of using a GPU or exploring alternative cloud hosting solutions that offer

better performance within the budget constraints.

● Cost: Standard deployment might incur high costs if not carefully observed (especially if vertical

scaling is enabled). Therefore, we added budget thresholds and monitored costs carefully so we

are well within budget. If we were to continue with this project, we would conduct a

comprehensive cost analysis to identify areas where costs can be optimized or reduced. This may

involve reevaluating the choice of cloud provider, negotiating better pricing plans, or exploring

cost-effective alternatives for the employed models and services.

● Redundancy: With standard deployment, the number of nodes is configurable. We are using 3

nodes as that is the recommended minimum to minimize downtime while maintaining a

cost-efficient setup. This comes at the cost of scalability, however.

For the application flow and links to the code, refer to Appendix - Application Flow and Appendix - Code.

Final Report

Limitations & Future Work

We believe Reelatable has a lot of potential for improvement. Some areas of improvement we identified

are as follows -

Engineering

● Improving deployment through dedicated cloud resources to increase scalability and stability,

and improved CI/CD for faster development iterations

● Increased profiling and performance monitoring to enhance performance, quality, and

observability

● Expanding the data sources and improving the cleaning and validation flows

● Enhanced models (e.g., GPT-4 instead of GPT-3.5-Turbo, and Mistral or Gecko instead of GTE-Base)

for better model performance

● Fine-tuning the model used for Named Entity extraction to increase relevance and quality

User Research

● Conducting quantitative analyses to determine how accurately the patterns represent the user's

own significant psychological dimensions

● Validating parameter choices (eg. hybrid search parameters) through usage and/or additional

user research

● Narrowing down the broader group of movie enthusiasts to a group that would benefit the most

from this

● So far usability testing was done on Design Prototypes. Usability testing for the app is yet to be

done.

UI Design

● Refined interface design

● Streamlined navigation

● Increased functionality to enhance user engagement and satisfaction

Product Design

● Beyond movies, this methodology might be applicable to other forms of art like novels (which is

another narrative artform) and songs which are written around emotions.

Conclusion

Our project began with comprehensive research on self-understanding and relevant psychological

characteristics as they relate to movies. We conducted usability studies and qualitative as well as

quantitative research to guide our product design. We continuously iterated our design based on user

feedback and embraced challenges as opportunities for growth. We leveraged cutting-edge technologies

such as Retrieval-Augmented Generation using large language models and vector databases to power

the core of our application.

In closing, we extend our gratitude to all stakeholders, research participants, academic advisors and

supporters who have contributed to the success of this project. Together, we have embarked on a

journey of exploration, discovery, and transformation, and we look forward to the continued evolution

and impact of our platform in the years to come.

Contributions

Ankita Shanbhag

● Engineering system design and architecture

● Data selection of movie metadata and database creation of 2500 movies

○ Integration with APIs and external libraries for data collection

○ Data cleaning

○ Named Entity extraction and linking

○ Prompt engineering for metadata

○ Data validation

● Machine learning

○ Pinecone index creation

○ Embedding creation and upsert

○ Pinecone index retrieval

○ RAG + Hybrid search based on patterns

● Back-End Development

○ API creation and documentation

○ On-server embedding generation

○ Clustering, including performance and behavior optimization

○ Some API performance optimization, using pickling etc

● Front-End Development

○ Design of frontend components and flows

○ Error handling

○ Integrating asynchronous processing

○ Backend integration using the public API

○ Basic caching support

● Deployment

○ GCP setting up and scaffolding

○ Containerization using Docker

○ Kubernetes setup and rollout

○ Cluster management and optimization

○ Android app support and Gradle + manifest updates

○ Web domain and DNS configuration

Saurabh Chachra

● Literature review

○ Self-Concept Clarity

○ Categorization Theory

○ Storytelling

● UX Research

○ Understanding the Target User Group

○ Usability Testing

○ Efficacy of Methodology of patterns representing the user

● Product Concept and Design

● UX Design

○ Service Blueprint

○ Design System

○ Prototyping

● Prompt Engineering

Hrishikesh Srinivas Nagaraju

● Researched movie structure and film development to understand key aspects of creating a

movie.

● Created Product Roadmap

○ Estimated timeline (First Half)

○ Created Gantt chart

● Conducted initial GPT API testing:

○ Prompt engineering

○ Understanding API documentation

○ Performance testing the API

○ Contextualizing the API

● Frontend development

○ Built flows for movie protagonist characteristics, Patterns and list of resonating

characteristics

○ Created API request and response formats to integrate with backend

○ API calls and data parsing

● UX Research:

○ Usability study

○ Asking questions during interviews

○ Taking notes

○ Collect data

○ Preprocess data and prepare for analysis

○ Analyzing feedback

○ Recruiting participants

● Created the Economic & Business analysis section

○ Analyzed Pricing Strategies

○ Value networks

○ Regulations

Kinshuk Nigam

● Market Analysis

○ Secondary Research

○ User Reviews for products

○ Are movies relatable to users

● Choosing the Target User

○ Research on user segments

○ Problem prioritization

○ Target user group selection

● Product Value Proposition

● Competitor analysis with direct and indirect competitors

● Product Management

○ Forces of Progress

○ Jobs to be done

○ Product Features

● Project Management

○ Product Roadmap Planning (Second Half)

○ Trello Project Management

○ Leading Scrums for updates, roadblocks and next steps

● MVP scalability:

○ Coded initial GPT API to work with sample 10 movies

● Initial Backend Development:

○ Coded the /getmovies API (now /Get_all_movies)

○ Coded the /getpatterns API (now /Get_movie_patterns)

○ Integrate with Frontend requests

○ Test run APIs

References

1. Aitchison, J. (2012). Words in the mind: An introduction to the mental lexicon. John Wiley &

Sons.

2. Alibaba-NLP/gte-base-en-v1.5 · Hugging Face. (n.d.). Huggingface.co. Retrieved May 3, 2024,

from https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5

3. Black, J., & Barnes, J. L. (2015). Fiction and social cognition: The effect of viewing award-winning

television dramas on theory of mind. Psychology of Aesthetics, Creativity, and the Arts, 9(4), 423.

4. Briggs, J. (n.d.). Getting Started with Hybrid Search | Pinecone. Www.pinecone.io.

https://www.pinecone.io/learn/hybrid-search-intro/

5. Campbell, J. D. (1990). Self-esteem and clarity of the self-concept. Journal of personality and

social psychology, 59(3), 538.

6. Dierdorff, E. C., & Rubin, R. S. (2015). Research: We’re not very self-aware, especially at work. Harvard

Business Review, March 12.

https://hbr.org/2015/03/research-were-not-very-self-aware-especially-at-work

7. Getting Started. (n.d.). The Movie Database (TMDB).

https://developer.themoviedb.org/reference/intro/getting-started

8. Gurung, R. A., Sarason, B. R., & Sarason, I. G. (2001). Predicting relationship quality and

emotional reactions to stress from significant-other-concept clarity. Personality and Social

Psychology Bulletin, 27(10), 1267-1276.

9. Lewandowski Jr, G. W., & Nardone, N. (2012). Self-concept clarity's role in self–other agreement

and the accuracy of behavioral prediction. Self and Identity, 11(1), 71-89.

10. markriedl. (2024, January 15). markriedl/WikiPlots. GitHub.

https://github.com/markriedl/WikiPlots

11. McKee, R. (2005). Story. Dixit.

12. MTEB Leaderboard - a Hugging Face Space by mteb. (n.d.). Huggingface.co.

https://huggingface.co/spaces/mteb/leaderboard

13. Quora. (n.d.). Do people relate films with their real lives? [Question]. Retrieved from

https://www.quora.com/Do-people-relate-films-with-their-real-lives

14. Reddit. (2011, December 6). What's the opinion on the Truity Enneagram test? [Online forum

post]. Retrieved from

https://www.reddit.com/r/Enneagram/comments/14bbcuo/whats_the_opinion_on_the_truity_enneagram_test/

15. Reddit. (2020, December 26). PSA: Stop taking the freaking Truity Enneagram test! [Online forum

post]. Retrieved from

https://www.reddit.com/r/Enneagram/comments/kka1l8/psa_stop_taking_the_freaking_truity_enneagram/

16. Rosch, E. (1978). Principles of categorization. In Cognition and categorization (pp. 27-48).

Routledge.

17. Schneider, M. (2018, February 13). Why films are relatable and create a physical emotional

response. 34th Street Magazine. Retrieved from

https://www.34st.com/article/2018/02/why-films-are-relatable-and-create-a-physical-emotional-response

18. Schwartz, S. J., Meca, A., & Petrova, M. (2017). Who am I and why does it matter? Linking

personal identity and self-concept clarity. Self-concept clarity: Perspectives on assessment,

research, and applications, 145-164.

19. Wikipedia:Database download. (2021, August 25). Wikipedia.

https://en.wikipedia.org/wiki/Wikipedia:Database_download

Appendix - Market Analysis

Online personality tests can often prove to be intricate and challenging to decipher, presenting a level of

opacity that complicates users' understanding. These assessments typically employ complex algorithms

and psychological frameworks to analyze and categorize individuals based on their responses. However,

the inner workings of these algorithms are often obscured from users, leading to a lack of transparency

in how conclusions about personality traits are reached. Additionally, the nuances of human personality

are vast and multifaceted, making it difficult for any test to capture the full complexity of an individual

accurately. Furthermore, the language used in these tests may be technical or abstract, further

distancing users from a clear comprehension of their results. Consequently, users may find themselves

grappling with interpretations that feel detached from their self-perception, highlighting the challenges

inherent in navigating the intricacies of online personality assessments.

PSA: Stop taking the freaking Truity enneagram test then using the screenshot to ask “What am I??”

Eclectic Energies is your friend. Crunchy, but friendly.

What's the opinion on the Truity enneagram test?

Movies hold a unique place in people's lives as powerful vessels of storytelling that often resonate

deeply with personal experiences and emotions. People frequently find themselves relating their own

lives to the narratives depicted on screen, drawing parallels between the characters' journeys, conflicts,

and triumphs, and their own. Whether it's identifying with a protagonist's struggles, finding solace in

shared themes of love or loss, or seeking inspiration from characters who overcome adversity,

individuals often use movies as a mirror to reflect upon their own circumstances and feelings.

Furthermore, the immersive nature of cinema, with its visual and auditory elements, allows viewers to

immerse themselves fully in the narrative, fostering a sense of connection and empathy with the

characters and their stories. As a result, the experiences and lessons portrayed in movies can profoundly

impact individuals' perceptions of themselves and the world around them, influencing their beliefs,

values, and personal growth.

Do people relate films with their real lives? - Quora

Why Film is the Most Relatable of Content | 34th Street Magazine

The above article discusses the impacts of film on human emotions and connections. Films have a

unique ability to evoke empathy and stir genuine emotions by presenting relatable stories and characters

in a believable manner. Movies offer a deeper exploration of universal experiences and emotions. They

serve as a lens through which viewers can reflect on their own lives and vulnerabilities, providing a richer

and more meaningful form of entertainment and emotional connection.

Appendix - Application Flow

Step 1: The movie dropdown appears based on the user's input, allowing them to search for and select

movies.

Step 2: Users can add and delete movies that resonated with them, creating a personalized list of

favorite films.

Step 3: Users can read about the characteristics of the movie protagonists and select the ones that

resonated deeply with them.

Step 4: Users can view the characteristics they selected as resonating with them, providing a visual

representation of their preferences.

Step 5: Users can get movie recommendations based on the resonated characteristics they chose on the

Home page, tailoring the suggestions to their personal inclinations.

Step 6: Users can get movie recommendations based on the movies they added as favorites on the

Home page. They can click on the movie image to view the overview.

Appendix - Code

● Reelatable_Movie_Charateristics_Retrieval.ipynb

● reelatable_movie_recommendations

● https://github.com/AnkitaShanbhag30/flutter_application_reelatable

● https://github.com/AnkitaShanbhag30/flask_application_reelatable

● https://app.swaggerhub.com/apis/AnkitaSureshShanbhag/Reelatable/1.0.0

Appendix - Problem Statement, Vision and Value Proposition

Problem Statement

Scores of positive psychology practices like mood journaling, meditation, mindfulness, awe-walks, and

reciprocal self-disclosure have been shown to be highly effective at enhancing mental well-being and

beneficial for almost anyone. A number of such practices are aimed at improving self-awareness.

However, a prevalent issue with these practices in the real world is the lack of adoption and

engagement; they do not seamlessly integrate into people’s existing lifestyles and interests.

Vision

Our vision is to weave positive psychology practices into daily life, enhancing well-being by reducing user

friction. Leveraging the flywheel effect, each positive interaction propels further engagement, creating a

sustainable cycle of mental health improvement.

Value Proposition - why users want this now

We offer a personalized movie recommendation engine that recommends movies that ‘resonate’ with

users. This service not only enhances entertainment but also promotes reflection on personal values,

seamlessly integrating introspection with enjoyment for deeper self-awareness and engagement.

Appendix - Product Management

Product Roadmap

The product roadmap outlines the development phases for a movie recommendation app designed to

help users understand themselves better. The roadmap is divided into four phases: Research, Product

Management, Design, and Development. During the Research phase, the team conducted user research