Analyzing my text messages with my ex-boyfriend

by Teresa Ibarra


Message frequency

We began dating in the summer of 2015 and broke up in the spring of 2016.

Total messages from Teresa

Total messages from my ex


Sentiment analysis

Sentiment analysis is the process of computationally identifying and categorizing the emotions expressed in a piece of text as positive, negative, or neutral. I ran NLTK's VADER tools on every message that contained text. The following graphs show the compound score, which ranges from -1 (most negative) to +1 (most positive), over time.
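A minimal sketch of the idea, not the exact script from this project. It assumes NLTK is installed and the `vader_lexicon` resource has been downloaded. The `(timestamp_ms, text)` input shape, the helper names, and the choice to average by month are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def make_vader_scorer():
    """Return a function mapping text -> VADER compound score in [-1, 1]."""
    # Imported lazily so the rest of the sketch works without NLTK installed.
    # Needs a one-time nltk.download("vader_lexicon") beforehand.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]


def monthly_compound(messages, score):
    """Average score per calendar month (UTC).

    messages: iterable of (timestamp_ms, text) pairs (hypothetical shape).
    score: callable mapping text -> float, e.g. make_vader_scorer().
    """
    buckets = defaultdict(list)
    for timestamp_ms, text in messages:
        if not text or not text.strip():
            continue  # skip messages with no text
        when = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
        buckets[when.strftime("%Y-%m")].append(score(text))
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}
```

Calling `monthly_compound(messages, make_vader_scorer())` would give one averaged compound score per month, which is the kind of series a line chart can plot.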

Combined messages

Teresa's messages

His messages


Longest messages

These were the longest messages we ever sent to each other.

In a way I hoped this would be something spicy like a long, drawn-out argument. But it turns out I just needed help with configuring Homebrew and he was geeking out about his philosophy meetup.

Longest message from Teresa

These are warnings from terminal output due to issues in my Homebrew configuration.

Longest message from him

This is a list of questions for a philosophy meetup group he was attending.



Key words

The following graphs represent how often we sent specific words to each other.
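Counting a word is simple enough to sketch in a few lines. The version below is an illustration rather than the code behind the charts: it assumes messages are available as `(sender, text)` pairs and matches the keyword as a whole word, ignoring case.

```python
import re
from collections import Counter


def keyword_counts(messages, keyword):
    """Count how many times each sender used `keyword` as a whole word.

    messages: iterable of (sender, text) pairs (hypothetical shape).
    Matching is case-insensitive.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    counts = Counter()
    for sender, text in messages:
        counts[sender] += len(pattern.findall(text or ""))
    return counts
```

The word-boundary match means a search for "love" does not also count "lovely".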


Topics over time

I trained a Biterm topic model to identify 40 recurring topics in our text messages. The model then assigned a topic to every message containing text. The following graphs describe the frequency of a given topic over time.

The model describes a given topic as a list of words, ordered from most to least relevant. I added a possible interpretation of this list for each topic.
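For context, a "biterm" is an unordered pair of words that appear together in the same short text. A Biterm topic model learns its topics from those pairs across the whole collection, which suits short texts like messages. The sketch below only shows how biterms are extracted from one message. It does not train a model, and the whitespace tokenization is a simplifying assumption.

```python
from itertools import combinations


def biterms(text):
    """Return the unordered word pairs (biterms) in one short message."""
    words = text.lower().split()
    # Sorting each pair makes ("b", "a") and ("a", "b") the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(words, 2)]
```

A three-word message yields three biterms, a four-word message yields six, and so on.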




Q: Why did you do this?

I thought it would be funny.

Q: How did you get the messages?

These messages were sent over Facebook Messenger. Facebook allows you to download all of the messages you've ever sent, ever, as JSON files. I processed these files with Python. You can see the scripts I wrote for processing in this folder.
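A rough sketch of the loading step, separate from the actual scripts. It assumes each exported file has a top-level `messages` list whose entries carry a `sender_name` and an optional `content` field. Those field names are an assumption about the export format, which may differ or change over time.

```python
import json
from collections import Counter
from pathlib import Path


def load_messages(path):
    """Read one exported JSON file and return its list of message dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f).get("messages", [])


def count_by_sender(messages):
    """Count messages per sender, keeping only those with text content."""
    return Counter(m["sender_name"] for m in messages if m.get("content"))
```

Messages without a `content` field (photos, stickers, and the like) are skipped here, since the later analysis only looks at text.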

Q: How did you perform sentiment analysis?

I used NLTK's VADER tools on every text message. You can see how I did this here.

Q: How did you get the topics?

I trained a Biterm topic model on all of my text messages and had it identify the most likely topic for each message. You can see how I trained and ran the model here.

Q: How did you do the visualizations?

I used Observable Plot to make the charts. This site was generated with Observable Framework. You can find my code for the visualizations here.

Q: What was it like to show your friends this?

One friend in particular was surprised by the data and how it didn't align with their perception of the relationship.

Q: Does your ex know that you did this?

He does! He was the first person I talked to about it. He thought that this turned out amazing... (ಥ﹏ಥ)

Q: Where did this idea come from?

I came up with the idea for this in 2020. Large social media companies hoard data, and at the time it wasn't clear to me what they'd do with it. I wanted to explore the ways in which private companies could analyze who we are from our data. We don't think much about the data that we consent to sharing, and it's hard to conceptualize how much that data can reveal.

Data are biased. The existence, expression, collection, and presentation of data are all biased. I hope that you question how I actually relate to this data and how I made decisions on this project. I believe we should apply the same scrutiny to all data analysis, artificial intelligence, and machine learning tools.

Q: It's very personal to share this. Why did you do this?

that's art babey!!

Q: That's cool! I wonder what it'd be like to run it on my text messages.

I can't say I recommend it -- it was surreal and uncomfortable to read through messages from a decade ago. Programmatic analysis can reveal things about yourself, your partner, and your relationship that you may not want to know or accept. It's also easy to intentionally or unintentionally manipulate data to favor a narrative.

Q: Can you release this so that people could do this themselves?

I'm considering it. However, the topic model is custom to my data and would not transfer accurately to anyone else's texts.

Q: Was this cathartic?

Not in the way I expected. I had wanted to complete this project for a long time, and I'd told people about it for years. I felt bad that I was never able to do it. Dedicating time to work on this at Recurse Center released me from that shame.


Fun tidbit

When I was training the topic model, some words occurred so frequently that they hurt topic accuracy. Some of the words were 'like', 'think', and 'really', which are fitting for text messages between two Californian 17-year-olds.
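One common way to handle this is to drop the most frequent words before training. The sketch below is only an illustration of that idea: picking words by raw frequency, the `top_n` cutoff, and the function name are all assumptions rather than a description of how the words were chosen here.

```python
from collections import Counter


def drop_frequent_words(tokenized_messages, top_n=3):
    """Remove the `top_n` most common words from every message."""
    totals = Counter(word for msg in tokenized_messages for word in msg)
    too_common = {word for word, _ in totals.most_common(top_n)}
    return [[w for w in msg if w not in too_common] for msg in tokenized_messages]
```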



This project was created by Teresa Ibarra.