December 1st, 2016

Using Sentiment Analysis to Measure Election Surprise

RSS icon RSS Category: Data Journalism

Sentiment Analysis is a powerful Natural Language Processing technique that can be used to compute and quantify the emotions associated with a body of text. One of the reasons that Sentiment Analysis is so powerful is because its results are easy to interpret and can give you a big-picture metric for your dataset.
One recent event that surprised many people was the November 8th US Presidential election. Hillary Clinton, who ended up losing the race, had been given chances ranging from a 71.4% (FiveThirtyEight), to a 85% (New York Times), to a >99% chance of victory (Princeton Election Consortium).

Credit: New York Times

To measure the shock of this upset, we decided to examine comments made during the announcements of the election results and see how (if) the sentiment changed. The sentiment of a comment is measured by how its words correspond to either a negative or positive connotation. A score of ‘0.00’ means the comment is neutral, while a higher score means that the sentiment is more positive (and a negative score implies the comment is negative).
Our dataset is a .csv of all Reddit comments made during 11/8/2016 to 11/10/2016 (UTC) and is courtesy of /u/Stuck_In_the_Matrix. All times are in EST, and we’ve annotated the timeline (the height of the bars denotes the number of comments posted during that hour):

We examined five political subreddits to gauge their reactions. Our first target was /r/hillaryclinton, Clinton’s primary support base. The number of comments reached a high starting at around 9pm EST, but the sentiment gradually fell as news came in that Donald Trump was winning more states than expected.

/r/hillaryclinton: Number of Comments per Hour


/r/hillaryclinton: Mean Sentiment Score per Hour

What is interesting is the low number of comments made after the election was called for Donald Trump. I suspect that it may have been a subreddit-wide pause on comments due to concerns about trolls, but I’m not sure; I contacted the moderators but haven’t received a response back yet.
A few other left-leaning subreddits had interesting results as well. While /r/SandersforPresident was closed for the election season post-Bernie concession, it’s successor, /r/Political_Revolution, had not closed and experienced declines in comment sentiment as well.

/r/Political_Revolution: Number of Comments per Hour


/r/Political_Revolution: Mean Sentiment Score per Hour

On /r/The_Donald (Donald Trump’s base), the results were the opposite.

/r/The_Donald: Number of Comments per Hour


/r/The_Donald: Mean Sentiment Score per Hour

There are also a few subreddits that are less candidate- or ideology-specific: /r/politics and /r/PoliticalDiscussion. /r/PoliticalDiscussion didn’t seem to show any shift, but /r/politics did seem to become more muted, at least compared to the previous night.

/r/PoliticalDiscussion: Number of Comments per Hour


/r/PoliticalDiscussion: Mean Sentiment Score per Hour
/r/politics: Mean Sentiment Score per Hour

To recap,

  1. Reddit political subreddits experienced a sizable increase in activity during the election results
  2. Subreddits differed in their reactions to the news along idealogical lines, with pro-Trump subreddits having higher positive sentiment than pro-Clinton subreddits

What could be the next steps for this type of analysis?

  1. Can we use these patterns to classify the readership of the comments sections of newspapers as left- or right-leaning?
  2. Can we apply these time-series sentiment analyses to other events, such as sporting events (which also includes two ‘teams’)?
  3. Can we use sentiment analysis to evaluate the long-term health of communities, such as subreddits dedicated to eventually-losing candidates, like Bernie Sanders?

Leave a Reply

H2O LLM DataStudio Part II: Convert Documents to QA Pairs for fine tuning of LLMs

Convert unstructured datasets to Question-answer pairs required for LLM fine-tuning and other downstream tasks with

September 22, 2023 - by Genevieve Richards, Tarique Hussain and Shivam Bansal
Building a Fraud Detection Model with H2O AI Cloud

In a previous article[1], we discussed how machine learning could be harnessed to mitigate fraud.

July 28, 2023 - by Asghar Ghorbani
A Look at the UniformRobust Method for Histogram Type

Tree-based algorithms, especially Gradient Boosting Machines (GBM's), are one of the most popular algorithms used.

July 25, 2023 - by Hannah Tillman and Megan Kurka
H2O LLM EvalGPT: A Comprehensive Tool for Evaluating Large Language Models

In an era where Large Language Models (LLMs) are rapidly gaining traction for diverse applications,

July 19, 2023 - by Srinivas Neppalli, Abhay Singhal and Michal Malohlava
Testing Large Language Model (LLM) Vulnerabilities Using Adversarial Attacks

Adversarial analysis seeks to explain a machine learning model by understanding locally what changes need

July 19, 2023 - by Kim Montgomery, Pramit Choudhary and Michal Malohlava
Reducing False Positives in Financial Transactions with AutoML

In an increasingly digital world, combating financial fraud is a high-stakes game. However, the systems

July 14, 2023 - by Asghar Ghorbani

Ready to see the platform in action?

Make data and AI deliver meaningful and significant value to your organization with our state-of-the-art AI platform.