Using Sentiment Analysis to Measure Election Surprise

Published: December 01, 2016


Written by: H2O.ai Team

Sentiment Analysis is a powerful Natural Language Processing technique for computing and quantifying the emotions expressed in a body of text. One reason Sentiment Analysis is so powerful is that its results are easy to interpret and can give you a big-picture metric for your dataset.
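The post does not say which sentiment model produced its scores. As a rough illustration of how a comment can be turned into a number, here is a minimal lexicon-based scorer. The tiny `LEXICON` and the `sentiment_score` function are hypothetical. Production lexicons such as AFINN or VADER contain thousands of scored terms, and VADER also adjusts for negation and intensifiers.

```python
import re

# Hypothetical toy lexicon mapping words to polarity weights.
# Real lexicons are far larger and carefully calibrated.
LEXICON = {
    "win": 2.0, "great": 3.0, "happy": 2.0, "hope": 1.0,
    "lose": -2.0, "sad": -2.0, "terrible": -3.0, "shocked": -1.0,
}

def sentiment_score(text: str) -> float:
    """Return the mean polarity of the lexicon words found in `text`.

    Words missing from the lexicon are ignored; a comment containing
    no lexicon words scores 0.0 (neutral).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

print(sentiment_score("What a great win, so happy!"))       # mean of 3, 2, 2
print(sentiment_score("Shocked and sad. Terrible night."))  # -2.0
```

Averaging over matched words, rather than summing, keeps long and short comments on a comparable scale, which matters once scores are averaged again per hour.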
One recent event that surprised many people was the November 8th US Presidential election. Hillary Clinton, who ended up losing the race, had been given chances of victory ranging from 71.4% (FiveThirtyEight) to 85% (New York Times) to more than 99% (Princeton Election Consortium).

Credit: New York Times

Data credit: /u/Stuck_In_the_Matrix

We examined five political subreddits to gauge their reactions. Our first target was /r/hillaryclinton, Clinton’s primary support base. Comment volume climbed to its high starting around 9pm EST, but sentiment gradually fell as news came in that Donald Trump was winning more states than expected.

/r/hillaryclinton: Number of Comments per Hour

/r/hillaryclinton: Mean Sentiment Score per Hour
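The per-hour charts combine two aggregates: the number of comments in each hour and the mean sentiment score of those comments. A minimal standard-library sketch of that grouping follows, assuming each comment arrives as a hypothetical `(created_utc, score)` pair with a Unix timestamp (Reddit comments carry a `created_utc` field). Eastern time is modeled as a fixed UTC-5 offset, which matches EST on election night because daylight saving time ended on November 6, 2016.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

# Assumption: fixed UTC-5 offset for US Eastern time. DST ended on
# 6 Nov 2016, so EST is correct for election night.
EST = timezone(timedelta(hours=-5))

def hourly_stats(comments):
    """Bucket comments into EST hours.

    `comments` is an iterable of (created_utc, sentiment_score) pairs,
    a hypothetical input shape. Returns a dict mapping each hour's start
    time to (comment_count, mean_score), ordered by hour.
    """
    buckets = defaultdict(list)
    for created_utc, score in comments:
        local = datetime.fromtimestamp(created_utc, tz=EST)
        hour_start = local.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(score)
    return {h: (len(s), mean(s)) for h, s in sorted(buckets.items())}

# Usage with three made-up comments around 9pm EST on election night.
start = datetime(2016, 11, 8, 21, 0, tzinfo=EST).timestamp()
sample = [(start + 60, 0.5), (start + 1800, -0.5), (start + 3700, -1.0)]
for hour, (count, avg) in hourly_stats(sample).items():
    print(hour.strftime("%Y-%m-%d %H:%M"), count, avg)
# 2016-11-08 21:00 2 0.0
# 2016-11-08 22:00 1 -1.0
```

With pandas, the same result could come from resampling a timestamp-indexed series hourly and taking `count` and `mean`; the plain-Python version just makes the bucketing explicit.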

/r/SandersforPresident /r/Political_Revolution

/r/Political_Revolution: Number of Comments per Hour

/r/Political_Revolution: Mean Sentiment Score per Hour

On /r/The_Donald (Donald Trump’s base), the results were the opposite.

/r/The_Donald: Number of Comments per Hour

/r/The_Donald: Mean Sentiment Score per Hour

We also looked at two subreddits that are less candidate- or ideology-specific: /r/politics and /r/PoliticalDiscussion. /r/PoliticalDiscussion didn’t seem to show any shift, but /r/politics did seem to become more muted, at least compared to the previous night.

/r/PoliticalDiscussion: Number of Comments per Hour

/r/PoliticalDiscussion: Mean Sentiment Score per Hour

/r/politics: Mean Sentiment Score per Hour
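The /r/politics observation compares election night with the night before. One hypothetical way to make that comparison concrete is to line up the same hours of the day on both dates and subtract the hourly means. The `night_shift` function and the sample values below are illustrative assumptions, not the post's data or method; a negative shift means election night was less positive than the same hour a day earlier.

```python
from datetime import date, datetime
from statistics import mean

def night_shift(hourly, night, previous, hours=range(19, 24)):
    """Hour-by-hour change in mean sentiment between two evenings.

    `hourly` maps hour-start datetimes to mean sentiment scores.
    Returns {hour_of_day: score_on_night - score_on_previous} for every
    hour in `hours` that has data on both dates.
    """
    by_slot = {(h.date(), h.hour): s for h, s in hourly.items()}
    return {
        hr: by_slot[(night, hr)] - by_slot[(previous, hr)]
        for hr in hours
        if (night, hr) in by_slot and (previous, hr) in by_slot
    }

# Made-up hourly means, for illustration only.
hourly = {
    datetime(2016, 11, 7, 21): 0.25,
    datetime(2016, 11, 7, 22): 0.5,
    datetime(2016, 11, 8, 21): 0.0,
    datetime(2016, 11, 8, 22): 0.125,
}
shift = night_shift(hourly, date(2016, 11, 8), date(2016, 11, 7))
print(shift)                  # {21: -0.25, 22: -0.375}
print(mean(shift.values()))   # -0.3125
```

Comparing matching hours of the day controls for the normal evening rhythm of a subreddit, so a drop is less likely to be an artifact of time of day.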

Reddit political subreddits saw a sizable increase in activity as the election results came in.
Subreddits differed in their reactions to the news along ideological lines, with pro-Trump subreddits showing more positive sentiment than pro-Clinton subreddits.

What could be the next steps for this type of analysis?

Can we use these patterns to classify the readership of the comments sections of newspapers as left- or right-leaning?
Can we apply these time-series sentiment analyses to other events, such as sporting events (which also involve two ‘teams’)?
Can we use sentiment analysis to evaluate the long-term health of communities, such as subreddits dedicated to candidates who eventually lost, like Bernie Sanders?

H2O.ai Team

At H2O.ai, democratizing AI isn’t just an idea. It’s a movement. And that means that it requires action. We started out as a group of like-minded individuals in the open-source community, collectively driven by the idea that there should be freedom around the creation and use of AI.

Today we have evolved into a global company built by people from a variety of different backgrounds and skill sets, all driven to be part of something greater than ourselves. Our partnerships now extend beyond the open-source community to include business customers, academia, and non-profit organizations.
