Return to page

BLOG

Using Sentiment Analysis to Measure Election Surprise

 headshot

By H2O.ai Team | minute read | December 01, 2016

Category: Data Journalism
Blog decorative banner image

Sentiment Analysis  is a powerful Natural Language Processing  technique that can be used to compute and quantify the emotions associated with a body of text. One of the reasons that Sentiment Analysis is so powerful is because its results are easy to interpret and can give you a big-picture metric for your dataset.
One recent event that surprised many people was the November 8th US Presidential election. Hillary Clinton, who ended up losing the race, had been given chances ranging from a 71.4% (FiveThirtyEight ), to a 85% (New York Times ), to a >99% chance of victory (Princeton Election Consortium ).
prediction_comparisons 

Credit: New York Times


/u/Stuck_In_the_Matrix
politics_counts2


We examined five political subreddits to gauge their reactions. Our first target was /r/hillaryclinton , Clinton’s primary support base. The number of comments reached a high starting at around 9pm EST, but the sentiment gradually fell as news came in that Donald Trump was winning more states than expected.
hillaryclinton_counts 

/r/hillaryclinton: Number of Comments per Hour

hillaryclinton_sentiment

/r/hillaryclinton: Mean Sentiment Score per Hour


/r/SandersforPresident/r/Political_Revolution
political_revolution_counts

/r/Political_Revolution: Number of Comments per Hour

political_revolution_sentiment

/r/Political_Revolution: Mean Sentiment Score per Hour


On /r/The_Donald  (Donald Trump’s base), the results were the opposite.
the_donald_counts 

/r/The_Donald: Number of Comments per Hour

the_donald_sentiment

/r/The_Donald: Mean Sentiment Score per Hour


There are also a few subreddits that are less candidate- or ideology-specific: /r/politics  and /r/PoliticalDiscussion . /r/PoliticalDiscussion didn’t seem to show any shift, but /r/politics did seem to become more muted, at least compared to the previous night.
politicaldiscussion_counts 

/r/PoliticalDiscussion: Number of Comments per Hour

politicaldiscussion_sentiment

/r/PoliticalDiscussion: Mean Sentiment Score per Hour
politics_sentiment
/r/politics: Mean Sentiment Score per Hour

  1. Reddit political subreddits experienced a sizable increase in activity during the election results
  2. Subreddits differed in their reactions to the news along idealogical lines, with pro-Trump subreddits having higher positive sentiment than pro-Clinton subreddits

What could be the next steps for this type of analysis?

  1. Can we use these patterns to classify the readership of the comments sections of newspapers as left- or right-leaning?
  2. Can we apply these time-series sentiment analyses to other events, such as sporting events (which also includes two ‘teams’)?
  3. Can we use sentiment analysis to evaluate the long-term health of communities, such as subreddits dedicated to eventually-losing candidates, like Bernie Sanders?
 headshot

H2O.ai Team

At H2O.ai, democratizing AI isn’t just an idea. It’s a movement. And that means that it requires action. We started out as a group of like minded individuals in the open source community, collectively driven by the idea that there should be freedom around the creation and use of AI.

Today we have evolved into a global company built by people from a variety of different backgrounds and skill sets, all driven to be part of something greater than ourselves. Our partnerships now extend beyond the open-source community to include business customers, academia, and non-profit organizations.