Who would win the battle for the White House to become the next President of the United States was a topic of hot debate in 2012.
Much of that debate was taking place online, with plenty of people blogging, tweeting or updating social media with their thoughts on Mitt Romney versus Barack Obama.
This provided us with a rich source of information about what people were thinking and feeling about the election race. So today I've decided to cover the digital data analysis techniques that are used to predict the US election. Perhaps the 2012 election will be remembered as the first in which big data analysis played a crucial role and had a tremendous impact on the outcome.
I am fairly familiar with these techniques because I had the opportunity to meet the CEO of EMC in January 2013 in Singapore. EMC was one of a select few companies that Twitter had entrusted to syndicate and provide access to the full Twitter feed for use in internal analytics applications for Obama's campaign in 2012. In my humble opinion, that was the reason the company was sold to Dell in 2015 for $67B, the largest deal in tech history.
The techniques of big data analysis remain the same, so let’s jump to 2016 and see what social media data is used to predict the US election nowadays.
What Does Big Data Look Like?
- 293,000 statuses are updated / minute
- 31.25 million messages are sent / minute
- 440,640 new tweets go online / minute
However, any data stored (posted, sent, etc.) on the Internet is messy and noisy, which is why we need to process the data to get value from it.
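To give a feel for what "processing" messy posts can mean in practice, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function name `clean_post`, the regular expressions, and the tiny noise list are all hypothetical and not taken from any particular campaign tool.

```python
import re
from typing import List

# Assumed patterns for typical social media noise (illustrative, not exhaustive).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9#\s]")

# Hypothetical list of tokens that carry no meaning on their own.
NOISE_TOKENS = {"rt"}


def clean_post(text: str) -> List[str]:
    """Turn a raw post into a list of lowercase tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # drop links
    text = MENTION_RE.sub(" ", text)   # drop @mentions
    text = NON_WORD_RE.sub(" ", text)  # drop punctuation, keep hashtags
    return [token for token in text.split() if token not in NOISE_TOKENS]


if __name__ == "__main__":
    print(clean_post("RT @bob: Obama WINS!!! http://t.co/abc #election2012"))
```

Real pipelines usually go further (spelling variants, emoji, language detection), but even this small step turns a raw post into something that can be counted and compared.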
Processing the data
- Data: raw data that has not been processed for use
- Information: data that are processed to be useful; provides answers to "who", "what", "where", and "when" questions
- Knowledge: application of data and information; answers "how" questions
- Understanding: appreciation of "why"
- Wisdom: evaluated understanding
Text Mining and Sentiment Analysis
The first technique that is used to find out what we are thinking and feeling about the election race is called “Text Mining and Sentiment Analysis”. The objective is to classify tweets as Positive, Neutral, or Negative by analyzing each word (or emoticon) in a post.
- Overall candidate sentiment level
  - Clinton: 48%
  - Trump: 48%
- Positive and negative terms
  - Clinton: 5,522 positive, 6,098 negative
  - Trump: 3,254 positive, 3,550 negative
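The word-by-word classification described above can be sketched with a simple lexicon lookup. This is a toy version under stated assumptions: the word lists `POSITIVE` and `NEGATIVE`, the tokenizer, and the function names are hypothetical, and real systems use much larger lexicons or trained models. The helper `positive_share` reflects one plausible reading of the figures above, namely that the sentiment level is the share of positive terms among all scored terms; the source does not say how the 48% was actually computed.

```python
import re

# Hypothetical mini-lexicon; a real one would hold thousands of entries.
POSITIVE = {"great", "win", "love", "strong", "hope", ":)"}
NEGATIVE = {"bad", "lose", "hate", "weak", "liar", ":("}

# Matches simple emoticons such as :) and :( first, then plain words.
TOKEN_RE = re.compile(r"[:;][\)\(]|[a-z']+")


def classify(post: str) -> str:
    """Label a post Positive, Neutral, or Negative by counting lexicon hits."""
    tokens = TOKEN_RE.findall(post.lower())
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


def positive_share(positive: int, negative: int) -> float:
    """Assumed definition: positive terms divided by all scored terms."""
    return positive / (positive + negative)


if __name__ == "__main__":
    print(classify("I love this candidate :)"))
    print(classify("He is a weak liar"))
    print(round(positive_share(5522, 6098) * 100))  # Clinton term counts
    print(round(positive_share(3254, 3550) * 100))  # Trump term counts
```

Under that assumed definition, both candidates' term counts round to 48%, which is consistent with the sentiment levels listed above.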
But wait a minute, what about the people who have no posts about the presidential candidates?
In 2009, Facebook introduced a button that allowed people to give feedback on their friends’ posts. Facebook called it Like, and people liked it a lot.
On February 24, 2016, Facebook launched Facebook Reactions, which allows users to respond to posts with multiple reactions in addition to "liking" them.
So, how can we use Facebook Reactions to find out what other people think about the US presidential candidates?
Assume you hate Donald Trump and you post some negative thoughts about him online. Can we predict your friends’ thoughts based on their likes or other reactions? If your answer is yes, then you're absolutely right.
The next step is to find some "juicy" information about the audience, including genders, ages, locations and interests. That’s it!
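Here is a rough sketch of how reactions plus audience attributes could be combined. Everything in it is an assumption made for illustration: the `REACTION_AGREEMENT` weights (only Like and Love are treated as agreement, because Angry or Sad could be aimed at either the post or the candidate), the function names, and the sample profiles. It is not how Facebook or any campaign actually does this.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Assumed weights: 1 means the friend agrees with the post, 0 means unclear.
REACTION_AGREEMENT = {"like": 1, "love": 1, "haha": 0, "wow": 0, "sad": 0, "angry": 0}


def infer_stances(post_stance: int, reactions: List[Tuple[str, str]]) -> Dict[str, int]:
    """Infer each friend's stance toward the candidate from their reaction.

    post_stance is -1 for a negative post, +1 for a positive one.
    """
    return {
        friend: post_stance * REACTION_AGREEMENT.get(reaction, 0)
        for friend, reaction in reactions
    }


def tally_by(attribute: str, stances: Dict[str, int],
             profiles: Dict[str, Dict[str, str]]) -> Dict[str, Counter]:
    """Count inferred stances per audience group (gender, age, location...)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for friend, stance in stances.items():
        group = profiles.get(friend, {}).get(attribute, "unknown")
        counts[group][stance] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical data: you posted something negative about a candidate.
    reactions = [("ann", "like"), ("bob", "angry"), ("cat", "love")]
    profiles = {"ann": {"gender": "F"}, "bob": {"gender": "M"}, "cat": {"gender": "F"}}
    stances = infer_stances(-1, reactions)
    print(stances)
    print(dict(tally_by("gender", stances, profiles)))
```

The design choice worth noting is the conservative weighting: treating ambiguous reactions as "unclear" loses some signal but avoids guessing at what an Angry or Sad reaction really means.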