Challenges and Opportunities for Sentiment Analysis over Social Media for Dynamic Events such as an Election

Monireh Ebrahimi and Amir Hossein Yazdavar

Previous efforts to assess people’s sentiment on Twitter suggest that Twitter may be a valuable resource for studying political sentiment and that it reflects the offline political landscape. According to a Pew Research Center report, in January 2016 44% of US adults reported having learned about the presidential election through social media. Furthermore, 24% reported turning to the social media posts of the two candidates as a source of news and information, more than the 15% who used both candidates’ websites and emails combined (Pew Research Center). The first presidential debate between Trump and Clinton was the most tweeted debate ever, with 17.1 million tweets (First Presidential Debate Breaks Twitter Record).

Many opinion mining systems and tools have been developed to surface people’s attitudes toward products, people, and topics, along with their attributes and aspects. Sentiment analysis is one of the most frequently used techniques for gauging the public’s attitude, including its preferences and support. However, sentiment analysis for predicting the result of an election remains a challenging task. Though apparently simple, training a successful model for sentiment analysis on tweet streams about an election is empirically very hard. Among the key challenges are changes in the topics of conversation and in the people about whom social media posts express opinions. In this blog, we provide a brief overview of our sentiment analysis classifier and highlight some of the challenges we encountered while monitoring the presidential election at Kno.e.sis using our Twitris system. We should note that the Twitris-enabled election predictions by Kno.e.sis and the Cognovi Labs team were among the very few that succeeded while the vast majority of predictions failed (Election Day #SocialMedia Analysis #Election2016 08Nov2016; Cognovi Labs: Twitter Analytics Startup Predicts Trump Upset in Real-Time).

We first created a supervised multi-class classifier (positive vs. negative vs. neutral) for analyzing people’s opinions about different election candidates. To this end, we trained a separate model for each candidate. The motivation for this segregation comes from our observation that the same tweet on an issue can be positive for one candidate while negative for another; the sentiment of a tweet is very candidate-dependent. In the first round of training in July 2016, before the conventions, we used 10,000 labeled tweets collected for 5 candidates (Bernie Sanders, Donald Trump, Hillary Clinton, John Kasich, and Ted Cruz) on 10 issues, including budget, finance, education, energy, environment, healthcare, immigration, gun control, and civil liberties. In addition to excluding retweets, tweets were tested for similarity using a Levenshtein distance ratio to ensure that no two tweets were too similar. Afterward, through many experiments over different machine learning algorithms and parameter settings, we found our best model with respect to F-measure. Our best model for Clinton uses an SVM with TF-IDF vectorization of 1-3 grams, positive and negative hashtags for each candidate, and the number of positive and negative words (a sentiment score); it achieved 0.66 precision, 0.63 recall, and 0.63 F-measure. Through manual error analysis, we noticed the importance of such lexical features for avoiding some outrageous errors, and therefore conducted further experiments with the number of positive words, the number of negative words, and LIWC as features. Surprisingly, these features improved our F-measure by only around 1%. We also tried a distributed vector representation of training instances, obtained from word2vec models pre-trained on Twitter and on Google News, instead of the discrete/traditional representation; however, the performance decreased.
Finally, we achieved the best performance using a CNN, trying three variants: random initialization of word vectors, static pre-trained word2vec vectors, and non-static (fine-tuned) word2vec vectors.
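The hand-crafted features mentioned above (sentiment score plus per-candidate hashtag counts) can be sketched as follows. The word lists and hashtag sets here are illustrative assumptions, not the lexicons actually used; the full model also adds TF-IDF over 1-3 grams before feeding an SVM.

```python
# Hypothetical sample lexicons; the deployed system used larger curated lists.
POS_WORDS = {"great", "win", "love", "strong"}
NEG_WORDS = {"scandal", "lie", "corrupt", "weak"}
POS_TAGS = {"#imwithher", "#strongertogether"}   # assumed pro-Clinton hashtags
NEG_TAGS = {"#crookedhillary", "#neverhillary"}  # assumed anti-Clinton hashtags

def handcrafted_features(tweet):
    """Return (sentiment_score, n_pos_hashtags, n_neg_hashtags) for one tweet."""
    tokens = tweet.lower().split()
    n_pos = sum(t in POS_WORDS for t in tokens)
    n_neg = sum(t in NEG_WORDS for t in tokens)
    tags = [t for t in tokens if t.startswith("#")]
    return (n_pos - n_neg,
            sum(t in POS_TAGS for t in tags),
            sum(t in NEG_TAGS for t in tags))
```

These three numbers would be appended to the TF-IDF vector of each tweet before training.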

Challenges

1. Fast-paced change in dataset

The most challenging part is creating a robust classification system that copes with the dynamic nature of tweets related to an election. The election domain is highly dynamic: every day, people talk about new aspects of the election and the candidates in the context of new events. Therefore, features that are important for classifying sentiment may soon become irrelevant, and newly emerging features would be neglected if we did not update the training set regularly. Furthermore, in the political domain, unlike many others, people mostly express their sentiment toward the candidates implicitly, without using many sentiment words, which makes the task even more challenging. Another complicating factor is differentiating transient important features from lasting or recurring ones; features may disappear and then reappear in the future [1]. In the context of the election, for example, this can happen because of temporal changes in what each candidate’s supporters talk about. Given this non-stationary characteristic of the election, we may encounter a concept drift/dataset shift problem, i.e., learning when the test and training data have different distributions. Most machine learning approaches assume an identical distribution for the training and test sets, although in many real-world problems the test/target environment changes over time. This phenomenon is an important factor in selecting a classification model; among classification models, SVM is one of the most robust to dataset shift.
[Figures: two active learning workflows for keeping the classifier up to date]
All of the aforementioned challenges make active learning necessary; in a dynamic domain like this, it is hard to apply machine learning to a real-world problem successfully without it. There are two possible active learning models useful to our problem, shown in the figures above. Both are expensive because they involve a human in the loop for the labor-intensive and time-consuming task of annotation. Annotation is even more challenging here due to both the short length of tweets and the inherent vagueness of political tweets. A question may arise: why have we not used an unsupervised approach, such as a lexicon-based one, when annotation is so challenging and our annotated dataset becomes outdated so fast? The answer is that political tweets often contain few sentiment words, so the performance of a lexicon-based method would be low. Empirically, we employed the MPQA subjectivity lexicon [2] to capture the subjectivity of each tweet, but the accuracy of this model did not exceed 0.49.
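A minimal lexicon-based baseline of the kind discussed above can be sketched as follows. The toy lexicon here is an assumption standing in for the MPQA file; the point is that a tweet with no lexicon hits defaults to neutral, which is exactly why coverage is so poor on implicit political tweets.

```python
# Toy subjectivity lexicon (illustrative; the real system used MPQA).
SUBJ_LEXICON = {"love": "positive", "great": "positive",
                "hate": "negative", "scares": "negative"}

def lexicon_sentiment(tweet):
    """Classify a tweet by counting lexicon hits; no hits means neutral."""
    hits = [SUBJ_LEXICON[t] for t in tweet.lower().split() if t in SUBJ_LEXICON]
    if not hits:
        return "neutral"  # the common case for political tweets
    score = hits.count("positive") - hits.count("negative")
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```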

Despite the costs, updating the training set regularly was the most effective measure for keeping the classifier reasonably good during the 2016 election. No matter how well our system was working, what worked up until yesterday could become useless today after a new political event, propaganda push, or scandal. For example, our first model, trained during the primaries, performed quite poorly during and after the conventions. Therefore, feeding the dataset with new training data was the key task for keeping the system reliable. To do so, we updated the training data in a timely manner, e.g., every few days. It is also worth trying to include more important/influential tweets in the training data; to this end, we collected data mostly at specific times, such as during the presidential debates.

2. Candidate-dependence

Most sentiment analysis tools work in a target-independent manner. However, a target-independent sentiment analyzer is prone to yield poor results on our dataset because, post-conventions, a huge number of our tweets contain the names of both candidates. “I am getting so nervous because I want Trump to win so bad. Hillary scares me to death and with her America will be over” and “I don't really want Hillary to win but I want Trump to lose can we just do the election over” are examples of such tweets. Based on our observations, about 48% of our instances contained variants of both Clinton’s and Trump’s names. In such cases, the sentiment of a tweet may get misclassified for a given candidate because of interference from features related to the other candidate. The state-of-the-art approaches for supervised target-dependent sentiment analysis fall into two groups: syntax-based and context-based methods. The first group relies on POS tagging or syntactic parsing for feature extraction (e.g., [3]), while the second defines a left and right context for each target [4]; [4] demonstrates that the latter outperforms the former in classifying informal texts such as tweets. To further enhance performance, sentiment lexicon expansion work such as [5] can be used to extract sentiment-bearing candidate-specific expressions, which can then be added to the classifier’s feature vector. In our case, since we have trained one classifier per candidate, we can include instances containing the names of multiple candidates in the training sets of both classifiers. The key is to include features related to the target candidate in its corresponding classifier and exclude the irrelevant ones, in both the training and testing phases. To do that, we can use either dependency or proximity (similar to the two aforementioned works) to include the on-target features and ignore the off-target ones; likewise, in the testing phase, depending on the classifier, we include or exclude the corresponding features from the feature vector.
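The proximity variant of this idea can be sketched as follows: keep only tokens within a small window of a mention of the target candidate, discarding features that belong to the other candidate. The alias sets and window size are illustrative assumptions.

```python
# Hypothetical alias sets per candidate; a real system would curate these.
ALIASES = {"clinton": {"clinton", "hillary", "#imwithher"},
           "trump": {"trump", "donald", "#maga"}}

def on_target_tokens(tweet, target, window=3):
    """Return tokens within `window` positions of any mention of `target`."""
    tokens = tweet.lower().split()
    anchors = [i for i, t in enumerate(tokens) if t in ALIASES[target]]
    keep = set()
    for i in anchors:
        keep.update(range(max(0, i - window), min(len(tokens), i + window + 1)))
    return [tokens[i] for i in sorted(keep)]
```

On the “I want Trump to win but Hillary scares me” style of tweet, this keeps “win” near Trump while dropping “scares”, so each candidate’s classifier sees only on-target features.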


3. The importance of identifying the user’s political preference

The ultimate goal of sentiment analysis over political tweet streams is predicting election results. Hence, obtaining information about the political preference of users can provide a more fine-grained source of insight for a political pundit or analyst. Inspired by [6], we have developed a simple but effective algorithm to categorize users into 5 groups: far left-leaning, left-leaning, right-leaning, far right-leaning, and independent. The idea behind our approach is the tendency of users to follow others who share their political orientation: the more right/left-leaning accounts a user follows, the higher the probability that the user is right/left-leaning. Therefore, we collected a set of Twitter users with known political orientation, including all senators, congresspersons, and political pundits. We then estimate the probability that a user is left-leaning (right-leaning) by calculating the ratio of left-leaning (right-leaning) followees to the user’s total number of followees. Finally, we decide the political preference of a user by comparing this ratio with a threshold T. Gaining this information about users helps to improve social media-based prediction of the election.
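The followee-ratio heuristic can be sketched as below. The seed accounts and thresholds are illustrative assumptions, not the seed set or T used in the deployed system.

```python
# Hypothetical seed accounts with known orientation.
LEFT_SEEDS = {"@sensanders", "@repjohnlewis"}
RIGHT_SEEDS = {"@sentedcruz", "@repgowdy"}

def political_leaning(followees, t=0.2, t_far=0.5):
    """Label a user by the ratio of known-leaning accounts among their followees."""
    n = len(followees) or 1
    left = sum(f in LEFT_SEEDS for f in followees) / n
    right = sum(f in RIGHT_SEEDS for f in followees) / n
    if left >= t and left > right:
        return "far left-leaning" if left >= t_far else "left-leaning"
    if right >= t and right > left:
        return "far right-leaning" if right >= t_far else "right-leaning"
    return "independent"
```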


4. Content-related challenges (hash tags)

Recently, there has been a surge of interest in distant supervision, i.e., training a classifier on a weakly labeled training set [7,8]. In this method, the training data gets automatically labeled based on some heuristics. In the context of sentiment analysis, one common heuristic uses emoticons such as :) and :( as positive and negative labels, respectively. Hashtags are also widely used for machine learning tasks such as emotion identification [9], and people use a plethora of hashtags in their tweets about the election. Moreover, as mentioned before, due to the dynamic nature of the election domain, the quality, quantity, and freshness of labeled data play a vital role in creating a robust classifier. It is therefore tempting to use the popular hashtags of each candidate’s supporters as weak labels for our dataset. However, our analysis of the 2016 election showed that people widely use hashtags sarcastically in the political domain, and using popular hashtags for automatic labeling leads to a huge number of incorrectly labeled instances. For example, throughout the election, only 43% of tweets containing #Imwithher were positive for Clinton, and the hashtag was used sarcastically in 27% of tweets. Furthermore, our experiments show that using those hashtags as features does not boost our classifier’s performance either.
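The emoticon heuristic described above is simple enough to sketch directly; tweets with mixed or absent cues are left unlabeled rather than guessed (and, as noted, hashtags are too often sarcastic here to serve as reliable weak labels).

```python
def weak_label(tweet):
    """Emoticon-based distant supervision: label only unambiguous tweets."""
    pos, neg = ":)" in tweet, ":(" in tweet
    if pos and not neg:
        return "positive"
    if neg and not pos:
        return "negative"
    return None  # ambiguous or cue-free: drop from the weakly labeled set
```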


5. Content-related challenges (links)

Existing tweet classifiers rely merely on tweet content and ignore the content of the documents tweets point to through URLs. However, based on our observations in the 2016 election, around 36% of tweets contain a URL to an external page. Similarly, for the 2012 election, [6] demonstrates that 60% of tweets from very highly engaged users contain URLs. These links are crucial: without them, a tweet is often incomplete, and inferring its sentiment is difficult or impossible even for a human annotator. Our hypothesis, therefore, is that incorporating the content, keywords, or title of the linked documents as features will yield a significant performance gain. To the best of our knowledge, there is no work on tweet classification that expands tweets based on their URLs, although link expansion has been applied successfully to other problems such as topical anomaly detection [10] and distant supervision [11].
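One lightweight version of this link expansion would be to append the linked page’s title to the tweet’s feature text. A sketch under stated assumptions (the network fetch is not shown; `fetch` is a hypothetical URL-to-HTML mapping):

```python
import re
from html.parser import HTMLParser

URL_RE = re.compile(r"https?://\S+")

class TitleParser(HTMLParser):
    """Collect the text inside the page's <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title, self.title = False, ""
    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

def expand_tweet(tweet, fetch):
    """Append linked-page titles to the tweet text; `fetch` maps URL -> HTML."""
    extra = []
    for url in URL_RE.findall(tweet):
        p = TitleParser()
        p.feed(fetch.get(url, ""))
        if p.title.strip():
            extra.append(p.title.strip())
    return " ".join([tweet] + extra)
```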

6. Content-related challenges (sarcasm)

Based on our observations, 7% of tweets about Trump and 6% of tweets about Clinton are sarcastic, and 39% and 32% of these sarcastic tweets, respectively, were classified incorrectly by our system. To date, many sophisticated tools and approaches have been proposed to deal with sarcasm. Looking closer at these works, they mostly focus on detecting sarcasm in text, not on how to cope with it in the sentiment analysis task. This raises the interesting question of how sarcasm may or may not affect the sentiment of tweets, and how to deal with sarcastic tweets in both the training and prediction phases. Riloff et al. [12] have proposed an algorithm to recognize a common form of sarcasm that flips the polarity of a sentence; such polarity-reversing sarcastic tweets often express positive (negative) sentiment in the context of a negative (positive) activity or situation. However, Maynard et al. [13] show that determining the scope of sarcasm in tweets is still challenging: the polarity reversal may apply to part of a tweet or its hashtags but not necessarily the whole. As a result, dealing with sarcasm in sentiment analysis remains an open research issue worth further work. In terms of the training set, our hypothesis is that excluding sarcastic instances will remove noise and improve the quality of our training data.


7. Interpretation-related challenges (Sentiment Analysis versus Emotion Analysis)

The study of sentiment has evolved into the study of emotions, which offers finer granularity. Positive, negative, and neutral sentiments can be expressed with different emotions: joy and love for positive polarity, anxiety and sadness for negative, and apathy for neutral. Our emotion analysis of users who tweeted #IVOTED in the 2016 US presidential election reveals that Trump followers were joyful about Trump on election day. Though sentiment analysis favored Clinton in the early hours, emotion analysis was showing support (joy) for Trump. In fact, emotion is a better criterion for predicting people’s actions, such as voting, and there are usually huge emotional differences among tweets of the same polarity. Hence, emotion analysis should inevitably be a component of an election prediction tool.


8. Interpretation-related challenges (Vote counting vs engagement counting)

Most, if not all, of the aforementioned challenges affect the quality of our sentiment analysis. It is also very important to correlate a user’s online behavior and opinion with their actual vote. Chen et al. [6] show the more important role of highly engaged users in predicting the result of the 2012 election. There are two plausible explanations for this. First, the more a user tweets, the more reliably we can infer their opinion. Second, highly active people are usually more influential and more likely to actually vote in the real world. That is why an election monitoring system should report user-level normalized sentiment in addition to tweet-level sentiment; it is then the analyst’s task to consider both factors in a prediction.
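User-level normalization can be sketched as follows: each user contributes one majority vote regardless of how many tweets they post, so a few prolific accounts cannot dominate the tweet-level count. This is an illustrative sketch, not the aggregation Twitris actually reports.

```python
from collections import defaultdict

def user_level_sentiment(labeled_tweets):
    """labeled_tweets: iterable of (user, 'positive'|'negative') pairs.
    Returns (#users leaning positive) - (#users leaning negative)."""
    per_user = defaultdict(int)
    for user, label in labeled_tweets:
        per_user[user] += 1 if label == "positive" else -1
    votes = [1 if s > 0 else -1 for s in per_user.values() if s != 0]
    return sum(votes)
```

Note that a dataset with 3 positive tweets from one user and 1 negative tweet each from two users is positive at the tweet level but negative at the user level.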


9. The importance of location

An application that predicts the election result must account for each state’s influence via its number of electoral votes. Many tools and approaches have been developed for both fine-grained [14] and coarse-grained (Twitris) location identification in tweets, for purposes such as disaster management (Hazards SEES: Social and Physical Sensing Enabled Decision Support for Disaster Management and Response) and election monitoring. In the latter case, the geotag of a tweet or the location in the user’s profile can be used to estimate the user’s approximate location. During the 2016 election, the spatial aspect of our Twitris system played a crucial role in assisting end users in predicting the election.
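The electoral-vote weighting can be sketched as below, assuming winner-take-all per state for simplicity (a handful of 2016 electoral vote counts shown; Maine and Nebraska split their votes in reality).

```python
# 2016 electoral vote counts for three example states.
ELECTORAL_VOTES = {"OH": 18, "FL": 29, "CA": 55}

def electoral_tally(state_margins):
    """state_margins: state -> (candidate A's sentiment share minus B's).
    Returns electoral votes credited to A under winner-take-all."""
    return sum(ELECTORAL_VOTES[s] for s, m in state_margins.items() if m > 0)
```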

10. Trustworthiness-related challenges (Bots)

What happens when a large number of participants in a conversation are biased robots that artificially inflate social-media traffic, manipulate public opinion, and spread political misinformation? A social bot is a computer algorithm that automatically generates content on social media, trying to emulate and possibly change public attitudes. Social bots have inhabited social media platforms for the past few years. Our analysis demonstrates that a large portion of pro-Trump and pro-Clinton tweets during the first and second debates originated from automated accounts (How Twitter Bots Are Shaping the Election; How the Bot-y Politic Influenced This Election). Indeed, we witnessed bot wars during the election. Recently, pinpointing the sources of bots has attracted many researchers. Supervised statistical models have been trained on a variety of feature sets, from network features (retweets, mentions, and hashtag co-occurrence [15]) to user features (e.g., language, geographic location, account creation time, number of followers and followees, posts [16]) and timing features (content generation and consumption, measured via tweet rate and inter-tweet time distribution [17]). Content features are based on natural language cues measured via linguistic analysis, e.g., part-of-speech tagging [18]. At Kno.e.sis, by examining the source that generated a tweet (checking whether it originated from an API client), our system found bots supporting both Trump and Clinton with high precision but low recall. A more sophisticated approach is needed to improve recall.
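The source-field check can be sketched as below. The whitelist of hand-held/web clients is an illustrative assumption; a tweet posted through an unrecognized API client is flagged as a likely bot, which is high-precision but misses bots spoofing common clients (the low-recall problem noted above).

```python
# Assumed whitelist of human-operated Twitter clients.
HUMAN_CLIENTS = {"Twitter Web Client", "Twitter for iPhone", "Twitter for Android"}

def likely_bot(tweet):
    """tweet: dict with the Twitter API's `source` field (posting client name)."""
    return tweet.get("source") not in HUMAN_CLIENTS
```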




References

[1]
Pinage, Felipe Azevedo, Eulanda Miranda dos Santos, and João Manuel Portela da Gama. "Classification systems in dynamic environments: an overview." Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 6.5 (2016): 156-166.

[2]
Wilson, Theresa, Janyce Wiebe, and Paul Hoffmann. "Recognizing contextual polarity in phrase-level sentiment analysis." Proceedings of the conference on human language technology and empirical methods in natural language processing. Association for Computational Linguistics, 2005.

[3]
Jiang, Long, et al. "Target-dependent twitter sentiment classification." Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 2011.

[4]
Vo, Duy-Tin, and Yue Zhang. "Target-Dependent Twitter Sentiment Classification with Rich Automatic Features." IJCAI. 2015.

[5]
Chen, Lu, et al. "Extracting Diverse Sentiment Expressions with Target-Dependent Polarity from Twitter." Sixth International AAAI Conference on Weblogs and Social Media. 2012.

[6]
Chen, Lu, Wenbo Wang, and Amit P. Sheth. "Are Twitter users equal in predicting elections? A study of user groups in predicting 2012 US Republican Presidential Primaries." International Conference on Social Informatics. Springer Berlin Heidelberg, 2012.

[7]
Purver, Matthew, and Stuart Battersby. "Experimenting with distant supervision for emotion classification." Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2012.

[8]
Go, Alec, Richa Bhayani, and Lei Huang. "Twitter sentiment classification using distant supervision." CS224N Project Report, Stanford 1.12 (2009).

[9]
Wang, Wenbo, et al. "Harnessing twitter" big data" for automatic emotion identification." Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom). IEEE, 2012.

[10]
Anantharam, Pramod, Krishnaprasad Thirunarayan, and Amit Sheth. "Topical anomaly detection from twitter stream." Proceedings of the 4th Annual ACM Web Science Conference. ACM, 2012.

[11]
Magdy, Walid, et al. "Distant Supervision for Tweet Classification Using YouTube Labels." ICWSM. 2015.

[12]
Riloff, Ellen, et al. "Sarcasm as Contrast between a Positive Sentiment and Negative Situation." EMNLP. Vol. 13. 2013.

[13]
Maynard, Diana, and Mark A. Greenwood. "Who cares about Sarcastic Tweets? Investigating the Impact of Sarcasm on Sentiment Analysis." LREC. 2014.

[14]
Ji, Zongcheng, et al. "Joint Recognition and Linking of Fine-Grained Locations from Tweets." Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2016.

[15]
Lee, Kyumin, Brian David Eoff, and James Caverlee. "Seven Months with the Devils: A Long-Term Study of Content Polluters on Twitter." ICWSM. 2011.

[16]
Kramer, Adam DI, Jamie E. Guillory, and Jeffrey T. Hancock. "Experimental evidence of massive-scale emotional contagion through social networks." Proceedings of the National Academy of Sciences 111.24 (2014): 8788-8790.

[17]
Yang, Zhi, et al. "Uncovering social network sybils in the wild." ACM Transactions on Knowledge Discovery from Data (TKDD) 8.1 (2014): 2.

[18]
Davis, Clayton Allen, et al. "BotOrNot: A system to evaluate social bots." Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 2016.

Bots in the Election

At the Kno.e.sis Center at Wright State University, we continue to refine our Twitris technology (licensed by Cognovi Labs LLC) for collective social intelligence to analyze social media (especially Twitter) in real time. Kno.e.sis and Cognovi Labs teamed up with the Applied Policy Research Institute (APRI) early in the year and created some tools to monitor the debates (see press coverage on TechCrunch). From the time we first began following the nominees on Twitter, one thing became clear: Donald Trump was considerably more popular than his competition, during the primaries as well as the general election. To be honest, I had never considered the possibility that social bots may have been playing a role in this popularity.


After the conclusion of the first debate, everyone who had watched our "Debate Dashboard" was shocked, not just by the volume of tweets but by the sentiment and emotion, which appeared more positive for Trump than for Clinton. As we followed the coverage from major media outlets, we grew more and more concerned that our tool had some serious flaws. Given the articles we had seen discussing Trump's large support on Twitter, we decided to focus on sentiment, and up until a few days before the election we continued to update and improve our sentiment analysis algorithm.


Notwithstanding the improvements in precision to our sentiment classifier, we continued to see Trump as the clear leader, and as the debates came and went, our data remained consistent. We began an urgent quest for a reason. We added gender analysis because media outlets were telling us that women were down on Trump and would be a major force in the election; our analysis did not show this, despite 96% precision in determining female and male users. We developed a proprietary process to separate users into left-leaning and right-leaning groups, and could even say whether a user was strongly or loosely associated with a particular political party. Unfortunately, analyzing the data based on political association didn't help either: surprisingly, many strongly left-leaning users were anti-Hillary, only a bit behind the right-leaning users.


After the second debate, we began to see many articles pop up about social bots. Once we looked into the issue, we found many articles from early in the year talking about Trump's "Bot Army" (Trump's Biggest Lie? The Size of His Twitter Following). We had our aha moment. In that article, there is a reference to The Atlantic's use of a tool called BotOrNot. We decided to attempt to use this or a similar tool during the last debate to remove bot accounts and analyze the remaining data.


BotOrNot is a tool developed at Indiana University Bloomington in collaboration with the University of Southern California in Marina del Rey. It computes over one thousand metrics from the user account, analyzing retweets, hashtags, metadata, etc. The tool performed extremely well in the DARPA Twitter Bot Challenge, correctly identifying all of the known bots (though it did incorrectly mark some additional users as bots). We were excited to learn that the tool was available through an API endpoint and ran some of our tweets through it to test the speed at which we could process users. Twitris at this point was processing nearly 35 tweets per second for the election analysis alone, and it became clear very quickly that their service would not be able to handle the volume of data we would be consuming.


Though the final presidential debate was only a few days away, we still had hope that we would find an answer. We saw Prof. Philip Howard, from the University of Oxford, mention in The Washington Post that his group considers any user who tweets more than 50 times in one day a bot. It would be relatively simple to create an index of users, increment a count per tweet, and check the index quickly as tweets roll through. We might have done this had our team been free at the time, but some were working on bug fixes, others on improving sentiment, and still others on infrastructure issues we were experiencing. Our corpus of tweets for the election campaign was on its way to exceeding 60 million, and a robust implementation would have required more time than anyone had to offer.
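The index idea can be sketched in a few lines. This is the simple in-memory version; as noted above, a production implementation would need a persistent store that windows or resets the counts per day.

```python
from collections import Counter

DAILY_LIMIT = 50  # Howard's 50-tweets-per-day rule

class BotIndex:
    """Count tweets per (user, day) and flag users who exceed the limit."""
    def __init__(self):
        self.counts = Counter()
    def observe(self, user, day):
        self.counts[(user, day)] += 1
        return self.counts[(user, day)] > DAILY_LIMIT  # True once over the limit
```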


I suppose now is as good a time as any to define what a "bot" is. Phil Howard says, in the Washington Post article mentioned above, that "A Twitter bot is nothing more than a program that automatically posts messages to Twitter and/or auto-retweets the messages of others". I personally like this definition, but it leaves a little wiggle room where tools like BotOrNot, which focus primarily on user accounts, are concerned. A Twitter user can be a real, living, breathing human being and still exhibit bot-like tweeting habits; I think this is why Howard's group settled on the 50-tweets-per-day rule instead of relying on a classifier. A user can tweet for themselves part of the time yet still have an automation process that tweets certain things on their behalf. There are many reasons someone would do this, for example to increase their Klout score (imagine a LuLaRoe seller, YouTuber, or blogger), and there are many companies you can pay to automate this kind of activity. Some, like Linkis' "Convey" (more on this later), work by finding influential tweets and tweeting them on your behalf. These tweets are fairly easy to spot because they attach "via @c0nvey" to the end of the original tweet.
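That suffix check is simple enough to sketch directly (case and trailing whitespace normalized so minor variations still match):

```python
def is_convey_tweet(text):
    """Flag tweets auto-posted by the Convey service via their telltale suffix."""
    return text.rstrip().lower().endswith("via @c0nvey")
```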


In the end, we developed a system that quickly and accurately weeds out tweets that were not authored by humans, even if the account is owned by an actual human. Let's see how our system stacks up against the BotOrNot service. We collected all of the bots found over one fifteen-minute period and ran them through BotOrNot: 67.16% of the accounts behind the tweets we labeled were determined by BotOrNot to be bot-owned. I was a little disappointed by this, so I took a look at the users that BotOrNot dismissed as human. In a first pass, I applied Howard's 50-tweets-per-day rule, which increased the percentage of accurately labeled bot tweets to 73.88%; one of the screen names in this group actually contained the word "bot". The most important thing for us at this point was to make sure that we weren't getting many false positives, so we examined each account one by one, looking at every tweet in our system classified as "likely bot" or "likely human".


 
Figure: Real-time labeling of bot and human tweets in Twitris




Many of the users with a low tweet-per-day count had a combination of bot and human tweets, and nearly all of their tweets labeled "likely bot" contained "via @c0nvey". Digging a bit further, I found that the Convey service has been accused of tweeting on behalf of unwitting users in the past. Here's a thread from one Twitter user talking about his experience:
[Figure: screenshot of a Twitter user's thread about unwitting tweets posted via @c0nvey]


So, our system is able to accurately detect automated (bot) tweets even when the user behind them is not a bot, and our detection rate looks reasonable compared to other services. We see bot traffic at a rate of about 5% per day in our election campaign, though some days reach nearly 8%.


Moving Forward (oh, and Fake News)


Elections come and go; once they are gone, we seek to apply our findings to future work. Being able to detect bots is great, but what else can we do with it? Post-election, we have all learned the term "fake news" (content that is entirely made up, not grounded in truth or reality), something that few people were concerned about before: see the Google Trend.
[Figure: Google Trends interest in "fake news"]
Where does all of this fake news come from? There have been troves of news reports blaming Facebook and Twitter for altering the outcome of the election. Obviously, Facebook and Twitter themselves weren't creating pro-Trump news (Facebook in particular was accused of killing pro-Trump trends); however, some blame these companies for not alerting people that the news they were "allowing" to spread was fake. I don't think that is fair. These companies rely on the position that they don't publish news (for more, read this article on Facebook, the "News Feed", and Section 230 of the Communications Decency Act) to avoid lawsuits.


During the evaluation of our bot detection system, we noticed something interesting: a large majority of the bot-labeled tweets contained links to dubious-looking news stories. Because we can identify these bot tweets, we can exclude them from analysis when considering a brand-centered campaign (like Samsung's during the Note 7 battery "situation"). It is important to note that, even after eliminating "fake news" and "bot tweets" from our analysis, Trump was always winning. We saw the same thing with the Brexit referendum earlier this year, where Twitris helped us correctly predict the outcome before the polls closed. There is clear evidence that high tweet volume translates to success (except for Bernie Sanders, but there may be, ahem, other reasons for that). It seems to me that for bots and "fake news" to have swayed the election, they would have needed to be ready to go as soon as Trump announced he was running, but what we have seen is that he was always ahead.


We will continue to find new ways to leverage everything we learned from the 2016 election. If you want to stay up to date on our analysis, please sign up for CognoviLabs’ newsletter at www.CognoviLabs.com or join Kno.e.sis on Facebook, and while you are there, check out the other post-election analyses.

Subjectivity — Tapping All the Valuable Insights beyond Sentiment for Nextgen Information Extraction

The information in text can generally be divided into two categories: objective information and subjective information. Objective information encompasses facts about something or someone, while subjective information concerns personal experiences. For example, the fact that it is raining is objective, while how one feels about the rain is subjective; that you have not had breakfast is objective, while your feeling of hunger is subjective; that you watched “You Can Count on Me” last night is objective, while the heartwarming feeling the movie gave you is subjective.

Subjective information about what people think and how people feel is useful for all parties including individuals, businesses, and government agencies during their decision-making processes. The traditional way of collecting subjective information takes the form of surveys, questionnaires, polls, focus groups, interviews, etc. For example, individuals ask their friends which cell phone carriers they recommend and whether the coverage is good in their area; retailers conduct a focus group to have in-depth discussions with their target customers about how they feel regarding shopping in the stores; governments solicit public opinions on particular policy issues via surveys.

The web and social media have changed the way we communicate and provide new potentially powerful avenues for us to glean useful subjective information from user generated content such as blogs, forum posts, reviews, chats, and microblogs. However, much of the useful subjective information is buried in ever-growing user generated data, which makes it very difficult (if not impossible) to manually capture and process the information needed for various purposes. To address the information overload, it is essential to develop techniques to automatically discover and derive high-quality (i.e., contextually or application relevant and accurate) subjective information from user generated content.

Current subjectivity and sentiment analysis efforts have focused on classifying text polarity, specifically, whether the opinion expressed about a specific topic in a given text (e.g., document, sentence, word/phrase) is positive, negative, or neutral. This narrow definition treats subjective information and sentiment as the same thing, while other types of subjective information (e.g., emotion, intent, preference, expectation) are either not taken into account or are handled similarly without sufficient differentiation. This limitation may prevent the exploitation of subjective information from reaching its full potential.

At Kno.e.sis, we extend the definition of subjective information and develop a unified framework that captures the key components of diverse types of subjective information. We define a subjective experience as a quadruple (h, s, e, c), where h is the individual who holds the experience; s is a stimulus (or target) that elicits the experience, e.g., an entity or an event; e is a set of expressions used to describe the experience, e.g., sentiment words/phrases or opinion claims; and c is a classification or assessment that characterizes or measures the subjectivity. Accordingly, identifying different types of subjective information can be formulated as a data mining task that aims to automatically derive the four components of the quadruple from text, as illustrated in Table 1.


Table 1 Components of sentiment, opinion, emotion, intent, preference and expectation.
Subjective Experience | Holder h | Stimulus s | Expression e | Classification c
Sentiment | an individual who holds the sentiment | an object | sentiment words and phrases | positive, negative, neutral
Opinion | an individual who holds the opinion | an object | opinion claims (may or may not contain sentiment words) | positive, negative, neutral
Emotion | an individual who holds the emotion | an event or situation | emotion words and phrases, description of events/situations | anger, disgust, fear, happiness, sadness, surprise, etc.
Intent | an individual who holds the intent | an action | expressions of desires and beliefs | depends on the specific task
Preference | an individual who holds the preference | a set of alternatives | expressions of liking, disliking, or preferring an alternative | depends on the specific task
Expectation | an individual who holds the expectation | an object | expressions of beliefs about someone or how something will be | depends on the specific task
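The quadruple above maps naturally onto a simple record type. The sketch below is purely illustrative, assuming nothing about any released Kno.e.sis/Twitris codebase; the class and field names, and the example instances, are made up for this post.

```python
from dataclasses import dataclass
from typing import List

# A minimal sketch of the (h, s, e, c) subjective-experience quadruple.
@dataclass
class SubjectiveExperience:
    holder: str             # h: the individual who holds the experience
    stimulus: str           # s: the entity, event, action, or alternatives that elicit it
    expressions: List[str]  # e: words/phrases describing the experience
    classification: str     # c: category or assessment (depends on the type)

# One instance per type, following the rows of Table 1 (contents are invented):
sentiment = SubjectiveExperience("a reviewer", "a phone", ["love"], "positive")
emotion = SubjectiveExperience("a traveler", "a delayed flight", ["furious"], "anger")
intent = SubjectiveExperience("a shopper", "buy a new phone", ["plan to"], "transactional")

for x in (sentiment, emotion, intent):
    print(x.holder, "->", x.classification)
```

Keeping all six types behind one schema is what lets a single extraction pipeline treat sentiment, emotion, intent, preference, and expectation uniformly, varying only the classification scheme per type.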

Consider the following example:

“Action and science fiction movies are usually my favorite, but I don't like the new Jurassic World. Mad Max: Fury Road is the best I've seen so far this year. It's a magnificent visual spectacle and the acting is stellar too. I cried, laughed and smiled watching Inside Out. It was so touching. Would like to watch the new Spy movie this weekend. I hope it’s good!''

Traditional sentiment analysis would find a positive opinion about action and science fiction movies, “Mad Max: Fury Road,” “Inside Out,” and “Spy,” and a negative opinion about the movie “Jurassic World.” However, if we consider different types of subjective information and handle each type based on the framework we proposed, we can derive much richer information from the text, as illustrated in Table 2.

Table 2 Information that can be extracted from the example text.
Subjective Experience | Holder h | Stimulus s | Expression e | Classification c
Preference | the author | movie genres | “favorite” | prefers action and science fiction movies over other genres
Sentiment | the author | movie Jurassic World | “don’t like” | negative
Opinion | the author | movie Mad Max: Fury Road, visual effects, performances | “best”, “magnificent”, “spectacle”, “stellar” | positive
Emotion | the author | movie Inside Out | “cried”, “laughed”, “smiled”, “touching” | sadness, joy, touching
Intent | the author | movie Spy | “would like to” | transactional
Expectation | the author | movie Spy | “hope” | optimistic

Figure 1 depicts the process of subjective information extraction. First, a number of preprocessing steps are needed to handle the raw textual data before information extraction can take place. Common preprocessing steps include sentence splitting, word tokenization, syntactic parsing or POS tagging, and stop-word removal. Afterwards, an optional step is to detect the subjective content in the input text, such as classifying sentences into subjective or objective categories. The subjective content can be further classified into different types, e.g., sentiment, emotion, intent, and expectation. Language resources such as WordNet, Urban Dictionary, and subjectivity lexicons (e.g., MPQA, SentiWordNet) can be used for the subjectivity classification task.

Figure 1 An overview of subjective information extraction.
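The preprocessing and subjectivity-detection steps above can be sketched in a few lines. This toy version uses regular expressions in place of a full NLP toolkit, and the tiny stop-word list and subjectivity lexicon are illustrative stand-ins for resources such as MPQA or SentiWordNet.

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "it", "i", "to", "and", "of"}
SUBJECTIVE_LEXICON = {"love", "hate", "best", "terrible", "hope", "magnificent"}

def split_sentences(text):
    """Naive sentence splitting on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercased word tokenization."""
    return re.findall(r"[a-z']+", sentence.lower())

def remove_stop_words(tokens):
    """Drop high-frequency function words before further analysis."""
    return [t for t in tokens if t not in STOP_WORDS]

def is_subjective(sentence):
    """Label a sentence subjective if it contains any lexicon term."""
    return any(t in SUBJECTIVE_LEXICON for t in tokenize(sentence))

text = "It is raining. I love the rain!"
for s in split_sentences(text):
    print(s, "->", "subjective" if is_subjective(s) else "objective")
# It is raining. -> objective
# I love the rain! -> subjective
```

A real pipeline would swap in proper sentence splitting, tokenization, and POS tagging, but the shape of the flow (preprocess, then filter objective content) is the same.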

The next step is to extract the four components of subjective experiences: the holder, the stimulus or target, the set of expressions, and the classification category or assessment score. Depending on the type of subjective information, specific techniques need to be developed and applied. For example, the target of a sentiment is usually an entity, and thus entity recognition is used to extract the sentiment target, while the target of an intent can be an action, e.g., “to buy a new cell phone,” so we need techniques to extract actions from text. In addition, for the same type of subjective information, different classification/assessment schemas and techniques may need to be developed according to the purpose of the application. For example, many sentiment analysis and opinion mining systems classify the polarity of a text (e.g., a movie review, a tweet) as positive, negative, or neutral [1-3], or rate it on a 1-5 star scale [4,5]. Some emotion identification systems focus on classifying emotions into six basic categories: anger, disgust, fear, happiness, sadness, and surprise [6], while other systems define their own set of emotions based on the application, e.g., understanding emotions in suicide notes [7], identifying emotions that people express using cursing words [8], or classifying emotional responses to TV shows and movies [9]. Existing work on detecting users' query intent classifies search queries into three categories: navigational, informational, or transactional [10,11]. Studies on identifying purchase intent (PI) for online advertising classify users' posts as PI or non-PI [12], or as information seeking or transactional [13].
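For concreteness, here is the simplest possible instance of the positive/negative/neutral polarity scheme mentioned above: a lexicon vote count. The word lists are toy examples; systems like [1-3] use trained classifiers and much richer resources such as MPQA or SentiWordNet.

```python
# Toy lexicons; real systems draw on resources such as MPQA or SentiWordNet.
POSITIVE = {"best", "magnificent", "stellar", "good", "favorite", "love"}
NEGATIVE = {"worst", "terrible", "bad", "boring", "hate"}

def polarity(text):
    """Classify text as positive, negative, or neutral by lexicon counts."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("the acting is stellar"))  # positive
print(polarity("a boring sequel"))        # negative
```

The same three-way scheme would fail on intent or expectation, which is exactly why the framework calls for a type-specific classification component c rather than a single polarity label.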

Finally, the extracted subjective information can be used for a wide variety of applications, including but not limited to business analytics, Customer Relationship Management (CRM), marketing, predicting the financial performance of a company, targeted advertising, recommendation (based on users' interests and preferences), monitoring social phenomena (e.g., social tension, subjective well-being), and predicting election results.

At Kno.e.sis, we have developed automatic methods to extract components of different subjective experiences. We have proposed an optimization-based approach that extracts a diverse set of sentiment-bearing expressions, including formal and slang words/phrases, for a given target from an unlabeled corpus [2]. We have developed a clustering approach that identifies opinion targets (product features and aspects) from plain product reviews [14]. The proposed approach identifies features and clusters them into aspects simultaneously; furthermore, it extracts both explicit and implicit features and does not require seed terms. We have also explored the classification and assessment of different types of subjective information. In particular, we have explored supervised methods for emotion classification [6-9]. We have proposed methods to group opinion holders based on their political preference and participation in the discussion about election candidates on Twitter, and to assess their sentiments towards the candidates to predict election results [15]. To understand the effect of religiosity on happiness, we analyzed the tweets and networks of more than 250k U.S. Twitter users who self-declared their religious beliefs, and examined the pleasant/unpleasant emotional expressions in their tweets to estimate their subjective well-being [16,17].



References:

[1] Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. "Thumbs up?: sentiment classification using machine learning techniques." EMNLP. 2002.
[2] Lu Chen, Wenbo Wang, Meenakshi Nagarajan, Shaojun Wang, Amit Sheth. Extracting Diverse Sentiment Expressions with Target-dependent Polarity from Twitter. ICWSM. 2012.
[3] Cícero Nogueira dos Santos and Maira Gatti. "Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts." COLING. 2014.
[4] Ganu, Gayatree, Noemie Elhadad, and Amélie Marian. "Beyond the Stars: Improving Rating Predictions using Review Text Content." WebDB. Vol. 9. 2009.
[5] Sharma, Raksha, et al. "Adjective Intensity and Sentiment Analysis." EMNLP. 2015.
[6] Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit Sheth. Harnessing Twitter "Big Data" for Automatic Emotion Identification. SocialCom. 2012.
[7] Wenbo Wang, Lu Chen, Ming Tan, Shaojun Wang, Amit Sheth. Discovering Fine-grained Sentiment in Suicide Notes. Biomedical Informatics Insights (BII). 2012.
[8] Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit Sheth. Cursing in English on Twitter. CSCW. 2014.
[10] Jansen, Bernard J., Danielle L. Booth, and Amanda Spink. "Determining the informational, navigational, and transactional intent of Web queries." Information Processing & Management 44.3: 1251-1266. 2008.
[11] Hu, Jian, et al. "Understanding user's query intent with wikipedia." Proceedings of the 18th international conference on World wide web. ACM, 2009.
[12] Gupta, Vineet, et al. "Identifying Purchase Intent from Social Posts." ICWSM. 2014.
[13] Nagarajan, Meenakshi, et al. "Monetizing user activity on social networks-challenges and experiences." Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology-Volume 01. IEEE Computer Society, 2009.
[14] Lu Chen, Justin Martineau, Doreen Cheng and Amit Sheth. Clustering for Simultaneous Extraction of Aspects and Features from Reviews. NAACL. 2016.
[15] Lu Chen, Wenbo Wang, Amit Sheth. Are Twitter Users Equal in Predicting Elections? A Study of User Groups in Predicting 2012 U.S. Republican Presidential Primaries. Proceedings of the 4th International Conference on Social Informatics (SocInfo). 2012.
[16] Lu Chen, Ingmar Weber and Adam Okulicz-Kozaryn. U.S. Religious Landscape on Twitter. Proceedings of the 6th International Conference on Social Informatics (SocInfo), 2014.
[17] Lu Chen. “Mining and Analyzing Subjective Experiences in User Generated Content.” Ph.D. Dissertation. Department of Computer Science & Engineering. [Dayton]: Wright State University; 2016. p. 161.