Text analysis has traditionally focused on the content of text, establishing themes and patterns within it. While this is useful to social researchers and policymakers, it has its limitations. A fairly recent development in text analysis, made possible by advances in programming, is sentiment analysis (also known as opinion mining). The aim of sentiment analysis is to calculate the sentiment of a response and determine whether it is positive or negative. In this post, we will explore what sentiment analysis is, why you might use it and how it is done.
What is sentiment analysis?
In a recent article, Alessia D’Andrea and others defined sentiment analysis as the study of people’s attitudes, evaluations, appraisals and emotions. In practice, sentiment analysis is a tool that measures the positive and negative language within a text and produces a score, allowing the researcher to draw conclusions about the sentiment of the respondent.
Why do sentiment analysis?
Sentiment analysis is often used by marketing researchers to reveal and measure customer opinions and attitudes towards a brand, product or campaign. With the growth in user- and public-generated opinions, due in part to the internet, Alessia D’Andrea and others believe that politicians, service providers and various other actors need to use sentiment analysis to make better decisions. It can be a cost-efficient and effective way to quickly measure public opinion and attitudes on different issues.
Bing Liu believes that opinions are so important that whenever a decision needs to be made, we want to hear the opinions of others, and this applies to organisations as well as individuals.
In a recent project, we applied sentiment analysis to the general comments section of a consultation survey and found that the overall sentiment was positive. Further analysis by demographic group found that the sentiment of one group was significantly negative and that of another was significantly positive, while the sentiment of the remaining groups was generally positive. This gives decision-makers additional data to factor into their decision-making process.
How to do sentiment analysis
One of the challenges is that opinions are often buried in long text comments. This makes it difficult to identify, extract and summarise them in a usable format, and on top of this there is the sheer volume of text to analyse: a recent project had just over 530 general open text comments. Doing this manually would be very time-consuming and make your eyes go square. Luckily, there are now a number of programs that will do this analysis using Natural Language Processing (NLP).
Computer programs use coded dictionaries and sentiment learnt through NLP to assign scores to words. Positive words are scored +1, while negative words are scored -1. In an example set of comments about a textbook, the words helpful and easy are positive, so each scores +1, while the word confusing is negative, so it scores -1. The word scores are added up to give a total for each comment, and these totals are averaged to give an average sentiment score; in the example this was 0.3, indicating a positive sentiment towards the textbook. It can also be helpful to note the range of the total scores, as this provides additional context: in the example, the totals ranged from -7 to 7.
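To make the scoring concrete, here is a minimal Python sketch of dictionary-based scoring. The mini-lexicon, the sample comments and the function names are hypothetical and chosen only for illustration; real tools use much larger dictionaries, so the numbers it produces are not the ones from the textbook example.

```python
import re
from statistics import mean

# Hypothetical mini-lexicon for illustration only: +1 for positive words,
# -1 for negative words. Real tools use far larger dictionaries.
LEXICON = {
    "helpful": 1,
    "easy": 1,
    "clear": 1,
    "confusing": -1,
    "boring": -1,
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def comment_score(text: str) -> int:
    """Total score for one comment: the sum of its word scores."""
    return sum(LEXICON.get(word, 0) for word in tokenize(text))


# Made-up comments about a textbook, for illustration only.
comments = [
    "The textbook is helpful and easy to follow, but chapter 3 is confusing.",
    "Clear examples, very helpful.",
    "Boring and confusing.",
]

totals = [comment_score(c) for c in comments]
print("Totals per comment:", totals)
print("Average sentiment:", round(mean(totals), 2))
print("Range:", min(totals), "to", max(totals))
```

Running it gives per-comment totals of 1, 2 and -2, an average of about 0.33 and a range of -2 to 2.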
Through the magic of computer coding, many analysis programs can recognise negation within a text. Take the comment “I support this idea”, in which the word support would be coded as positive. In the comment “I don’t support this”, the word don’t negates support, so don’t is coded as negative and support is left uncoded.
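The negation rule described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular program works: the word lists are hypothetical, only the word immediately after a negator is affected, and a negated negative word (for example “don’t dislike”) is assumed to flip to positive.

```python
import re

# Hypothetical word lists, for illustration only.
LEXICON = {"support": 1, "like": 1, "oppose": -1, "dislike": -1}
NEGATORS = {"not", "don't", "doesn't", "never", "no"}


def tokenize(text: str) -> list[str]:
    # Normalise curly apostrophes so "don’t" and "don't" match the same entry.
    return re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))


def coded_words(text: str) -> list[tuple[str, int]]:
    """Return (word, score) pairs for every word that receives a score.

    Assumption: a negator only affects the word immediately after it. When
    that word is in the lexicon, the negator takes the opposite score and
    the negated word is left uncoded.
    """
    tokens = tokenize(text)
    coded = []
    skip_next = False
    for i, word in enumerate(tokens):
        if skip_next:
            skip_next = False
            continue
        next_word = tokens[i + 1] if i + 1 < len(tokens) else None
        if word in NEGATORS and next_word in LEXICON:
            coded.append((word, -LEXICON[next_word]))
            skip_next = True  # the negated word stays uncoded
        elif word in LEXICON:
            coded.append((word, LEXICON[word]))
    return coded


print(coded_words("I support this idea"))   # [('support', 1)]
print(coded_words("I don’t support this"))  # [("don't", -1)]
```

Real programs typically use wider negation windows and more sophisticated rules; the one-word window here keeps the idea easy to follow.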
Sentiment analysis can be carried out on a document as a whole, at the sentence level, or on a particular feature of the subject being discussed. In consultation and engagement surveys, the analysis is likely to focus on the sentence level.
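Sentence-level analysis matters because a single comment often mixes opinions. The sketch below scores each sentence separately, assuming a hypothetical mini-lexicon and a naive splitter that breaks sentences after ., ! or ? followed by a space; real tools use more robust sentence detection.

```python
import re

# Hypothetical mini-lexicon for illustration only.
LEXICON = {"helpful": 1, "friendly": 1, "slow": -1, "confusing": -1}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_scores(text: str) -> list[tuple[str, int]]:
    """Score each sentence separately by summing its word scores."""
    results = []
    for sentence in split_sentences(text):
        words = re.findall(r"[a-z']+", sentence.lower())
        results.append((sentence, sum(LEXICON.get(w, 0) for w in words)))
    return results


comment = (
    "The staff were friendly and helpful. "
    "The online form was slow and confusing."
)
for sentence, score in sentence_scores(comment):
    print(score, sentence)
```

Here the two sentences score 2 and -2. Scored as a single document, the comment would total 0 and look neutral, hiding both opinions.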
Limitations of sentiment analysis
Just like any form of automated analysis, sentiment analysis programs have their limitations. Human language is very complex and full of slang, cultural variations and misspellings. A good example of this complexity is the second property some people own: is it a holiday home, a bach or a crib?
With the growth of social media, modern language is full of hashtags, acronyms such as LOL, WTF and BTW, and emoticons such as 😛, which software may not understand or be able to code. Some of the more advanced analytics programs have been coded to understand acronyms and emoticons and score them appropriately.
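One simple way a program could handle acronyms, emoticons and hashtags is to include them in its dictionary and avoid stripping them away when splitting text into tokens. The Python sketch below assumes hypothetical scores (for example, LOL as +1, WTF as -1 and 😛 as +1); these are illustrative choices, not values from any particular tool.

```python
import string

# Hypothetical lexicon; the scores for acronyms and emoticons are
# illustrative assumptions, not taken from any particular tool.
LEXICON = {
    "great": 1,
    "awful": -1,
    "lol": 1,
    "wtf": -1,
    ":)": 1,
    ":(": -1,
    "\U0001F61B": 1,  # 😛 face with stuck-out tongue
}


def score(text: str) -> int:
    """Sum the scores of whitespace-separated tokens."""
    total = 0
    for raw in text.split():
        token = raw.lower()
        # Look up emoticons as-is first; only strip surrounding punctuation
        # (including '#') when the raw token has no entry of its own.
        if token not in LEXICON:
            token = token.strip(string.punctuation)
        total += LEXICON.get(token, 0)
    return total


print(score("Missed the bus again, WTF :("))
print(score("Great gig lol \U0001F61B"))
print(score("#great"))
```

Because the # is stripped, a hashtag such as #great is scored like great, while BTW, which has no entry here, scores 0.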
Computers also don’t understand sarcasm, which may lead to text being scored incorrectly; for example, “My flight has been delayed, brilliant” is likely to be scored as positive because of the word brilliant.
A large part of our communication happens through body language, which plays a very important role in how we interpret what others say. Computers can’t read body language, which is probably a good thing when you can’t remember your password or the program you are using suddenly crashes, but it is a limitation when it comes to sentiment analysis.
Feel free to talk to us if you have any questions about sentiment analysis or would like to know how we can incorporate it into your engagement and consultation surveys.