Are we really in control of our decisions? If we are in a supermarket in front of two equally cheap products, which one do we choose? Usually the one that gives us a better feeling. But those feelings are built by advertising campaigns, which can hide bad practices towards workers, clients and governments.
Aggregating opinions is a very powerful idea because it lets people make more informed decisions about what they do, and change the world a little with those small decisions.
Read the thesis here >> http://bit.ly/1nlyE68
3. What is sentiment analysis?
[Liu, 2010] proposes a quintuple (oj, fjk, ooijkl, hi, tj): from unstructured text to structured data.
oj: Object
fjk: Object features (Aspect)
ooijkl: Opinion orientations (positive/negative),
(calm/anger/joy/happiness), intensity, ...
hi: Opinion holder
tj: Time frame
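The quintuple above can be modeled directly as a record. This is a minimal sketch in Ruby; the field values are invented for illustration, not drawn from a real dataset.

```ruby
# Liu's opinion quintuple as a plain Ruby Struct.
# Fields mirror the slide: object, feature (aspect), orientation, holder, time.
Opinion = Struct.new(:object, :feature, :orientation, :holder, :time)

# Hypothetical example opinion: a user complaining about easyjet's baggage fees.
op = Opinion.new("easyjet", "baggage", :negative, "@some_user", "2014-05-01")
puts "#{op.holder} says #{op.object}'s #{op.feature} is #{op.orientation}"
```

Structuring opinions this way is what turns free text into rows that can be aggregated, filtered by aspect, or counted per holder.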
6. State-of-the-art
- Twitter as a corpus [Pak and Paroubek, 2010]: a text-classification problem, with features for machine-learning techniques:
- Emoticons :)
- N-grams
- Negations
- POS tagging
- Syntax
- Twitter specific features.
7. State-of-the-art
- Pointwise Mutual Information [Su and Xiang, 2006]: the probability that the words in a phrase are positive or negative can be estimated from their co-occurrences on the WWW.
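The co-occurrence idea can be sketched as a Turney-style semantic-orientation score: compare how often a phrase co-occurs with a positive seed word versus a negative one. The hit counts below are made up to stand in for web search results; the real method queries a search engine.

```ruby
# Semantic orientation via PMI (Turney-style sketch):
# SO(phrase) = log2( hits(phrase, pos_seed) * hits(neg_seed) /
#                    (hits(phrase, neg_seed) * hits(pos_seed)) )
def so_pmi(hits_with_pos, hits_with_neg, hits_pos, hits_neg)
  Math.log2((hits_with_pos.to_f * hits_neg) / (hits_with_neg * hits_pos))
end

# Hypothetical counts: "great service" co-occurs far more with "excellent"
# than with "poor", so the score comes out positive.
score = so_pmi(1200, 90, 50_000, 48_000)
puts score.positive? ? "positive" : "negative"
```

A positive score means the phrase leans towards the positive seed; the magnitude gives the intensity.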
8. State-of-the-art
- Sentiment dictionaries: SentiWordNet [Baccianella and Esuli, 2010]. A positive score and a negative score for each meaning (#N), calculated with a random-walk algorithm.
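Using such a dictionary to score a text can be sketched in a few lines. The entries and scores below are toy values, not real SentiWordNet data, and the lookup naively assumes the first sense (#1) of each word.

```ruby
# Toy SentiWordNet-style dictionary: word#sense => positive/negative scores.
DICT = {
  "nightmare#1"  => { pos: 0.0,   neg: 0.875 },
  "ridiculous#1" => { pos: 0.125, neg: 0.625 },
  "good#1"       => { pos: 0.75,  neg: 0.0 }
}

# Score a tokenized text: sum (pos - neg) over words found in the dictionary.
def score(words)
  words.sum { |w| e = DICT["#{w}#1"]; e ? e[:pos] - e[:neg] : 0.0 }
end

puts score(%w[ryanair is a nightmare])  # => -0.875
```

Words absent from the dictionary contribute nothing, which is exactly why a generic dictionary can perform poorly on a new domain until it is extended.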
11. Not developed in state-of-the-art
Structured N-grams: most existing work uses plain N-grams.
Buzz detection.
Aspect identification is not a main focus.
14. Hypothesis
H1: We can create groups of N-grams that specifically influence one aspect with a negative or a positive orientation. This is what we call sentigrams.
H2: By using incremental learning the system improves in
each iteration. User interaction increases precision.
H3: After a certain number of iterations, we can assign sentigrams to a tweet automatically.
15. Hypothesis (H1) - Sentigrams
We define a sentigram as the relation between sentiwords and aspects that determines whether a tweet is positive or negative.
- A sentigram is an evolution of the N-gram; it can be considered a structured N-gram.
- Detect aspects and sentiwords inside a text.
16. Hypothesis (H1) - Sentigrams
- Mark opinion orientations: not only whether they are positive or negative, but also which aspect they refer to.
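A naive extraction along these lines pairs each sentiword found in a tweet with an aspect found nearby. The lexicons below are tiny toy examples, and the pairing simply uses the first aspect in the tweet; the thesis' actual matching is more elaborate.

```ruby
# Toy lexicons for illustration only.
ASPECTS    = %w[baggage staff price]
SENTIWORDS = { "nightmare" => :negative, "ridiculous" => :negative, "cheap" => :positive }

# Extract (aspect, sentiword, orientation) triples — a crude sentigram sketch
# that pairs every sentiword with the first aspect detected in the tweet.
def sentigrams(tweet)
  words   = tweet.downcase.scan(/[a-z]+/)
  aspects = words & ASPECTS
  words.filter_map do |w|
    next unless SENTIWORDS.key?(w)
    { aspect: aspects.first, sentiword: w, orientation: SENTIWORDS[w] }
  end
end

p sentigrams("Ridiculous to pay extra for baggage")
# one negative sentigram, anchored on the aspect "baggage"
```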
17. Hypothesis (H2) - Incremental learning
By using incremental learning the system improves in each iteration, increasing precision.
- The original SentiWordNet version was poorly adapted to our domain.
- We include new sentiwords from annotations in our dictionary
with scores (pos_score: 0, neg_score: 0).
- A random walk updates the word scores until accuracy converges.
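The random-walk rebalancing can be sketched as a simple hill climb: perturb a random word score, keep the change only if accuracy on the annotated tweets does not drop, and repeat until convergence. The step size, toy dictionary, and objective below are illustrative assumptions, not the thesis' actual parameters.

```ruby
# Random-walk score rebalancing (sketch): perturb pos/neg scores at random,
# keep a change only if the accuracy function does not regress.
def random_walk(dict, accuracy_fn, steps: 200, delta: 0.125)
  best = accuracy_fn.call(dict)
  steps.times do
    word = dict.keys.sample
    key  = %i[pos neg].sample
    old  = dict[word][key]
    dict[word][key] = (old + [-delta, delta].sample).clamp(0.0, 1.0)
    acc = accuracy_fn.call(dict)
    if acc >= best
      best = acc
    else
      dict[word][key] = old  # revert on regression
    end
  end
  best
end

dict = { "late" => { pos: 0.5, neg: 0.5 } }
# Toy objective standing in for accuracy: it is maximized when "late"
# is scored as fully negative, as it should be in the airline domain.
acc = ->(d) { d["late"][:neg] - d["late"][:pos] }
random_walk(dict, acc)
```

After a few hundred steps the word "late" drifts to a clearly negative score, mirroring how domain annotations pull the dictionary towards the domain.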
18. Hypothesis (H3) - Automatization
After a certain number of iterations, we can assign sentigrams to a tweet automatically, without manual intervention.
- A multi-class problem!! Each tweet has several words to guess. A text-classification problem!!
19. Hypothesis (H3) - ML
- Convert the multiclass problem into a binary problem
(e.g. "ryanair is a joke").
0,801829636,-545403680,1561023766,2119008529,1
1,801829636,-545403680,1561023766,2119008529,0
2,801829636,-545403680,1561023766,2119008529,0
3,801829636,-545403680,1561023766,2119008529,2
- Frame the problem by position (0..N): N partial observations
from each tweet.
- Numerical codes for the words; three classes available {0,1,2}.
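The encoding on this slide can be sketched as follows. CRC32 is used here as a stand-in for whatever hash function the thesis actually used, and the class meanings (0 = irrelevant, 1 = aspect, 2 = sentiword) are an assumption about the label scheme.

```ruby
require "zlib"

# Turn one labeled tweet into N partial observations: each row is
# [position, code_of_word_1..code_of_word_N, class_at_that_position].
def observations(words, labels)
  codes = words.map { |w| Zlib.crc32(w) }  # stable numeric code per word
  words.each_index.map { |i| [i, *codes, labels[i]] }
end

# "ryanair is a joke" with assumed labels: ryanair=aspect(1), joke=sentiword(2).
rows = observations(%w[ryanair is a joke], [1, 0, 0, 2])
rows.each { |r| puts r.join(",") }
```

Each row shares the same word codes and differs only in the position and the class to predict, which is what turns one multi-class tweet into several binary-style observations.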
20. Hypothesis (H3) - Dependency parsing
- Mate Tools
1 ryanair _ ryanair _ NN _ _ -1 2 _ SBJ _ _
2 is _ be _ VBZ _ _ -1 0 _ ROOT _ _
3 a _ a _ DT _ _ -1 4 _ NMOD _ _
4 joke _ joke _ NN _ _ -1 2 _ PRD _ _
- Still noisy; work in progress.
- ML approach: accuracy is 85% against our gold standard;
focusing only on aspects we reach 94%.
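The Mate Tools output above is CoNLL-style tab/space-separated columns, which are easy to consume downstream. This is a minimal parser sketch; the column positions (id, form, lemma, POS, head, deprel) are read off the example itself and may differ in other Mate Tools configurations.

```ruby
# Parse the CoNLL-style lines from the slide into token hashes.
CONLL = <<~TXT
  1 ryanair _ ryanair _ NN _ _ -1 2 _ SBJ _ _
  2 is _ be _ VBZ _ _ -1 0 _ ROOT _ _
  3 a _ a _ DT _ _ -1 4 _ NMOD _ _
  4 joke _ joke _ NN _ _ -1 2 _ PRD _ _
TXT

tokens = CONLL.lines.map do |l|
  f = l.split
  { id: f[0].to_i, form: f[1], pos: f[5], head: f[9].to_i, deprel: f[11] }
end

# The ROOT token carries the main predicate of the tweet.
root = tokens.find { |t| t[:deprel] == "ROOT" }
puts root[:form]  # => "is"
```

From the head/deprel columns one can follow edges like joke → is (PRD) and ryanair → is (SBJ), which is the surface structure we hope will help classify sentigrams.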
21. Conclusions
- The SentiWordNet version was poorly adapted to our domain:
47% accuracy. The random walk was necessary.
- Design of an interface to perform interactive annotations: a
semi-supervised approach.
- With the words from the annotations, positive and negative scores
are perturbed randomly until accuracy is optimized and convergence
is reached: 89% accuracy.
22. Conclusions
- Focus on aspect identification, not only +/-: we detect what
the user is complaining about.
- Convert the multi-class problem into a binary problem. Divide &
conquer!!
- Machine learning & dependency parsing of tweets to detect
patterns: 85% accuracy.
23. What's next?
- Finish integration with dependency parsing.
- Data visualization. Comparison between several topics.
Positive aspects and negative aspects of each topic.
- Train the system for several domains: airlines, politics, TV,
telecommunications, etc.
I want to start this presentation with a little bit of thinking. Read this quote and think about it for a few seconds: is it really true? If, for instance, we are in a supermarket and have to choose between two products with similar prices, we normally buy from the brand that gives us a better feeling. And that feeling is connected to its advertising campaign and its power to create it. But is this good feeling real? Behind a nice, inspiring ad there could be thousands of equally important reasons not to buy the product: how well the company treats its workers, how well it treats its clients, or how many unresolved complaints it has. SA can give us access to this information. Aggregating opinions is a way of giving people the power to make more informed decisions, because they can analyze what kind of opinions other users have about a brand and whether it is worth buying from them. I also see SA as a way of creating real change: if we buy from brands with better social values, we will be able to evolve towards a better society.
Bing Liu gives a very good definition of SA. He defines an opinion as a quintuple with five fields: an object or main topic; the object aspects the opinion refers to; a set of opinion orientations, which can be positive or negative with a certain degree of intensity; and finally an opinion holder and a specific time.
Here we can see some examples of these quintuples. We could have an opinion about easyjet that considers the baggage too expensive; another one about a house-renting company that says they are horrible people; and among the positive opinions, one about jazztel that says there are no problems. We can see how each opinion comes with a different degree of intensity.
But what can we do with those tuples of information? What if we aggregate all of them in one place, where in seconds we can know how a brand treats its clients? This idea is very powerful because it would force companies to be more human and respond to certain values if they want to survive. Informed citizens are smart citizens.
But what do we do when texts come from different domains? The negative words in the airline domain are not the same as in politics. Can we build cross-domain solutions? Pan proposed a solution: divide words into two groups, domain-independent words on the left and domain-dependent words on the right. This organization of information creates little groups, such as never_buy with blurry and boring. With a system like that, we can detect new domain-dependent opinion words by checking their co-occurrence with words on the left side.
One of the main techniques is Pointwise Mutual Information. This method uses the World Wide Web as a database: to query whether a phrase is positive or negative, we take the first N search-engine results for the phrase and count how many co-occurrences appear in positive contexts and how many in negative ones. From that we can guess the orientation of the phrase.
Another state-of-the-art technique is the sentiment dictionary: a database of words where each word has a positive and a negative score. We can use this information in our programs to build a sentiment score for any text. One of the main dictionaries is SentiWordNet; each entry has a positive score, a negative score, a little gloss for understanding, and the specific words affected by that meaning. Obviously the same word can appear with different meanings, which is why each word ends with a hash and a number.
And yes, SA has been used to predict. In 2008 it was used for Obama's election process; it has also been used in Germany, and to predict the stock market. It is possible to find indicators that anticipate the tendencies seen in polls, so Twitter allows us to see these tendencies in real time. Sometimes, to find these tendencies, we need to work in sentiment spaces of other dimensionalities than positive/negative, such as calm/anger/joy/happiness.
These slides explain what is not yet well developed in the state of the art of SA. We saw that N-grams are heavily exploited to detect opinions, but combinations of N-grams as new units are not; finding correlations between similar N-grams is a very interesting line of investigation. Another thing we have not seen is treating opinions themselves as a problem. What if we want to read a newspaper without the writer's opinion, only the facts, the data? SA has not been exploited much to remove opinions from texts, which I think would be interesting in some cases.
After reviewing the material we chose this architecture. We download tweets from the Twitter API (about any topic that interests us) and merge them with a dictionary through Hadoop, so we get a score for each tweet depending on how many dictionary words it contains. Finally, we can show this information in a Rails interface and compare different topics, create statistics, and so on. At the same time, we can improve the system by annotating tweets with small corrections. Those corrections are reused to add new words to the dictionary and improve the tweet scores, and the annotations can also be used to train Weka models that help create the statistics shown in the interface.
We chose Ruby because of its simplicity: no compiling, no deploying, simple maintenance. At the same time we can use Java when needed, through JRuby or the Hadoop Streaming library. Hadoop lets us perform this grouping of tweets and dictionary without wasting memory, so the whole "GROUP BY" can be done on disk (writing sequentially). An iterative version would need to hold all tweets and the dictionary in memory; if we have 10 million tweets, will they fit?
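The tweet/dictionary join via Hadoop Streaming can be sketched as a map and a reduce over tab-separated records. In Streaming, the shuffle performs the "GROUP BY word" on disk; here the whole pipeline is simulated in-process, and the record formats are assumptions rather than the thesis' actual schema.

```ruby
# Mapper: tag each record and emit (word, value) pairs.
#   Dictionary records: "DICT\tword\tscore"  -> [word, "S:score"]
#   Tweet records:      "TWEET\tid\ttext"    -> [word, "T:id"] per word
def map_record(line)
  kind, payload = line.split("\t", 2)
  if kind == "DICT"
    word, score = payload.split("\t")
    [[word, "S:#{score}"]]
  else
    id, text = payload.split("\t", 2)
    text.downcase.scan(/[a-z]+/).map { |w| [w, "T:#{id}"] }
  end
end

# Reducer: for each word group, add the word's dictionary score to every
# tweet that contains it, accumulating a total score per tweet.
def reduce(pairs)
  totals = Hash.new(0.0)
  pairs.group_by(&:first).each_value do |group|
    vals  = group.map(&:last)
    score = vals.grep(/^S:/).sum { |v| v[2..].to_f }
    vals.grep(/^T:/).each { |v| totals[v[2..]] += score }
  end
  totals
end

input  = ["DICT\tnightmare\t-0.9", "TWEET\t42\tryanair is a nightmare"]
totals = reduce(input.flat_map { |l| map_record(l) })
puts totals["42"]  # => -0.9
```

The same two functions could run as actual Streaming mapper/reducer scripts reading STDIN line by line; the point is that the grouping never requires all tweets in memory at once.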
We work with three hypotheses here. 1/ It is possible to create groups of N-grams called sentigrams: groups that indicate whether a tweet is positive or negative and that refer to a specific aspect. 2/ The system allows incremental learning, improving the tweet scores in each iteration. 3/ We can learn sentigrams as the number of iterations increases, and at a certain point we will be able to detect whether a tweet is positive or negative and why.
Read the text. As we can see, we mark the aspects in black and the sentiwords in red. So we have that ryanair is a nightmare, and that it is ridiculous to pay extra for baggage. Those two sentigrams tell us that the message is basically negative.
After that we mark the opinion orientation independently of the score given by our system (which could be wrong). Here we have a positive message saying that these two airlines are always on time, so we mark it as good; and we mark the negative message as negative.
The second hypothesis is about the idea of incremental learning. It was needed because the original dictionary had an accuracy below 50%. To fix that, we use a random-walk algorithm to rebalance the word scores.
Third hypothesis: automating sentigram detection. As we will see, this is a multiclass problem, because we have to choose between several strings; working with text is not like working with numbers.
To solve this, we transform the multiclass problem into a binary problem. We create four partial observations, one for each position in the text: first, second, third and fourth. We transform words into numbers through hash codes, and then we indicate whether a word is an aspect, a sentiword, or not relevant by adding one of three codes (0, 1, 2). The idea is similar to the Viterbi algorithm, which works with partial observations to guess the next state.
We are currently investigating other techniques such as dependency parsing, to see whether providing a surface structure helps classify these sentigrams. We are still working on it; for now the ML approach gives us better results (85%, or 94% if we focus individually on aspects or sentiwords).
The first conclusion is that the original dictionary was nearly useless and we needed the random walk, so we designed a screen to perform interactive corrections.
In this third iteration, which is not finished yet, we are working on sentigram identification through machine learning and dependency parsing. Our accuracy right now is 85%.