Presentation given for the WebSci 2014 conference.
Abstract:
This paper addresses the problem of determining the best answer in Community-based Question Answering websites by focussing on the content. Previous research on this topic relies on the exploitation of community feedback on the answers, which involves rating of either users (e.g., reputation) or answers (e.g., scores manually assigned to answers). We propose a new technique that leverages the content/textual features of answers in a novel way. Our approach delivers better results than related linguistics-based solutions and manages to match rating-based approaches. More specifically, the gain in performance is achieved by rendering the values of these features into a discretised form. We also show how our technique manages to deliver equally good results in real-time settings, as opposed to having to rely on information not always readily available, such as user ratings and answer scores. We ran an evaluation on 21 StackExchange websites covering around 4 million questions and more than 8 million answers. We obtain 84% average precision and 70% recall, which shows that our technique is robust, effective, and widely applicable.
Full paper:
http://dl.acm.org/citation.cfm?id=2615569.2615681
It’s all in the Content: State of the art Best Answer Prediction based on Discretisation of Shallow Linguistic Features
1. It’s all in the Content: State of the art Best Answer Prediction based on Discretisation of Shallow Linguistic Features
George Gkotsis, Karen Stepanyan, Carlos Pedrinaci, John Domingue, Maria Liakata*
Knowledge Media Institute, The Open University
*Department of Computer Science, University of Warwick
2. Outline
• Motivation
• Problem description
• Proposed solution
• Evaluation
• Discussion & Conclusion
23-26 June 2014 ACM Web Science Conference 2014 (WebSci14)
6. Why best answer prediction?
• Information overload
• Increase awareness in the community
• Answer questions more efficiently
• One way to study social media reception
• Plus:
• Finding experts in communities
• Study of language use
• Trend analysis
• …
• Visit
8. Best answer prediction in Social Q&A
• Binary classification problem
• Is it solved?
• Yes, partially
• Current solutions depend on:
Answer Ratings
• Score, #comments
Knowledge is Future & Unknown
User Ratings
• User Reputation
• UpVotes etc
• Preferential attachment
Knowledge is Past & Not always available
9. State of the art solutions
“…we observe significant assortativity in the reputations of
co-answerers, relationships between reputation and
answer speed, and that the probability of an answer
being chosen as the best one strongly depends on
temporal characteristics of answer arrivals.”
Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, Jure Leskovec
Discovering Value from Community Activity on Focused Question
Answering Sites: A Case Study of Stack Overflow.
KDD 2012
10. State of the art solutions (cont.)
“When available, scoring (or rating) features improve
prediction results significantly, which demonstrates the
value of community feedback and reputation for identifying
valuable answers.”
Grégoire Burel, Yulan He, Harith Alani.
Automatic Identification of Best Answers in Online Enquiry
Communities
ESWC 2012
11. State of the art solutions
Summary
[Bar chart: Average Precision (0% to 100%) for Linguistic, User Ratings and Answer Ratings approaches, with "Our solution" annotated]
12. StackExchange network
SE “is all about getting answers, it’s not a discussion forum, there’s no chit-chat”
20 June 2014:
• 123 Q&A sites
• 5,622,330 users
• 9.5 million questions
• 16.3 million answers
• 9.3 million visits per day
13. Training Dataset
September 2013 dump
StackOverflow & 20 of the most active SE websites
Questions with Accepted Answers
• 4,366,662 Non Accepted Answers
• 3,939,224 Accepted Answers
[Pie chart: Accepted Answers 47%, Non Accepted Answers…]
14. SE websites
[Bar chart: Non Accepted and Accepted answers per SE website, axis from 0 to 200,000]
15. [Charts: pie chart with StackOverflow 91% and The Rest 9%; bar chart for stackoverflow of Non Accepted Answers and Accepted Answers, with values 3,375,817 and 3,795,276 on an axis from 0 to 8,000,000]
16. Shallow Linguistic features
• Long history, coming from studies on readability
1. Average number of characters per word
2. Average number of words per sentence
3. Number of words in the longest sentence
4. Answer length
5. Log Likelihood:
Pitler and Nenkova, 2008
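As a rough illustration of the five features listed on this slide, the sketch below computes them for a single answer. Several details are assumptions, since the slides do not specify them: the regex tokenizer, the naive sentence splitter, measuring Length in characters, and in particular the form of the log likelihood (a unigram model over a background vocabulary with add-one smoothing). The names `shallow_features` and `background_counts` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens from a deliberately simple regex tokenizer."""
    return re.findall(r"[A-Za-z0-9']+", text.lower())


def split_sentences(text):
    """Naive split on '.', '!' and '?' (an assumption, not the authors' method)."""
    return [s for s in re.split(r"[.!?]+", text) if tokenize(s)]


def shallow_features(answer, background_counts):
    """Return the five shallow features for one answer.

    background_counts maps each word to its frequency in the community
    vocabulary (hypothetical input; the slides do not say how it is built).
    """
    words = tokenize(answer)
    sentences = [tokenize(s) for s in split_sentences(answer)]
    if not words:
        return {"CharactersPerWord": 0.0, "WordsPerSentence": 0.0,
                "LongestSentence": 0, "Length": len(answer), "LL": 0.0}
    total = sum(background_counts.values())
    vocab = len(background_counts)
    # Assumed form: unigram log likelihood with add-one smoothing.
    ll = sum(
        count * math.log((background_counts.get(w, 0) + 1) / (total + vocab + 1))
        for w, count in Counter(words).items()
    )
    return {
        "CharactersPerWord": sum(len(w) for w in words) / len(words),
        "WordsPerSentence": len(words) / len(sentences),
        "LongestSentence": max(len(s) for s in sentences),
        "Length": len(answer),  # in characters (assumption)
        "LL": ll,
    }


features = shallow_features(
    "Use a list. It is fast!", Counter({"use": 5, "a": 10, "list": 2})
)
```

Under this smoothing, a more negative LL means the answer's wording is less typical of the background vocabulary.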
24. Shallow features: Observations
• Accepted answers tend to:
• Be longer
• Differ more from the community vocabulary
• Contain shorter words
• Have longer longest sentences
• Have more words per sentence
25. But how good are shallow features?
• 58% macro precision (our baseline)
• Possible reasons
1. Evolution of language characteristics
• Language becomes more eloquent
2. Variance is huge
3. Universal classifier looks unreachable, e.g.:
• SuperUser average length is 577
• Skeptics average length is 2,154
27. Objectives
• Build a classifier which is:
1. Based solely on linguistic features
2. Robust
• Performs as well as classifiers that use user ratings (past knowledge) or answer ratings (future knowledge)
3. Universal
• The same classifier is applicable to as many SE websites as possible (domain agnostic)
28. Feature discretisation
Example for Length
Group by question, sort by Length in descending order, and use the rank as LengthD:

Question Id   Answer Id   Length   Rank (LengthD)
1             4           250      1
1             2           200      2
1             3           150      3
5             6           150      1
5             7           100      2
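The discretisation step in this example (group by question, sort by the feature in descending order, use the rank as the new feature) can be sketched as follows. This is a minimal illustration using the example's values, not the authors' code. The function name `discretise`, the dictionary keys and the tie handling are assumptions.

```python
from collections import defaultdict


def discretise(answers, feature):
    """Add '<feature>D': each answer's rank among answers to the same question.

    Rank 1 is the largest value. Ties receive distinct ranks in input order
    (an assumption; the slides do not cover ties). Dicts are updated in place.
    """
    by_question = defaultdict(list)
    for answer in answers:
        by_question[answer["question_id"]].append(answer)
    for group in by_question.values():
        ranked = sorted(group, key=lambda a: a[feature], reverse=True)
        for rank, answer in enumerate(ranked, start=1):
            answer[feature + "D"] = rank
    return answers


answers = [
    {"question_id": 1, "answer_id": 2, "Length": 200},
    {"question_id": 1, "answer_id": 3, "Length": 150},
    {"question_id": 1, "answer_id": 4, "Length": 250},
    {"question_id": 5, "answer_id": 6, "Length": 150},
    {"question_id": 5, "answer_id": 7, "Length": 100},
]
discretise(answers, "Length")
ranks = {a["answer_id"]: a["LengthD"] for a in answers}
# ranks == {2: 2, 3: 3, 4: 1, 6: 1, 7: 2}
```

Because the rank is relative to the other answers to the same question, two answers of length 150 can receive different LengthD values on different questions (answers 3 and 6 above).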
29. Information Gain from Discretisation
30. Feature discretisation
Category                    Name                  Information Gain
Linguistic                  Length                0.0226
                            LongestSentence       0.0121
                            LL                    0.0053
                            WordsPerSentence      0.0048
                            CharactersPerWord     0.0052
Linguistic Discretisation   LengthD               0.2168
                            LongestSentenceD      0.1750
                            LLD                   0.1180
                            WordsPerSentenceD     0.1404
                            CharactersPerWordD    0.1162
20x increase
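For readers unfamiliar with the measure, information gain for a discrete feature is the entropy of the class labels minus the entropy that remains once the feature value is known. The sketch below shows that standard definition. It is not the tool used to produce the table, and the slides do not say how continuous features such as Length were binned. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(values, labels):
    """H(labels) - H(labels | values) for one discrete feature."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    conditional = sum(
        len(group) / len(labels) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - conditional


# Toy data (hypothetical): rank 1 is always accepted, rank 2 never is.
perfect = information_gain([1, 2, 1, 2], [True, False, True, False])
# A feature that takes one value everywhere tells us nothing.
useless = information_gain([1, 1, 1, 1], [True, False, True, False])
```

With these toy inputs the first feature recovers the full one bit of label entropy and the second recovers none.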
31. User and answer rating features
Category        Name              Information Gain
Other           Age               0.0539
                CreationDateD     0.1575
                AnswerCount       0.3270
User Rating     UserReputation    0.0836
                UserUpVotes       0.0535
                UserDownVotes     0.0412
                UserViews         0.0528
                UserUpDownVotes   0.0508
Answer rating   Score             0.0792
                CommentCount      0.0286
                ScoreRatio        0.4539
33. What are we evaluating?
1. Prediction
2. How good is it compared with the SOTA?
3. Generality
34. 1. Prediction – Features used
[Diagram: feature groups: Linguistic, Linguistic Discretisation, Other, User Rating (Past Knowledge), Answer Rating (Future Knowledge)]
35. 1. Prediction
• The classifier was Alternating Decision Trees (ADT)
• Binary, boosting, numerical data
• Weka
• 10-fold cross-validation
[Diagram: feature groups used: Linguistic, Linguistic Discretisation, Other]
37. 2. Comparison with other solutions
[Diagram: feature groups: Linguistic, Linguistic Discretisation, Other, User Rating, Answer Rating]

Case   Features Used
1      Linguistic
2      Linguistic & Discretisation
3      Linguistic & Discretisation & Other
4      Linguistic & Other & User Rating (no discretisation)
5      Linguistic & Other & User Rating (with discretisation)
6      All features (Answer and User Rating with discretisation)
38. Comparison
Case   Features Used                                                P      R      FM     AUC
1      Linguistic                                                   0.58   0.60   0.56   0.60
2      Linguistic & Discretisation                                  0.81   0.70   0.74   0.84
3      Linguistic & Discretisation & Other                          0.84   0.70   0.76   0.87
4      Linguistic & Other & User Rating (no discretisation)         0.82   0.69   0.75   0.86
5      Linguistic & Other & User Rating (with discretisation)       0.82   0.72   0.77   0.88
6      All features (Answer and User Rating with discretisation)    0.88   0.85   0.86   0.94
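The slides report macro-averaged scores, which suggests that P, R and FM are computed per website and then averaged with equal weight. A minimal sketch of that reading follows. The per-site counts are made up, and the function names are hypothetical. Averaging this way may also explain why FM in the table is not always the harmonic mean of the P and R columns (for case 1, 2 × 0.58 × 0.60 / 1.18 ≈ 0.59, against a reported 0.56).

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure from true/false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def macro_average(per_site_counts):
    """Average P, R and FM over sites, giving each site equal weight."""
    scores = [precision_recall_f(*counts) for counts in per_site_counts.values()]
    n = len(scores)
    return tuple(sum(score[i] for score in scores) / n for i in range(3))


# Hypothetical (tp, fp, fn) counts for two sites.
counts = {"site_a": (8, 2, 2), "site_b": (1, 1, 3)}
macro_p, macro_r, macro_f = macro_average(counts)
```

Here the macro F-measure (about 0.57) differs from the harmonic mean of the macro precision and recall (about 0.58), because the averaging happens after the per-site F-measures are computed.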
39. 3. Generality
• Leave-one-out
• Trained a classifier for each SE website based on all other SE websites
(Stackoverflow was evaluated but was excluded from training due to its size)

                                                                                   P      R      FM     AUC
Macro average based on self-training (results from the first part of evaluation)   0.84   0.70   0.76   0.87
Leave-one-out                                                                      0.83   0.70   0.76   0.87
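The leave-one-out protocol described on this slide can be sketched as a generator of train/test splits over websites. Only the splitting logic is shown. Training and scoring are left out because the slides give no detail beyond the use of Weka. The function name and the `exclude_from_training` parameter are hypothetical.

```python
def leave_one_site_out(sites, exclude_from_training=()):
    """Yield (held_out_site, training_sites) pairs.

    Each site is evaluated with a classifier trained on the other sites.
    Sites in exclude_from_training are still evaluated but never trained on.
    """
    excluded = set(exclude_from_training)
    for held_out in sites:
        training = [s for s in sites if s != held_out and s not in excluded]
        yield held_out, training


sites = ["stackoverflow", "superuser", "skeptics"]
splits = dict(leave_one_site_out(sites, exclude_from_training=["stackoverflow"]))
# splits["stackoverflow"] == ["superuser", "skeptics"]
# splits["superuser"] == ["skeptics"]
```

A full run would train one classifier per pair on the listed training sites and evaluate it on the held-out site.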
41. Best Answer prediction
• Community feedback on the answers remains the best way of determining the best answer, but
• Discretisation reveals a lot more information
• Content features, even shallow ones, CAN be very informative
• Independent of past (not always available) knowledge
• Independent of future knowledge
• A web application/service is under development
42. [Diagram: Best Answer Prediction, with "User & answer rating", "Linguistic features", "?" and "Proposed solution"]
43. Thank you
http://xkcd.com/386/