3. Basic Comment Moderation Process
User comments on an article
Moderator publishes or rejects a comment based on a
set of guidelines
“10 commandments”
Comments for different articles come in every second.
We would need a small army to handle the moderation.
The comment should contribute to the discussion, conveying a respectful message, thought
or idea, whether or not it agrees with another user or the author.
The comment should not intentionally misspell words, use non-alphabetic characters, or use
extra or missing spaces to bypass moderation.
The comment should not attack, demean, belittle, or stereotype any person or group.
...
4. JuLiA to the Rescue
Sentiment analysis suite - JuLiA
Supports various preprocessing options
Stemming, stopword removal, etc.
Includes a number of popular ML algorithms
SVM, naïve Bayes, AdaBoost (decision tree), etc.
Uses Hadoop to parallelize the training of different
models and the exploration of the parameter space
Train thousands of models with different parameter setups in parallel
Pick the winner for production
Ensemble the different winners for even higher accuracy
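The explore-then-pick-a-winner step can be sketched on a single machine. This is a minimal illustration under stated assumptions, not JuLiA's implementation: the option names in `OPTIONS`, the helpers `parameter_grid` and `pick_winner`, and the `toy_score` stand-in are all hypothetical. A real run would score each setup by its accuracy on the holdout set.

```python
from itertools import product

# Hypothetical search space; the slides name these options but not their encoding.
OPTIONS = {
    "algorithm": ["svm", "naive_bayes", "adaboost"],
    "stemming": [True, False],
    "stopwords": [True, False],
}

def parameter_grid(options):
    """Yield one dict per combination of option values."""
    names = sorted(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))

def pick_winner(options, evaluate):
    """Score every parameter set and return (best_params, best_score).

    Ties are broken in favour of the parameter set generated first.
    """
    scored = [(evaluate(p), i, p) for i, p in enumerate(parameter_grid(options))]
    best_score, _, best_params = max(scored, key=lambda t: (t[0], -t[1]))
    return best_params, best_score

def toy_score(params):
    """Arbitrary stand-in for 'train a model and measure holdout accuracy'."""
    return len(params["algorithm"]) + (1 if params["stemming"] else 0)

best_params, best_score = pick_winner(OPTIONS, toy_score)
```

Because `evaluate` is passed in as a function, the same loop works whether each score comes from a local training run or from a job submitted to a cluster.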
5. Training Data
Goldset
About 20,000 comments (~13,000 train, ~7,000 holdout)
Publish-or-reject votes from 3 moderators
Christian and Gay? One Politician's Personal Interview (VIDEO)
I'm curious if you have ever watched the film "For The Bible Tells Me So" or if you have
read the book "Torn" by Justin Lee. Bottom line: Biblical interpretation varies. If that's
your interpretation of the scripture then make sure you abide by it.
Rick Santorum On Middle Class: 'That's Marxism Talk,' 'There's No Class In America'
what an angry petty little man he is. issues too. lots of issues he needs to work on. He
certainly has nothing of value to offer or to say. he's a screwed up little prick
Paul Ryan Spending Cuts Face Backlash From Moderate Republicans
You seem to take a negative view of democrats and draw reference to a study "I
co-authored with Robert Book".....sort of like a Muslim professor writing a book on
Christianity your biases disqualify you from offering anything other than a self serving
opinion....now of course I'm just using republican/fox news logic here"
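The goldset pairs each comment with three moderator votes, and a classifier needs one label per comment. The slides do not say how the votes are combined, so the sketch below assumes a simple majority and assumes 1 means publish and 0 means reject. `majority_label` and `split_goldset` are hypothetical helper names, and the 0.65 fraction just mirrors the roughly 13,000 / 7,000 split.

```python
def majority_label(votes):
    """Collapse an odd number of 0/1 moderator votes into a single label.

    Assumption: 1 = publish, 0 = reject.
    """
    if len(votes) % 2 == 0:
        raise ValueError("need an odd number of votes to avoid ties")
    if any(v not in (0, 1) for v in votes):
        raise ValueError("votes must be 0 or 1")
    return 1 if sum(votes) * 2 > len(votes) else 0

def split_goldset(examples, train_fraction=0.65):
    """Split a list of examples into (train, holdout) by position."""
    cut = int(len(examples) * train_fraction)
    return examples[:cut], examples[cut:]

# Two of three moderators vote to publish, so the comment is labelled 1.
label = majority_label([1, 1, 0])
train, holdout = split_goldset(list(range(20)))
```

In practice the examples would be shuffled before splitting so that the holdout set is not biased by collection order.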
6. Training Process
73 923 balanced_winnow 5 1 10 …
73 923 balanced_winnow 5 2 10 …
73 923 balanced_winnow 5 3 10 …
73 923 balanced_winnow 5 1 20 …
73 923 balanced_winnow 5 2 20 …
73 923 balanced_winnow 5 3 20 …
…
Train Request (a parameter set per line)
Investments are taxed as capital gains..... 1
It was the overleveraged and underregulated banks … 1
I am afraid we may be headed for … 1
In the famous words of Homer Simpson, “it takes 2 to lie …” 0
…
Training Data
Model 1
Model 2
Model 3
Model 4
Model 5
Model k
Hadoop Cluster
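The flow on this slide (one parameter set per line, one shared training file, one model per parameter set) can be sketched locally. A thread pool stands in for the Hadoop cluster here, which is a substitution and not how the slides describe the production setup. The meaning of the individual request fields is not given in the slides, so they are kept as raw strings, and the fields elided after `10` are left out. `parse_example`, `parse_request`, `train_all`, and `count_job` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def parse_example(line):
    """Split 'comment text <label>' into (text, int label)."""
    text, label = line.rstrip().rsplit(None, 1)
    return text, int(label)

def parse_request(lines):
    """Return one parameter set (list of raw fields) per non-empty line."""
    return [line.split() for line in lines if line.strip()]

def train_all(param_sets, examples, train_fn, workers=4):
    """Call train_fn(params, examples) once per parameter set, in parallel.

    Results come back in the same order as param_sets.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: train_fn(p, examples), param_sets))

def count_job(params, examples):
    """Stand-in for real training: report the setup and the data size."""
    return " ".join(params), len(examples)

PARAM_SETS = parse_request([
    "73 923 balanced_winnow 5 1 10",
    "73 923 balanced_winnow 5 2 10",
])
EXAMPLES = [
    parse_example("Investments are taxed as capital gains..... 1"),
    parse_example("placeholder rejected comment 0"),  # made-up line
]
results = train_all(PARAM_SETS, EXAMPLES, count_job)
```

Swapping `count_job` for a function that actually fits a model and returns its holdout accuracy turns this into the train-and-compare loop the slide depicts.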
9. Pool for Better Results
Logistic regression using multiple model results
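Pooling with logistic regression means treating each base model's decision as an input feature and learning a weight per model. The sketch below fits those weights with plain batch gradient descent, which is an assumption about the fitting method. The tiny `MODEL_OUTPUTS` table is made-up toy data, not results from the goldset.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of label 1 given one row of base-model outputs."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Toy data: each row holds the 0/1 decisions of three base models.
# The first model always agrees with the label; the other two are noisier.
MODEL_OUTPUTS = [[1, 1, 0], [1, 0, 1], [0, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
LABELS = [1, 1, 0, 0, 1, 0]

weights, bias = fit_logistic(MODEL_OUTPUTS, LABELS)
pooled = [1 if predict(weights, bias, x) > 0.5 else 0 for x in MODEL_OUTPUTS]
```

The learned weights show how much the pooled decision relies on each base model, so a consistently accurate model ends up with a larger weight than a noisy one.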
10. Pool for Better Results
Model decision on goldset approved comments
Model decision on goldset rejected comments
11. Further Steps
Improve the training data set
Data gathered within moderators' normal workflow
More votes per comment
More comments
Per-vertical models
Incorporate comment-to-article similarity
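One way a comment-to-article similarity feature might be computed is cosine similarity between bag-of-words vectors. The slides do not name a measure, so both the choice of cosine similarity and the simple tokenizer below are assumptions, and the sample comment is made up. `bag_of_words` and `cosine_similarity` are hypothetical names.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two bag-of-words vectors (0.0 to 1.0)."""
    va, vb = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_a * norm_b)

headline = "Paul Ryan Spending Cuts Face Backlash From Moderate Republicans"
on_topic = cosine_similarity(headline, "These spending cuts will hurt moderate voters")
off_topic = cosine_similarity(headline, "Great weather today")
```

The resulting score could be appended to each comment's feature vector, so that off-topic comments are distinguishable from on-topic ones.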
12. In addition to saving his
own life, Zimmerman likely
save a couple other lives
as well.