Industrialize Sentiment Analysis
for Comment Moderation
Maggie Xiong
Huffington Post
Basic Comment Moderation Process

User comments on an article

Moderator publishes or rejects a comment based on a
set of guidelines

“10 commandments”

Comments for different articles come in every second.
We would need a small army to handle the moderation.
The comment should contribute to the discussion, conveying a respectful message, thought 
or idea, whether or not it agrees with another user or the author.
The comment should not intentionally misspell words, use non-alphabetic characters, or use 
extra or missing spaces to bypass moderation.
The comment should not attack, demean, belittle, or stereotype any person or group.
...
JuLiA to the Rescue

Sentiment analysis suite - JuLiA

Supports various preprocessing options

Stemming, stopword removal, etc.

Includes a number of popular ML algorithms

SVM, naïve Bayes, AdaBoost (decision tree), etc.

Uses Hadoop to parallelize the training of different
models and the exploration of the parameter space

Train thousands of models with different parameter setups in parallel

Pick the winner for production

Ensemble the different winners for even higher accuracy
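
A minimal sketch of the "pick the winner" step, assuming each trained model has already produced 0/1 decisions on the holdout comments and that plain accuracy is the selection metric. The slides do not say how winners are scored, so the metric, the model names, and the toy data below are all hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of holdout comments where the prediction matches the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def pick_winner(holdout_predictions, labels):
    """Return (model_name, accuracy) for the best-scoring model."""
    scores = {
        name: accuracy(preds, labels)
        for name, preds in holdout_predictions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical holdout labels and per-model decisions.
labels = [1, 1, 0, 1, 0, 0]
holdout_predictions = {
    "naive_bayes_a": [1, 1, 0, 1, 0, 1],
    "svm_b": [1, 0, 0, 1, 1, 1],
    "adaboost_c": [1, 1, 1, 0, 0, 0],
}

winner, score = pick_winner(holdout_predictions, labels)
print(winner, round(score, 3))
```

Scoring on the holdout set rather than the training set keeps the comparison honest across thousands of parameter setups.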
Training Data

Goldset

About 20,000 comments (~13,000 train, ~7,000 holdout)

Publish-or-reject votes from 3 moderators
Christian and Gay? One Politician's Personal Interview (VIDEO)
I'm curious if you have ever watched the film "For The Bible Tells Me So" or if you have
read the book "Torn" by Justin Lee. Bottom line: Biblical interpretation varies. If that's
your interpretation of the scripture then make sure you abide by it.
Rick Santorum On Middle Class: 'That's Marxism Talk,' 'There's No Class In America'
what an angry petty little man he is. issues too. lots of issues he needs to work on. He
certainly has nothing of value to offer or to say. he's a screwed up little prick
Paul Ryan Spending Cuts Face Backlash From Moderate Republicans
You seem to take a negative view of democrats and draw reference to a study "I co-
authored with Robert Book".....sort of like a Muslim professor writing a book on
Christianity your biases disqualify you from offering anything other than a self serving
opinion....now of course I'm just using republican/fox news logic here"
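
One way the three moderator votes could be reduced to a single training label is a simple majority, sketched below. The slides do not state the aggregation rule or the label encoding, so both the majority rule and the convention 1 = publish, 0 = reject are assumptions.

```python
def consensus(votes):
    """Collapse moderator votes into (label, agreement).

    votes: list of 0/1 values, assumed here to mean 1 = publish, 0 = reject.
    label: 1 only if strictly more than half of the votes are 1.
    agreement: share of moderators who voted with the larger side.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    publish = sum(votes)
    label = 1 if publish * 2 > len(votes) else 0
    agreement = max(publish, len(votes) - publish) / len(votes)
    return label, agreement


print(consensus([1, 1, 0]))
print(consensus([0, 0, 0]))
```

The agreement value could be used to filter or down-weight comments the moderators disagreed on. With an even number of votes, a tie falls to reject in this sketch.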
Training Process
73 923 balanced_winnow 5 1 10 …
73 923 balanced_winnow 5 2 10 …
73 923 balanced_winnow 5 3 10 …
73 923 balanced_winnow 5 1 20 …
73 923 balanced_winnow 5 2 20 …
73 923 balanced_winnow 5 3 20 …
…
Train Request (a parameter set per line)
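
The request lines above look like the expansion of a parameter grid. A hedged sketch of generating such lines follows. The slides do not name the columns, so `param_a`, `param_b`, and `param_c` are placeholders, and the leading fields (`73 923`) are left out because their meaning is not given.

```python
from itertools import product


def train_request_lines(algorithm, grid):
    """Yield one request line per combination of parameter values.

    grid maps a (hypothetical) parameter name to its candidate values;
    values are emitted in the grid's key order.
    """
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield " ".join([algorithm, *map(str, values)])


# Placeholder grid: the real parameter names are not shown on the slide.
grid = {"param_a": [5], "param_b": [1, 2, 3], "param_c": [10, 20]}
lines = list(train_request_lines("balanced_winnow", grid))

for line in lines:
    print(line)
```

Because every line is an independent job, a cluster can train the corresponding models in parallel without coordination between them.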
Investments are taxed as capital gains..... 1
It was the overleveraged and underregulated banks … 1
I am afraid we may be headed for … 1
In the famous words of Homer Simpson, “it takes 2 to lie …” 0
…
Training Data
Model 1
Model 2
Model 3
Model 4
Model 5
Model k
Hadoop Cluster
Results

Single best model: Naïve Bayes
Results

Model decision on goldset approved comments

Model decision on goldset rejected comments
Pool for Better Results

Logistic regression using multiple model results
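
A small sketch of pooling with logistic regression, assuming each base model contributes one 0/1 decision per comment and the pooled model is fit by plain batch gradient descent. The slides do not specify the input features or the fitting procedure, so the toy data, learning rate, and epoch count are illustrative only.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability of the positive class for one row of base-model outputs."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical decisions from three base models (one column per model).
rows = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],
    [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 0],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_logistic(rows, labels)
print(round(predict_proba(weights, bias, [1, 1, 1]), 3))
print(round(predict_proba(weights, bias, [0, 0, 0]), 3))
```

Unlike a fixed majority vote, the learned weights let more reliable base models count for more in the pooled decision.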
Pool for Better Results

Model decision on goldset approved comments

Model decision on goldset rejected comments
Further Steps

Improve the training data set

Data gathered within moderators' normal workflow

More votes per comment

More comments

Per-vertical models

Incorporate comment-to-article similarity
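
One possible reading of comment-to-article similarity is the cosine similarity between bag-of-words vectors, sketched below. The slides do not say which similarity measure is intended, so the measure, the tokenizer, and the example comments are assumptions; the article title is taken from the training data slide.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the term-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


article = "Paul Ryan Spending Cuts Face Backlash From Moderate Republicans"
on_topic = "Moderate Republicans are right to push back on these spending cuts"
off_topic = "Check out my new recipe blog"

print(cosine_similarity(article, on_topic) > cosine_similarity(article, off_topic))
```

A score like this could be passed to the classifier as one extra feature, so that off-topic comments become easier to separate from on-topic ones.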
In addition to saving his
own life, Zimmerman likely
save a couple other lives
as well.
Thanks!

Conversation and Machine Learning teams

We are hiring!
– maggie.xiong@huffingtonpost.com
