Language Intelligence
Why Sentiment Analysis is a Market for
Lemons … and How to Fix it
Robert Munro
With thanks!
Gary King & Jana Thompson:
<- other Idibon people here:
Michelle Casbon & Nick Gaylord
What is a market for lemons?
• Information asymmetry between
buyers and sellers, leaving only
"lemons" behind. George Akerlof
• Buyers cannot distinguish good
from bad products
• Prices are equally low for all
products
• Because buyers will only pay a low
price, adverse selection drives the
high-quality products from the
market
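The unraveling described in these bullets can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not Akerlof's full model: the quality scores, the rule that buyers pay exactly the average quality on offer, and the function name `unravel` are all hypothetical.

```python
def unravel(qualities, max_rounds=100):
    """Toy adverse-selection loop (illustrative assumptions only).

    Buyers cannot tell products apart, so they offer the average
    quality of whatever is still on sale. Any seller whose product is
    worth more than that offer withdraws, and the process repeats.
    """
    remaining = sorted(qualities)
    for _ in range(max_rounds):
        price = sum(remaining) / len(remaining)
        staying = [q for q in remaining if q <= price]
        if len(staying) == len(remaining):
            break  # nobody else leaves; the market has settled
        remaining = staying
    return remaining


# Hypothetical quality scores on a 0-100 scale.
survivors = unravel([20, 40, 60, 80, 100])
print(survivors)
```

With these made-up scores, each round removes the best remaining products until only the lowest-quality one is left on sale.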
Competition is not increasing accuracy
• 100+ companies
offering some
form of sentiment
analysis
• Accuracy hovering
around 70% for
real-world
applications for
almost a decade
The most honest sentiment analysis results you will
see
                                  Recall                       Precision                    F-Score
System         Accuracy  F-Score  Positive  Negative  Neutral  Positive  Negative  Neutral  Positive  Negative  Neutral
Semantria      0.59      0.59     0.56      0.47      0.78     0.68      0.80      0.45     0.62      0.59      0.57
MonkeyLearn    0.50      0.38*    0.84      0.54      0.00     0.45      0.60      0.00     0.59      0.57      0.00
MetaMind       0.66      0.66     0.68      0.46      0.88     0.78      0.88      0.50     0.73      0.60      0.64
Idibon Public  0.68      0.67     0.76      0.75      0.49     0.66      0.69      0.72     0.71      0.72      0.58
• Even within the best results for one domain, there is no clear
leader when broken down by category
• All systems could have best results in other domains
• All could adapt here: MonkeyLearn had errors with the ‘Neutral’
category, but we are sure they could update their models
Source: Sentiment 140 corpus, 3-way sentiment on social data:
http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip
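The per-category F-scores in the table follow from the precision and recall columns. The sketch below recomputes them for the Semantria row, using the standard F1 formula (harmonic mean of precision and recall). It assumes, as the rows suggest but the slide does not state, that the overall F-Score column is the unweighted mean of the three per-category F-scores. The function names are illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f(pairs):
    """Unweighted mean of per-category F-scores (assumed averaging scheme)."""
    scores = [f_score(p, r) for p, r in pairs]
    return sum(scores) / len(scores)


# (precision, recall) for Positive, Negative, Neutral, copied from the Semantria row.
semantria = [(0.68, 0.56), (0.80, 0.47), (0.45, 0.78)]

for label, (p, r) in zip(["Positive", "Negative", "Neutral"], semantria):
    print(label, round(f_score(p, r), 3))
print("Macro F", round(macro_f(semantria), 3))
```

Because the published precision and recall values are already rounded to two decimals, the recomputed scores can differ from the table in the last digit.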
Data beats algorithms; feedback beats data
[Chart: precision, recall, and F-value (0% to 100%) for four approaches. Labeled values, in order: Linear model 0.457; Deep Learning 0.473; In-domain training 0.615; 10 mins analyst feedback 0.948]
Distinguishing the correct ‘Ford’
Distinguishing “Ford” the company from people called “Ford”
Consumers are
uncertain
• When consumers try
out-of-domain analysis, they
lose confidence because of
the poor results
• Domain-dependence
means that even bad
models will be accurate
in some areas
• Consumers can only
evaluate anecdotally or
by precision, not recall
• Uncertainty prevails
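The precision-versus-recall point can be made concrete with a small sketch. A buyer reviewing a system's output sees only the flagged items, which is enough to estimate precision. Recall also depends on the relevant items the system never flagged, which the buyer normally never sees. The `Item` class and the example messages below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    is_negative: bool  # ground truth; usually unknown to the buyer
    flagged: bool      # system output; this is what the buyer sees


def precision(items):
    """Share of flagged items that are truly negative."""
    flagged = [i for i in items if i.flagged]
    return sum(i.is_negative for i in flagged) / len(flagged) if flagged else 0.0


def recall(items):
    """Share of truly negative items that were flagged."""
    relevant = [i for i in items if i.is_negative]
    return sum(i.flagged for i in relevant) / len(relevant) if relevant else 0.0


# Made-up sample: three truly negative messages, only one of them flagged.
sample = [
    Item("worst service ever", True, True),
    Item("never buying again", True, False),
    Item("not what I expected", True, False),
    Item("arrived on time", False, False),
    Item("works fine", False, False),
]

print("precision:", precision(sample))
print("recall:", recall(sample))
```

In this made-up sample, every flagged item is correct, so the output looks perfect to a reviewer, yet two of the three negative messages were missed.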
Market forces are not breeding innovation
• Can’t innovate
through code alone
• More training data!
• But low price-points
mean low margins
• Lack of capital to
find & label enough
training data
The Solution
• A different economic
model for useful
sentiment analysis:
• Data-sharing for more
accurate training data
• Protecting sensitive data
from public release
[Diagram: hybrid architecture. Components: machine learning optimization; human annotation; cloud prediction engine; on-site prediction engine; actionable intelligence; firewall. Labeled flows: Copy & Sync Models; App Requests; Ambiguous, Novel & Interesting Items. Legend: Internal Data Flow; Hybrid Model Data Flow; Application Data Flow]
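One way the "Ambiguous, Novel & Interesting Items" flow could be selected is by prediction margin: items whose top two class probabilities are close go to human annotation, and the rest are accepted automatically. This is a hypothetical sketch of that idea, not Idibon's actual routing logic. The function name, the threshold, and the example probabilities are all assumptions.

```python
def route(predictions, margin_threshold=0.2):
    """Split predictions into confident and ambiguous items.

    predictions: list of (text, {label: probability}) pairs.
    An item is treated as ambiguous when the gap between its two most
    probable labels is below margin_threshold (an assumed cut-off).
    """
    confident, ambiguous = [], []
    for text, probs in predictions:
        top, second = sorted(probs.values(), reverse=True)[:2]
        if top - second < margin_threshold:
            ambiguous.append(text)   # send to human annotation
        else:
            confident.append(text)   # accept the model's label
    return confident, ambiguous


# Made-up model outputs for two messages.
outputs = [
    ("love it", {"positive": 0.90, "negative": 0.05, "neutral": 0.05}),
    ("Ford pulled out", {"positive": 0.40, "negative": 0.35, "neutral": 0.25}),
]

confident, ambiguous = route(outputs)
print("auto-accepted:", confident)
print("for annotators:", ambiguous)
```

Only the low-margin item reaches an annotator, which keeps the human effort focused on the cases where feedback is most likely to change the model.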
The Benefits
• Multiple organizations can share in the benefits of better
sentiment analysis, without sacrificing privacy
• Single point of human contact: no expensive duplicate
manual labeling of data
• Keeps lemons out of the market
Idibon Public: our implementation
• Free product, offered in addition to our enterprise
Idibon Studio and Idibon Terminal solutions
Applies to NLP and Machine
Learning more broadly
Every human communication
• Any task can be bundled this way
• Allows margins for use cases that
were not otherwise viable
• … including the full diversity of
languages, priced out when
everyone started in English
Language Intelligence
Why Sentiment Analysis is a Market for
Lemons … and How to Fix it
QUESTIONS?
Robert Munro
