Perfect Text Analytics
Seth Redmore, VP, Product Management
Perfect
per·fect [adj., n. pur-fikt; v. per-fekt]
1. conforming absolutely to the description or definition of an ideal type: a perfect sphere; a perfect gentleman.
2. excellent or complete beyond practical or theoretical improvement: There is no perfect legal code. The proportions of this temple are almost perfect.
All rights reserved © 2010 Lexalytics Inc.
Text Analytics
The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources. (Wikipedia)
In other words, enhancing the value of text content by extracting entities, features, context, relationships and emotion.
Perfect is Fast
Average human reading speed: 250 wpm
Conservative computer reading speed: 6000 wpm/core (our speed on a moderate single core)
Each core is equivalent to the reading bandwidth of 12 people.
Modern machines have 8 cores. That’s just about 100 people in a box. Nice.
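The reading-bandwidth comparison is simple arithmetic, sketched below. Taken at face value, the two stated rates give 24 human readers per core; the slide quotes 12, which matches applying a further factor-of-two margin. That `margin` parameter is our assumption for illustration, not something the slide states.

```python
HUMAN_WPM = 250   # average human reading speed, from the slide
CORE_WPM = 6000   # conservative per-core machine speed, from the slide
CORES = 8         # cores per machine, from the slide


def readers_per_core(margin: float = 1.0) -> float:
    """Human readers matched by one core; margin < 1 discounts the machine rate."""
    return CORE_WPM * margin / HUMAN_WPM


def readers_per_machine(margin: float = 1.0) -> float:
    """Human readers matched by one whole machine."""
    return CORES * readers_per_core(margin)


if __name__ == "__main__":
    print(readers_per_core())          # face-value ratio of the two rates
    print(readers_per_core(0.5))       # with an assumed 50% margin
    print(readers_per_machine(0.5))    # whole machine, same assumed margin
```

With the assumed 50% margin the machine total comes to 96, which lines up with "just about 100 people in a box".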
Perfect is Useable
“I don’t like the results” is not the same as “the results are incorrect”
Understanding the behavior is key to usefulness
Can you make better decisions?
Can you make more money or save money?
What is the most controversial area of text analytics?
Thomson Reuters trading with sentiment analysis increased alpha (profit over market) by 80 basis points
Useable: How much can you differ?
“In my shop, that up until now has relied exclusively on human coding, we consider anything below 90% to be unacceptably inaccurate…. There is no doubt that automated sentiment is getting much much better, but to suggest that people should be okay with 20% of their data being wrong is just absurd.” Katie Delahaye Paine
Why is 10% “wrong” so much less absurd than 20% “wrong”?
[Figure labels: 20% Error, 10% Error]
Perfect is Consistent
Same results for same content, every time
University of Pittsburgh “Multi-Perspective Question Answering” Corpus: 535 documents, 11k+ sentences
40 hours of training for each rater
~80% inter-rater agreement
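For readers unfamiliar with how inter-rater agreement is measured, here is a minimal sketch of raw percent agreement alongside Cohen's kappa, which corrects for agreement expected by chance. The labels in the usage example are invented for illustration and are not drawn from the MPQA corpus; the slide does not say which agreement statistic its ~80% figure uses.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of items on which two raters gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must label the same non-empty set of items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented example labels (not real corpus data).
    rater_a = ["pos", "pos", "neg", "neu", "neg", "pos", "neu", "neg", "pos", "neg"]
    rater_b = ["pos", "neg", "neg", "neu", "neg", "pos", "pos", "neg", "pos", "neg"]
    print(percent_agreement(rater_a, rater_b))
    print(cohens_kappa(rater_a, rater_b))
```

Kappa is always lower than raw agreement whenever some agreement is attributable to chance, which is one reason a headline "80%" can mean different things depending on the statistic.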
Perfect is (new) Knowledge
Discover the stuff you don’t know
Text Analytics is really, really great at telling you the who, the what, and the where. Sometimes the “how”
You have to supply the “why”, but that question is way easier to answer when you know the other “w’s and the h”
Perfect Includes Everything
Running our top-of-the-line software flat out across one year will cost you about $0.002 per document analyzed (news-article-sized content), assuming 3 docs/core-second on an 8-core machine
The more data, the better, and the greater the worth of your text analytics
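The throughput behind the per-document figure can be worked out from the stated assumptions. The sketch below computes documents per year at 3 docs/core-second on 8 cores, and the annual total those numbers imply together. The slide does not state a total, so the second figure is only an inference from its two numbers, and a 365-day year running without downtime is our assumption.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year, no downtime


def docs_per_year(docs_per_core_second: int = 3, cores: int = 8) -> int:
    """Documents processed in a year when running flat out."""
    return docs_per_core_second * cores * SECONDS_PER_YEAR


def implied_annual_cost(cost_per_doc: float = 0.002) -> float:
    """Annual total implied by the per-document cost and the throughput."""
    return cost_per_doc * docs_per_year()


if __name__ == "__main__":
    print(f"{docs_per_year():,} documents per year")
    print(f"${implied_annual_cost():,.0f} implied per year")
```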
Perfect is Trainable
Can you solve YOUR business problem with it?
Can you optimize to suit different kinds of content and roll those results up into a single reporting system?
Perfect Text Analytics
Fast
Useable
Consistent
Knowledge (that is) Inclusive
Trainable
Customer Snapshots (or, “rubber, meet road”)
Reputation Management
Politics
Market Intelligence
[Architecture diagram labels: Client Employee, User Authentication, Single Sign-on, External Content Providers, SinglePoint, Client Company, Web 2.0 Collaboration, Search Results, Secondary Research Suppliers, MI Analyst, Text Analytics, Integrated Index, News & Journals, NL Search Engine, FIREWALL, Internal Document Repository, Optional Document Repository, Financial analyst reports, Internal research, Content Processing, Custom Web Crawls & Gov. Databases, Trash can crawl, FTP or CD]
Hospitality
Financial Services
Turns news into numbers for automatic trading systems
Resilient server product
[Diagram labels: Algorithmic Trading (QED firm), Financial data, Indicators, Buy/Sell, RNSE Server, Indicators]
QED (Quantitative and Event-Driven Trading): banks, hedge funds
JPMorgan, SocGen, Alpha Equities… and others
Pharma
The Next Year…
Opinion Mining
Who said what about whom?
Sarcasm, Twitter
Model trained to detect sarcasm
Once detected, you can decide what to do with it, because actually determining the sentiment is going to be unreliable
New model trained on Twitter content
Moving towards a concept of text analytics driven by business logic
Thesaurus-based Theme Rollup
Machine-generated conceptual taxonomy
Gas/Electric Hybrid and EV might roll up to EV
Fewer themes, but very useful to detect patterns across content
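The rollup idea can be sketched as a lookup from specific themes to a broader parent theme, followed by counting. The slide describes a machine-generated taxonomy; the hand-written `ROLLUP` table and the `roll_up` function below are hypothetical stand-ins for illustration only, not the product's method.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical thesaurus: specific theme -> broader parent theme.
ROLLUP = {
    "gas/electric hybrid": "ev",
    "electric vehicle": "ev",
    "ev": "ev",
}


def roll_up(themes: Iterable[str], thesaurus: Mapping[str, str] = ROLLUP) -> Counter:
    """Count themes after mapping each one to its parent (unknown themes map to themselves)."""
    normalized = (theme.lower() for theme in themes)
    return Counter(thesaurus.get(theme, theme) for theme in normalized)


if __name__ == "__main__":
    extracted = ["Gas/Electric Hybrid", "EV", "Battery life"]
    print(roll_up(extracted))
```

Two distinct extracted themes collapse into one parent here, which is the "fewer themes" effect: patterns that were split across near-synonyms become visible as a single count.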
Foreign Language Support
French is first, followed by other Romance languages
New stemmer
New summarization algorithm
New part-of-speech tagger
Automatic language detection
New sentiment/entity extraction algorithms
Also applicable to vertical-specific content
Confidence scoring by algorithm
Use business logic to meld the results
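One way to read "confidence scoring by algorithm" plus "use business logic to meld the results" is sketched below: each algorithm returns a label with a confidence, and a rule picks among them. The `Scored` type, the algorithm names, and the threshold rule are all hypothetical; the slides do not describe how melding is actually done.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Scored:
    algorithm: str     # hypothetical algorithm name
    label: str         # e.g. "positive" or "negative"
    confidence: float  # 0.0 to 1.0, as reported by that algorithm


def meld(results: Iterable[Scored], min_confidence: float = 0.6) -> Optional[Scored]:
    """Return the most confident result above the threshold, or None if nothing qualifies."""
    confident = [r for r in results if r.confidence >= min_confidence]
    if not confident:
        return None  # business rule: abstain rather than report a weak guess
    return max(confident, key=lambda r: r.confidence)


if __name__ == "__main__":
    results = [
        Scored("lexicon", "positive", 0.55),
        Scored("maxent", "negative", 0.80),
    ]
    print(meld(results))
```

Returning `None` when no algorithm is confident keeps the decision about low-confidence content in the caller's hands, which fits the business-logic framing.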
Trainable Entity Sentiment
New technique for entity sentiment
Initial results from testing in English extremely promising
Average human scoring overlap of >> 90% for scored sentences
Initially used only for French
Tool Enhancements
Eventually use on English content: Twitter, customer satisfaction, others…
Entity Management Toolkit (EMT): part-of-speech tagset training; using it to train Salience on French
Sentiment Toolkit: build your own entity sentiment models; French (first)
New Sentiment Toolkit + Maximum Entropy model builder allows new Entity and Sentiment modules
New EMT helps us build a new French PoS tagger
[Diagram labels: Doc, POS Tagger, Fully Tagged Document, Entity Extraction & Sentiment Models, Themes & Summaries]

Lexalytics Text Analytics Workshop: Perfect Text Analytics

Business Logic + TA Algorithms
[Diagram labels: Content Source, Search, Business Logic, Other TA System, Sarcasm, Route On, Sports, Finance, Unknown, $, ?, A, B, C, D, Entity: Cisco, Probability Scores, Cisco: Positive]
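The diagram suggests content being routed by business logic according to detected topic and sarcasm. A minimal sketch of that kind of routing follows. The function name, the destination names, and the 0.5 threshold are all hypothetical, chosen only to illustrate the idea, and do not describe the product's behavior.

```python
# Hypothetical mapping from detected topic to a downstream sentiment model.
TOPIC_ROUTES = {
    "sports": "sports_model",
    "finance": "finance_model",
}


def route(topic: str, sarcasm_probability: float, sarcasm_threshold: float = 0.5) -> str:
    """Pick a destination for a document based on simple business rules."""
    if sarcasm_probability >= sarcasm_threshold:
        # Sentiment on sarcastic text is unreliable, so set it aside for review.
        return "manual_review"
    # Unknown topics fall through to a general-purpose model.
    return TOPIC_ROUTES.get(topic.lower(), "default_model")


if __name__ == "__main__":
    print(route("Finance", sarcasm_probability=0.1))
    print(route("Sports", sarcasm_probability=0.9))
    print(route("Gardening", sarcasm_probability=0.2))
```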
Summary
Lots of people are making money with text analytics, in lots of different verticals
The next 12 months bring online a whole host of features to make our software even more flexible
Check out tas.lexalytics.com
Check out www.lexalytics.com/lexascope