SlideShare una empresa de Scribd logo
1 de 25
Introduction to Text Miningand Semantics Seth Grimes -- President, Alta Plana
New York Times October 9, 1958
From text to information “Text expresses a vast, rich range of information, but encodes this information in a form that is difficult to decipher automatically.”			 -- Marti A. Hearst, “Untangling Text Data Mining,” 1999
Document input and processing Information extraction Knowledge management Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958
Statistical analysis of content “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance.” Hans Peter Luhn,  “The Automatic Creation of Literature Abstracts,”  IBM Journal, April 1958
Statistical analysis limitations “This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.” -- Hans Peter Luhn, 1958
Semantic links New York Times, September 8, 1957 Anaphora / coreference: “They”
Digital content universe “The broadcast, media, and entertainment industries garner about 4% of the world’s revenues but already generate, manage, or otherwise oversee 50% of the digital universe.” “The Diverse and Exploding Digital Universe,” (IDC, 2008) Approximately 70% of the digital universe is created by individuals.
The “unstructured data” challenge The digital universe: ,[object Object]
Blogs, forum postings, and social media.
E-mail, Contact-center notes and transcripts; recorded conversation.
Surveys, feedback forms, warranty & insurance claims.
Office documents, regulatory filings, reports, scientific papers.
And every other sort of document imaginable.Is Search up to the job?
Search results How are the quality, value & authority of search results? Hotel’s opinion Who profits from search? Guest’s opinion - about Priceline
From Web 1.0 to Web 2.0 How can we do better? “We have many of the tools in place -- from Web 2.0 technologies…” “The Diverse and Exploding Digital Universe,” (IDC, 2008)
Web 2.0 “Web 2.0 is the business revolution in the computer industry caused by the move to the Internet as a platform.”	-- Tim O’Reilly, 2004 “[A] move from personal websites to blogs and blog site aggregation, from publishing to participation,… an ongoing and interactive process... to links based on tagging.” 	-- Terry Flew, “New Media: An Introduction,” 2008 Web 2.0 is dynamic, personalized, interactive, collaborative.
From information growth to economicgrowth “We have many of the tools in place -- from Web 2.0 technologies… to unstructured data search software and the Semantic Web -- to tame the digital universe. Done right, we can turn information growth into economic growth.” -- “The Diverse and Exploding Digital Universe,” (IDC, 2008)
Text mining: from information to intelligence Text mining enables smarter search that better responds to user goals, e.g., answers –
From Web 2.0 to Web 3.0 For even better findability: “The Semantic Web is a web of data, in some ways like a global database.” 		-- Tim Berners-Lee, 1998 Web 3.0 is Web 2.0 + the Semantic Web + semantic tools. Recurring themes: ,[object Object]
Linked Data.
Context sensitive.
Location aware.,[object Object]
Steps in the right direction…
… and missteps “Kind” = type, variety, not a sentiment. Complete misclassification External reference Unfiltered duplicates

Más contenido relacionado

La actualidad más candente

Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introductionguest0edcaf
 
Text mining presentation in Data mining Area
Text mining presentation in Data mining AreaText mining presentation in Data mining Area
Text mining presentation in Data mining AreaMahamudHasanCSE
 
4.4 text mining
4.4 text mining4.4 text mining
4.4 text miningKrish_ver2
 
Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extractionguest0edcaf
 
Data Mining: Text and web mining
Data Mining: Text and web miningData Mining: Text and web mining
Data Mining: Text and web miningDataminingTools Inc
 
3. introduction to text mining
3. introduction to text mining3. introduction to text mining
3. introduction to text miningLokesh Ramaswamy
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text MiningMinha Hwang
 
Role of Text Mining in Search Engine
Role of Text Mining in Search EngineRole of Text Mining in Search Engine
Role of Text Mining in Search EngineJay R Modi
 
Text Mining Framework
Text Mining FrameworkText Mining Framework
Text Mining FrameworkPrakhyath Rai
 
SA2: Text Mining from User Generated Content
SA2: Text Mining from User Generated ContentSA2: Text Mining from User Generated Content
SA2: Text Mining from User Generated ContentJohn Breslin
 
Best Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingBest Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingOntotext
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information RetrievalRoi Blanco
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)9866825059
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantinimaxfalc
 
Text data mining1
Text data mining1Text data mining1
Text data mining1KU Leuven
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information RetrievalCarsten Eickhoff
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Kira
 

La actualidad más candente (20)

Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introduction
 
Text mining presentation in Data mining Area
Text mining presentation in Data mining AreaText mining presentation in Data mining Area
Text mining presentation in Data mining Area
 
4.4 text mining
4.4 text mining4.4 text mining
4.4 text mining
 
Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extraction
 
Data Mining: Text and web mining
Data Mining: Text and web miningData Mining: Text and web mining
Data Mining: Text and web mining
 
3. introduction to text mining
3. introduction to text mining3. introduction to text mining
3. introduction to text mining
 
Week12
Week12Week12
Week12
 
Text mining
Text miningText mining
Text mining
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
Role of Text Mining in Search Engine
Role of Text Mining in Search EngineRole of Text Mining in Search Engine
Role of Text Mining in Search Engine
 
Text Mining Framework
Text Mining FrameworkText Mining Framework
Text Mining Framework
 
SA2: Text Mining from User Generated Content
SA2: Text Mining from User Generated ContentSA2: Text Mining from User Generated Content
SA2: Text Mining from User Generated Content
 
Best Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingBest Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining Processing
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information Retrieval
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantini
 
Text data mining1
Text data mining1Text data mining1
Text data mining1
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information Retrieval
 
data mining
data miningdata mining
data mining
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)
 

Destacado

Introduction to text mining
Introduction to text miningIntroduction to text mining
Introduction to text miningLars Juhl Jensen
 
Text Mining in Cultural Heritage: Challenges
Text Mining in Cultural Heritage: ChallengesText Mining in Cultural Heritage: Challenges
Text Mining in Cultural Heritage: ChallengesMax Kaiser
 
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...Salford Systems
 
Text Mining and Visualization
Text Mining and VisualizationText Mining and Visualization
Text Mining and VisualizationSeth Grimes
 
Text Mining Infrastructure in R
Text Mining Infrastructure in RText Mining Infrastructure in R
Text Mining Infrastructure in RAshraf Uddin
 
Text Mining using LDA with Context
Text Mining using LDA with ContextText Mining using LDA with Context
Text Mining using LDA with ContextSteffen Staab
 
Quick Tour of Text Mining
Quick Tour of Text MiningQuick Tour of Text Mining
Quick Tour of Text MiningYi-Shin Chen
 
TreeNet Tree Ensembles and CART Decision Trees: A Winning Combination
TreeNet Tree Ensembles and CART  Decision Trees:  A Winning CombinationTreeNet Tree Ensembles and CART  Decision Trees:  A Winning Combination
TreeNet Tree Ensembles and CART Decision Trees: A Winning CombinationSalford Systems
 
ידיעון כרמיה 2-2015
ידיעון כרמיה 2-2015ידיעון כרמיה 2-2015
ידיעון כרמיה 2-2015perachadi
 

Destacado (11)

Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introduction
 
Introduction to text mining
Introduction to text miningIntroduction to text mining
Introduction to text mining
 
Text Mining in Cultural Heritage: Challenges
Text Mining in Cultural Heritage: ChallengesText Mining in Cultural Heritage: Challenges
Text Mining in Cultural Heritage: Challenges
 
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...
Improved Predictions in Structure Based Drug Design Using Cart and Bayesian M...
 
Text Mining and Visualization
Text Mining and VisualizationText Mining and Visualization
Text Mining and Visualization
 
Text mining tutorial
Text mining tutorialText mining tutorial
Text mining tutorial
 
Text Mining Infrastructure in R
Text Mining Infrastructure in RText Mining Infrastructure in R
Text Mining Infrastructure in R
 
Text Mining using LDA with Context
Text Mining using LDA with ContextText Mining using LDA with Context
Text Mining using LDA with Context
 
Quick Tour of Text Mining
Quick Tour of Text MiningQuick Tour of Text Mining
Quick Tour of Text Mining
 
TreeNet Tree Ensembles and CART Decision Trees: A Winning Combination
TreeNet Tree Ensembles and CART  Decision Trees:  A Winning CombinationTreeNet Tree Ensembles and CART  Decision Trees:  A Winning Combination
TreeNet Tree Ensembles and CART Decision Trees: A Winning Combination
 
ידיעון כרמיה 2-2015
ידיעון כרמיה 2-2015ידיעון כרמיה 2-2015
ידיעון כרמיה 2-2015
 

Similar a Introduction to Text Mining and Semantics

Smart Content = Smart Business
Smart Content = Smart BusinessSmart Content = Smart Business
Smart Content = Smart BusinessSeth Grimes
 
Knowledge Extraction from Social Media
Knowledge Extraction from Social MediaKnowledge Extraction from Social Media
Knowledge Extraction from Social MediaSeth Grimes
 
The power of Structured Journalism & Hacker Culture in NPR
The power of Structured Journalism & Hacker Culture in NPRThe power of Structured Journalism & Hacker Culture in NPR
The power of Structured Journalism & Hacker Culture in NPRPoderomedia
 
Web 3.0? A look at the future of the World Wide Web
Web 3.0?  A look at the future of the World Wide WebWeb 3.0?  A look at the future of the World Wide Web
Web 3.0? A look at the future of the World Wide Webrgkwml
 
Text Analytics Past, Present & Future
Text Analytics Past, Present & FutureText Analytics Past, Present & Future
Text Analytics Past, Present & FutureSeth Grimes
 
Web Science Framework and InterDataNet
Web Science Framework and InterDataNetWeb Science Framework and InterDataNet
Web Science Framework and InterDataNetmaria chiara pettenati
 
The Internet of Things, Ambient Intelligence, and the Move Towards Intelligen...
The Internet of Things, Ambient Intelligence, and the Move Towards Intelligen...The Internet of Things, Ambient Intelligence, and the Move Towards Intelligen...
The Internet of Things, Ambient Intelligence, and the Move Towards Intelligen...George Vanecek
 
Big Data And Analytics: A Summary Of The X 4.0 Era
Big Data And Analytics: A Summary Of The X 4.0 EraBig Data And Analytics: A Summary Of The X 4.0 Era
Big Data And Analytics: A Summary Of The X 4.0 EraJohnWilson47710
 
Walking Our Way to the Web
Walking Our Way to the WebWalking Our Way to the Web
Walking Our Way to the WebFabien Gandon
 
Information Management Trends 2009
Information Management Trends 2009Information Management Trends 2009
Information Management Trends 2009Christopher Eagle
 
Data-Mining Twitter for Political Science -Hickman, Alfredo - Honors Thesis
Data-Mining Twitter for Political Science -Hickman, Alfredo - Honors ThesisData-Mining Twitter for Political Science -Hickman, Alfredo - Honors Thesis
Data-Mining Twitter for Political Science -Hickman, Alfredo - Honors ThesisAlfredo Hickman
 
COM6905 Research Methods And Professional Issues.docx
COM6905 Research Methods And Professional Issues.docxCOM6905 Research Methods And Professional Issues.docx
COM6905 Research Methods And Professional Issues.docxwrite31
 
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...Artificial Intelligence Institute at UofSC
 
Zeng marcia ifla-subjectaccesssmartdatadh
Zeng marcia ifla-subjectaccesssmartdatadhZeng marcia ifla-subjectaccesssmartdatadh
Zeng marcia ifla-subjectaccesssmartdatadhMarcia Zeng
 
Intelligentcontent2009
Intelligentcontent2009Intelligentcontent2009
Intelligentcontent2009Salim Ismail
 
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)paperpublications3
 
The Ideal Collaborative Workspace
The Ideal Collaborative WorkspaceThe Ideal Collaborative Workspace
The Ideal Collaborative Workspaceibdrii
 
Search, Signals & Sense: An Analytics Fueled Vision
Search, Signals & Sense: An Analytics Fueled VisionSearch, Signals & Sense: An Analytics Fueled Vision
Search, Signals & Sense: An Analytics Fueled VisionSeth Grimes
 

Similar a Introduction to Text Mining and Semantics (20)

Smart Content = Smart Business
Smart Content = Smart BusinessSmart Content = Smart Business
Smart Content = Smart Business
 
Julie Clegg
Julie CleggJulie Clegg
Julie Clegg
 
Knowledge Extraction from Social Media
Knowledge Extraction from Social MediaKnowledge Extraction from Social Media
Knowledge Extraction from Social Media
 
The power of Structured Journalism & Hacker Culture in NPR
The power of Structured Journalism & Hacker Culture in NPRThe power of Structured Journalism & Hacker Culture in NPR
The power of Structured Journalism & Hacker Culture in NPR
 
Web 3.0? A look at the future of the World Wide Web
Web 3.0?  A look at the future of the World Wide WebWeb 3.0?  A look at the future of the World Wide Web
Web 3.0? A look at the future of the World Wide Web
 
Text Analytics Past, Present & Future
Text Analytics Past, Present & FutureText Analytics Past, Present & Future
Text Analytics Past, Present & Future
 
Web Science Framework and InterDataNet
Web Science Framework and InterDataNetWeb Science Framework and InterDataNet
Web Science Framework and InterDataNet
 
The Internet of Things, Ambient Intelligence, and the Move Towards Intelligen...
The Internet of Things, Ambient Intelligence, and the Move Towards Intelligen...The Internet of Things, Ambient Intelligence, and the Move Towards Intelligen...
The Internet of Things, Ambient Intelligence, and the Move Towards Intelligen...
 
Open Data Journalism
Open Data JournalismOpen Data Journalism
Open Data Journalism
 
Big Data And Analytics: A Summary Of The X 4.0 Era
Big Data And Analytics: A Summary Of The X 4.0 EraBig Data And Analytics: A Summary Of The X 4.0 Era
Big Data And Analytics: A Summary Of The X 4.0 Era
 
Walking Our Way to the Web
Walking Our Way to the WebWalking Our Way to the Web
Walking Our Way to the Web
 
Information Management Trends 2009
Information Management Trends 2009Information Management Trends 2009
Information Management Trends 2009
 
Data-Mining Twitter for Political Science -Hickman, Alfredo - Honors Thesis
Data-Mining Twitter for Political Science -Hickman, Alfredo - Honors ThesisData-Mining Twitter for Political Science -Hickman, Alfredo - Honors Thesis
Data-Mining Twitter for Political Science -Hickman, Alfredo - Honors Thesis
 
COM6905 Research Methods And Professional Issues.docx
COM6905 Research Methods And Professional Issues.docxCOM6905 Research Methods And Professional Issues.docx
COM6905 Research Methods And Professional Issues.docx
 
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
 
Zeng marcia ifla-subjectaccesssmartdatadh
Zeng marcia ifla-subjectaccesssmartdatadhZeng marcia ifla-subjectaccesssmartdatadh
Zeng marcia ifla-subjectaccesssmartdatadh
 
Intelligentcontent2009
Intelligentcontent2009Intelligentcontent2009
Intelligentcontent2009
 
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
 
The Ideal Collaborative Workspace
The Ideal Collaborative WorkspaceThe Ideal Collaborative Workspace
The Ideal Collaborative Workspace
 
Search, Signals & Sense: An Analytics Fueled Vision
Search, Signals & Sense: An Analytics Fueled VisionSearch, Signals & Sense: An Analytics Fueled Vision
Search, Signals & Sense: An Analytics Fueled Vision
 

Más de Seth Grimes

Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingSeth Grimes
 
Creating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to KnowCreating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to KnowSeth Grimes
 
NLP 2020: What Works and What's Next
NLP 2020: What Works and What's NextNLP 2020: What Works and What's Next
NLP 2020: What Works and What's NextSeth Grimes
 
Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...Seth Grimes
 
From Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter DorringtonFrom Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter DorringtonSeth Grimes
 
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AIIntro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AISeth Grimes
 
Text Analytics Market Trends
Text Analytics Market TrendsText Analytics Market Trends
Text Analytics Market TrendsSeth Grimes
 
Text Analytics for NLPers
Text Analytics for NLPersText Analytics for NLPers
Text Analytics for NLPersSeth Grimes
 
Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges? Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges? Seth Grimes
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Seth Grimes
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...Seth Grimes
 
Fairness in Machine Learning and AI
Fairness in Machine Learning and AIFairness in Machine Learning and AI
Fairness in Machine Learning and AISeth Grimes
 
Classification with Memes–Uber case study
Classification with Memes–Uber case studyClassification with Memes–Uber case study
Classification with Memes–Uber case studySeth Grimes
 
Aspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisAspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisSeth Grimes
 
Content AI: From Potential to Practice
Content AI: From Potential to PracticeContent AI: From Potential to Practice
Content AI: From Potential to PracticeSeth Grimes
 
Text Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's NextText Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's NextSeth Grimes
 
An Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and SocialAn Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and SocialSeth Grimes
 
The Insight Value of Social Sentiment
The Insight Value of Social SentimentThe Insight Value of Social Sentiment
The Insight Value of Social SentimentSeth Grimes
 
Text Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and ProvidersText Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and ProvidersSeth Grimes
 

Más de Seth Grimes (20)

Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language Processing
 
Creating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to KnowCreating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to Know
 
NLP 2020: What Works and What's Next
NLP 2020: What Works and What's NextNLP 2020: What Works and What's Next
NLP 2020: What Works and What's Next
 
Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...
 
From Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter DorringtonFrom Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter Dorrington
 
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AIIntro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
 
Emotion AI
Emotion AIEmotion AI
Emotion AI
 
Text Analytics Market Trends
Text Analytics Market TrendsText Analytics Market Trends
Text Analytics Market Trends
 
Text Analytics for NLPers
Text Analytics for NLPersText Analytics for NLPers
Text Analytics for NLPers
 
Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges? Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges?
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
 
Fairness in Machine Learning and AI
Fairness in Machine Learning and AIFairness in Machine Learning and AI
Fairness in Machine Learning and AI
 
Classification with Memes–Uber case study
Classification with Memes–Uber case studyClassification with Memes–Uber case study
Classification with Memes–Uber case study
 
Aspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisAspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion Analysis
 
Content AI: From Potential to Practice
Content AI: From Potential to PracticeContent AI: From Potential to Practice
Content AI: From Potential to Practice
 
Text Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's NextText Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's Next
 
An Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and SocialAn Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and Social
 
The Insight Value of Social Sentiment
The Insight Value of Social SentimentThe Insight Value of Social Sentiment
The Insight Value of Social Sentiment
 
Text Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and ProvidersText Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and Providers
 

Último

Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchirictsugar
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfrichard876048
 
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCRashishs7044
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfRbc Rbcua
 
International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...ssuserf63bd7
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Pereraictsugar
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607dollysharma2066
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxMarkAnthonyAurellano
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menzaictsugar
 
Memorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMMemorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMVoces Mineras
 
8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCR8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCRashishs7044
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Servicecallgirls2057
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCRashishs7044
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607dollysharma2066
 
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCRashishs7044
 
Market Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMarket Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMintel Group
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?Olivia Kresic
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfJos Voskuil
 

Último (20)

Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchir
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdf
 
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdf
 
International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Perera
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
 
Call Us ➥9319373153▻Call Girls In North Goa
Call Us ➥9319373153▻Call Girls In North GoaCall Us ➥9319373153▻Call Girls In North Goa
Call Us ➥9319373153▻Call Girls In North Goa
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
 
Memorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMMemorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQM
 
8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCR8447779800, Low rate Call girls in Rohini Delhi NCR
8447779800, Low rate Call girls in Rohini Delhi NCR
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
 
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
8447779800, Low rate Call girls in New Ashok Nagar Delhi NCR
 
Market Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMarket Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 Edition
 
Corporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information TechnologyCorporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information Technology
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdf
 

Introduction to Text Mining and Semantics

  • 1. Introduction to Text Miningand Semantics Seth Grimes -- President, Alta Plana
  • 2. New York Times October 9, 1958
  • 3. From text to information “Text expresses a vast, rich range of information, but encodes this information in a form that is difficult to decipher automatically.” -- Marti A. Hearst, “Untangling Text Data Mining,” 1999
  • 4. Document input and processing Information extraction Knowledge management Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958
  • 5. Statistical analysis of content “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance.” Hans Peter Luhn, “The Automatic Creation of Literature Abstracts,” IBM Journal, April 1958
  • 6. Statistical analysis limitations “This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.” -- Hans Peter Luhn, 1958
  • 7. Semantic links New York Times, September 8, 1957 Anaphora / coreference: “They”
  • 8. Digital content universe “The broadcast, media, and entertainment industries garner about 4% of the world’s revenues but already generate, manage, or otherwise oversee 50% of the digital universe.” “The Diverse and Exploding Digital Universe,” (IDC, 2008) Approximately 70% of the digital universe is created by individuals.
  • 9.
  • 10. Blogs, forum postings, and social media.
  • 11. E-mail, Contact-center notes and transcripts; recorded conversation.
  • 12. Surveys, feedback forms, warranty & insurance claims.
  • 13. Office documents, regulatory filings, reports, scientific papers.
  • 14. And every other sort of document imaginable.Is Search up to the job?
  • 15. Search results How are the quality, value & authority of search results? Hotel’s opinion Who profits from search? Guest’s opinion - about Priceline
  • 16. From Web 1.0 to Web 2.0 How can we do better? “We have many of the tools in place -- from Web 2.0 technologies…” “The Diverse and Exploding Digital Universe,” (IDC, 2008)
  • 17. Web 2.0 “Web 2.0 is the business revolution in the computer industry caused by the move to the Internet as a platform.” -- Tim O’Reilly, 2004 “[A] move from personal websites to blogs and blog site aggregation, from publishing to participation,… an ongoing and interactive process... to links based on tagging.” -- Terry Flew, “New Media: An Introduction,” 2008 Web 2.0 is dynamic, personalized, interactive, collaborative.
  • 18. From information growth to economicgrowth “We have many of the tools in place -- from Web 2.0 technologies… to unstructured data search software and the Semantic Web -- to tame the digital universe. Done right, we can turn information growth into economic growth.” -- “The Diverse and Exploding Digital Universe,” (IDC, 2008)
  • 19. Text mining: from information to intelligence Text mining enables smarter search that better responds to user goals, e.g., answers –
  • 20.
  • 23.
  • 24. Steps in the right direction…
  • 25. … and missteps “Kind” = type, variety, not a sentiment. Complete misclassification External reference Unfiltered duplicates
  • 26.
  • 27. Text augmentation: metadata generation, content tagging.
  • 29.
  • 31.
  • 32. Primary applications What are your primary applications where text comes into play?
  • 33. Analyzedtextual information What textual information are you analyzing or do you plan to analyze? Current users responded:
  • 34. Extracted information Do you need (or expect to need) to extract or analyze:
  • 35. Overall satisfaction Please rate your overall experience – your satisfaction – with text mining.
  • 36. Movingahead Apply text mining to discover value in content. Develop / improve metadata and taxonomies. Adopt semantic technologies -- for content publishing and for user interactions -- to boost flexibility, findability, and profitability. And understand your audience: “By focusing on the fundamental aspects of the consumers’ online behavior -- not just current best practices -- companies will be better prepared when Web 2.0+ morphs into Web 3.0 and beyond.” -- Donna L. Hoffman, UC Riverside, in the McKinsey Quarterly