Search, Signals & Sense:
  An Analytics Fueled Vision

Seth Grimes
@sethgrimes
A Sense Making Story

                       New York Times,
                       September 30, 2012
Valium: Starting a Chain of Connections

                           New York Times,
                           September 8, 1957
H.P. Luhn

By H.P. Luhn, in IBM Journal, April 1958

http://altaplana.com/ibm-luhn58-LiteratureAbstracts.pdf
Modelling Text

   Luhn’s analysis of Messengers of the Nervous System, a Scientific American article
   http://wordle.net, applied to the NY Times article




“Statistical information derived from word frequency and distribution is
used by the machine to compute a relative measure of significance, first
for individual words and then for sentences. Sentences scoring highest in
significance are extracted and printed out to become the auto-abstract.”
 -- H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
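
The idea in the quotation can be sketched in a few lines of Python. This is a simplified illustration, not Luhn's exact procedure: it scores each sentence by the average frequency of its content words, whereas the 1958 paper scores clusters of significant words within a sentence. The stop-word list, the sentence splitter, and the names `auto_abstract`, `split_sentences`, and `content_words` are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "by", "on"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    # Lower-case the sentence and drop stop words.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]

def auto_abstract(text, n_sentences=1):
    sentences = split_sentences(text)
    # Word significance: how often each content word occurs in the whole text.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        ws = content_words(sentence)
        # Sentence significance: average frequency of its content words.
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Emit the selected sentences in their original order.
    return [s for s in sentences if s in top]

sample = ("Nerve cells send chemical signals. "
          "Chemical signals cross the gap between nerve cells. "
          "The weather was pleasant.")
summary = auto_abstract(sample, n_sentences=1)
```

Averaging keeps long sentences from winning on length alone; other normalizations would be equally defensible.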
Luhn’s Example

                 New York Times,
                 September 8, 1957
Close Reading
Can Software Make the Connection?




               Mark Lombardi, George W. Bush, Harken Energy
                      and Jackson Stephens, c. 1979-90, Detail
There and Back Again: Modelling Text, 2


The text content of a document can be considered an
 unordered “bag of words.”
Particular documents are points in a high-dimensional
 vector space.
         Salton, Wong & Yang, “A Vector Space Model for Automatic Indexing,” November 1975.
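
A short sketch of what "unordered" means in practice, assuming whitespace tokenization and a made-up pair of sentences: two texts with opposite meanings collapse to the same bag, and so to the same point in the vector space. The helper name `bag_of_words` is hypothetical.

```python
from collections import Counter

def bag_of_words(text):
    # Count each token; word order is discarded.
    return Counter(text.lower().split())

a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")

# One dimension per distinct term across both texts.
vocabulary = sorted(set(a) | set(b))
vector_a = [a[term] for term in vocabulary]
vector_b = [b[term] for term in vocabulary]
```

Both vectors come out identical, which is the price the model pays for its simplicity.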
Modelling Text, 3


We might construct a document-term matrix...
  • D1 = “I like databases”
  • D2 = “I hate hate databases”

                 I        like               hate              databases
       D1        1        1                  0                 1
       D2        1        0                  2                 1
                              http://en.wikipedia.org/wiki/Term-document_matrix


and use a weighting such as TF-IDF (term frequency–
 inverse document frequency)…
in computing the cosine of the angle between
  weighted doc-vectors to determine similarity.
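
The two documents above can be worked through in plain Python. This is a minimal sketch that assumes whitespace tokenization and the unsmoothed weighting idf = log(N / df); libraries commonly use smoothed variants that give different numbers. The function names are chosen for this example only.

```python
import math
from collections import Counter

docs = {"D1": "I like databases", "D2": "I hate hate databases"}
terms = ["I", "like", "hate", "databases"]

# Rows of the document-term matrix, as term counts.
counts = {name: Counter(text.split()) for name, text in docs.items()}

def idf(term):
    # Inverse document frequency: log(N / number of documents containing the term).
    df = sum(1 for c in counts.values() if term in c)
    return math.log(len(counts) / df)

def raw_vector(name):
    return [counts[name][t] for t in terms]

def tfidf_vector(name):
    return [counts[name][t] * idf(t) for t in terms]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

raw_similarity = cosine(raw_vector("D1"), raw_vector("D2"))        # about 0.47
tfidf_similarity = cosine(tfidf_vector("D1"), tfidf_vector("D2"))  # 0.0
```

With only two documents, "I" and "databases" occur in both and receive zero weight under this idf, so the weighted vectors share no dimension and the similarity drops from about 0.47 to 0. That is the weighting doing its job: terms found everywhere say nothing about what distinguishes a document.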
Modelling Text, 4


In the form of query-document similarity, this is
  Information Retrieval 101.
  • See, for instance, Salton & Buckley, “Term-Weighting
    Approaches in Automatic Text Retrieval,” 1988.
  • A useful basic tech paper: Russ Albright, SAS, “Taming Text
    with the SVD,” 2004.
Given the complexity of human language, statistical
 models may fall short.
  “Reading from text in general is a hard problem, because it
  involves all of common sense knowledge.”
                -- Expert systems pioneer Edward A. Feigenbaum
From Text to Data: Features


Analytical methods make text tractable.
  Latent semantic indexing utilizing singular value
    decomposition for term reduction / feature selection.
Classification technologies / methods:
  • Naive Bayes.
  • Support Vector Machine.
  • K-nearest neighbor.
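
As an illustration of the first method in the list, here is a minimal multinomial Naive Bayes text classifier in plain Python. It is a sketch under stated assumptions: whitespace tokenization, add-one (Laplace) smoothing, and a made-up four-document training set. The names `train` and `classify` are hypothetical, not taken from any library.

```python
import math
from collections import Counter, defaultdict

def train(labelled_docs):
    class_docs = Counter()              # number of documents per class
    term_counts = defaultdict(Counter)  # term counts per class
    vocabulary = set()
    for text, label in labelled_docs:
        tokens = text.lower().split()
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_docs, term_counts, vocabulary

def classify(text, model):
    class_docs, term_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior for the class.
        score = math.log(class_docs[label] / total_docs)
        total_terms = sum(term_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Log likelihood with add-one smoothing.
            score += math.log((term_counts[label][token] + 1)
                              / (total_terms + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training data for illustration only.
model = train([
    ("I like databases", "positive"),
    ("I like search", "positive"),
    ("I hate hate databases", "negative"),
    ("I hate spam", "negative"),
])
prediction = classify("like search", model)
```

Working in log space avoids numeric underflow when documents are long.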
“Reading from Text is a Hard Problem”

 Eugène Delacroix, St. Michael Defeats the Devil




          Thus the Orb he roam'd
With narrow search; and with inspection deep
   Consider'd every Creature, which of all
   Most opportune might serve his Wiles.
                     -- John Milton, Paradise Lost
Data, Search, Analysis, and Discovery

   Eugène Delacroix, St. Michael Defeats the Devil

   Diagram labels: Data Space; For features; Analysis; Intent, Goals

            Thus the Orb he roam'd
  With narrow search; and with inspection deep
     Consider'd every Creature, which of all
     Most opportune might serve his Wiles.
                       -- John Milton, Paradise Lost
The User Interface

“Search is the UI for data today.”
                  -- Grant Ingersoll, Chief Scientist, LucidWorks
                                                       Quoted by Gil Press in Forbes,
                                  “LucidWorks: Bringing Search to Big Data”
                    http://www.forbes.com/sites/gilpress/2012/09/24/lucidworks-bringing-search-to-big-data/




What’s beyond?
Search and Sensemaking

“It is convenient to divide the entire
information access process into two
main components: information retrieval
through searching and browsing, and
analysis and synthesis of results. This
broader process is often referred to in
the literature as sensemaking.
Sensemaking refers to an iterative
process of formulating a conceptual
representation from a large volume
of information. Search plays only one
part in this process.”
                   -- Marti Hearst, 2009
                          http://searchuserinterfaces.com/
Senseless Search

New but old: Dumb and siloed
Searcher Supplied Sense

Better?
Siloed signals.

More better?
Semantic Search Engines

Meh.
Clustered Clarity

Carrot2.
(open source)
Semanticized (Web) Search




Google
Knowledge
Graph
Search Fronted Analysis & Discovery


                                  Fusions,
                                  Signals
Toward Semantic Search Sensemaking

Old Search                                                   Sensemaking
Search on: keywords                                          + identity, history & context
Sources: content/type silos                                  Unified
Indexed: terms                                               + metadata (properties)
Returned: hit lists                                          Categories / clusters / answers first
Relevance: PageRank                                          (Inferred) intent
Prevalence: plenty of new platforms with old(ish) search     Plenty of established search with new(ish) capabilities, also wannabes.
The Back End

Platforms and ecosystems.
APIs and services.
Text and content analytics --
   Discerns and extracts features including relationships from
     source materials.
   Features = entities, key-value pairs, concepts, topics,
     events, sentiment, etc.
   Provide (for) BI on content-sourced data.
Data integration, record linkage, data fusion.
Text+ Technology Mashups

Text/content analytics generates semantics to bridge
   search, BI, and applications, enabling next-
   generation information systems.
Diagram: overlapping circles for Search, BI, and Applications, with text analytics as the inner circle.
  • Semantic search (search + text)
  • Information access (search + text + BI)
  • Search based applications (search + text + apps)
  • Integrated analytics (text + BI)
  • NextGen CRM, EFM, MR, marketing, …
Analytical Assets (Open Source)




    >>> import nltk
    >>> sentence = """At eight o'clock on Thursday morning
    ... Arthur didn't feel very good."""
    >>> # Split the sentence into word and punctuation tokens.
    >>> tokens = nltk.word_tokenize(sentence)
    >>> tokens
    ['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',
    'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
    >>> # Label each token with its part of speech.
    >>> tagged = nltk.pos_tag(tokens)
    >>> tagged[0:6]
    [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
    ('Thursday', 'NNP'), ('morning', 'NN')]

                                                        http://nltk.org/
tm: Text Mining Package
A framework for text mining
applications within R.
A Big Data Analytics Architecture

http://hpccsystems.com/ (GNU Affero GPL)




           http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html
Commercial (Non-OS) Solutions Plug In
Drivers and Trends

Social media!
    … and personal-social-enterprise integration.
Via-API cloud services.
Big Data (even if you don’t like the term).
    Volume and velocity mean new analytical approaches.
    Variety: new types and a new fusion imperative.
Sentiment: Mood, opinions, emotions, intent.
Question answering.
Text Tech Initiatives

Now and near future.
    • Broader & deeper international language support.
    • Sentiment analysis, beyond polarity.
      Emotions, intent signals, etc.
    • Identity resolution & profile extraction.
      Online-social-enterprise data integration.
    • Semantic data integration, Complex Data.
    • Speech analytics.
    • Discourse analysis.
      Because isolated messages are not conversations.
    • Rich-media content analytics.
    • Augmented reality; new human-computer interfaces.
Personal. Mobile. Intelligent?




http://timoelliott.com/blog/2010/10/sap-businessobjects-augmented-explorer-now-available-resources-to-test-it.html
A Focus on Information & Applications

Now and near future.
    • Signal detection.
      Sentiment, emotion, identity, intent.
    • Semanticized applications.
      Linkable, mashable, enrichable.
    • Rich information.
      Context sensitive, situational.
Σ = Sensemaking.
Onward… to Q&A
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

Search, Signals & Sensemaking: An Analytics Vision

  • 1. Search, Signals & Sense: An Analytics Fueled Vision Seth Grimes @sethgrimes
  • 2. A Sense Making Story New York Times, September 30, 2012
  • 3. Valium: Starting a Chain of Connections New York Times, September 8, 1957
  • 4. H.P. Luhn By H.P. Luhn, in IBM Journal, April, 1958 http://altaplana.com/ibm-luhn58-LiteratureAbstracts.pdf
  • 6. Modelling Text Luhn’s analysis of Messengers of the Nervous System, a Scientific American article http://wordle.net, applied to the NY Times article “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.” -- H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
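The procedure in Luhn's quotation (score words by frequency, score sentences by their words, extract the top sentences) can be sketched as follows. This is a simplified illustration, not Luhn's exact method: the stopword list, the sentence splitter, and the mean-frequency sentence score are assumptions made for the example, and the names `STOPWORDS`, `content_words`, and `auto_abstract` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "was", "along"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    # Lowercased word tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def auto_abstract(text, n=1):
    sentences = split_sentences(text)
    # Word significance: frequency of each content word across the document.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Sentence significance: mean frequency of its content words.
        ws = content_words(sentence)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    # Pick the n highest-scoring sentences, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]

text = ("Nerve cells send messages. "
        "Nerve messages travel along nerve fibers. "
        "The weather was pleasant.")
print(auto_abstract(text, n=1))
```

Luhn's paper scores sentences using clusters of significant words; the mean-frequency score here is a stand-in chosen for brevity.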
  • 7. Luhn’s Example New York Times, September 8, 1957
  • 10. Can Software Make the Connection? Mark Lombardi, George W. Bush, Harken Energy and Jackson Stephens, c. 1979-90, Detail
  • 11. There and Back Again: Modelling Text, 2 The text content of a document can be considered an unordered “bag of words.” Particular documents are points in a high-dimensional vector space. Salton, Wong & Yang, “A Vector Space Model for Automatic Indexing,” November 1975.
  • 12. Modelling Text, 3 We might construct a document-term matrix...
      • D1 = “I like databases”
      • D2 = “I hate hate databases”

             I   like   hate   databases
        D1   1   1      0      1
        D2   1   0      2      1

      http://en.wikipedia.org/wiki/Term-document_matrix
      and use a weighting such as TF-IDF (term frequency–inverse document frequency)… in computing the cosine of the angle between weighted doc-vectors to determine similarity.
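As a rough sketch of the slide's two steps, the code below builds the document-term counts for D1 and D2, applies a TF-IDF weighting, and computes the cosine between the document vectors. The smoothed IDF formula is an assumption: with only two documents, the plain log(N/df) form would zero out every shared term. Tokens are lowercased and the vocabulary is sorted alphabetically, so the column order differs from the slide's table.

```python
import math
from collections import Counter

DOCS = {"D1": "I like databases", "D2": "I hate hate databases"}

# Term counts per document (whitespace tokenization, lowercased).
counts = {name: Counter(text.lower().split()) for name, text in DOCS.items()}
vocab = sorted({term for c in counts.values() for term in c})

# Rows of the document-term matrix, one component per vocabulary term.
raw = {name: [c[term] for term in vocab] for name, c in counts.items()}

def smoothed_idf(term):
    # Assumed smoothing: log((1 + N) / (1 + df)) + 1, so shared terms keep weight 1.
    df = sum(1 for c in counts.values() if term in c)
    return math.log((1 + len(counts)) / (1 + df)) + 1

tfidf = {name: [c[term] * smoothed_idf(term) for term in vocab]
         for name, c in counts.items()}

def cosine(u, v):
    # Cosine of the angle between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

raw_sim = cosine(raw["D1"], raw["D2"])
tfidf_sim = cosine(tfidf["D1"], tfidf["D2"])
print(vocab, raw, raw_sim, tfidf_sim)
```

On the raw counts the cosine is about 0.47. The weighted value is lower, because TF-IDF boosts the terms the two documents do not share ("like" and "hate").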
  • 13. Modelling Text, 4 In the form of query-document similarity, this is Information Retrieval 101.
      • See, for instance, Salton & Buckley, “Term-Weighting Approaches in Automatic Text Retrieval,” 1988.
      • A useful basic tech paper: Russ Albright, SAS, “Taming Text with the SVD,” 2004.
      Given the complexity of human language, statistical models may fall short.
      “Reading from text in general is a hard problem, because it involves all of common sense knowledge.” -- Expert systems pioneer Edward A. Feigenbaum
  • 14. From Text to Data: Features Analytical methods make text tractable. Latent semantic indexing utilizing singular value decomposition for term reduction / feature selection. Classification technologies / methods:
      • Naive Bayes.
      • Support Vector Machine.
      • K-nearest neighbor.
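To make the first classification method on this slide concrete, here is a minimal multinomial Naive Bayes sketch over bag-of-words counts with add-one (Laplace) smoothing. The training sentences, the "pos"/"neg" labels, and the function names are hypothetical and invented for illustration; nothing here comes from the talk itself.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set, invented for illustration only.
TRAIN = [
    ("I like databases", "pos"),
    ("I love search", "pos"),
    ("I hate hate databases", "neg"),
    ("I hate slow search", "neg"),
]

def tokenize(text):
    return text.lower().split()

def train(examples):
    class_docs = Counter()              # number of documents per class
    term_counts = defaultdict(Counter)  # term frequencies per class
    for text, label in examples:
        class_docs[label] += 1
        term_counts[label].update(tokenize(text))
    vocab = {term for c in term_counts.values() for term in c}
    return class_docs, term_counts, vocab

def classify(text, model):
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total_terms = sum(term_counts[label].values())
        # Log prior plus the sum of log likelihoods with add-one smoothing.
        score = math.log(class_docs[label] / total_docs)
        for term in tokenize(text):
            if term not in vocab:
                continue  # ignore terms never seen in training
            score += math.log((term_counts[label][term] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(classify("hate databases", model), classify("like search", model))
```

Working in log space avoids numeric underflow on longer documents. A practical system would also need far more training data and better tokenization than this sketch assumes.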
  • 15. “Reading from Text is a Hard Problem” Eugène Delacroix, St. Michael Defeats the Devil Thus the Orb he roam'd With narrow search; and with inspection deep Consider'd every Creature, which of all Most opportune might serve his Wiles. -- John Milton, Paradise Lost
  • 16. Data, Search, Analysis, and Discovery Eugène Delacroix, St. Michael Defeats the Devil [Labels over the image: Data Space; For features; Analysis; Intent, Goals] Thus the Orb he roam'd With narrow search; and with inspection deep Consider'd every Creature, which of all Most opportune might serve his Wiles. -- John Milton, Paradise Lost
  • 17. The User Interface “Search is the UI for data today.” -- Grant Ingersoll, Chief Scientist, LucidWorks Quoted by Gil Press in Forbes, “LucidWorks: Bringing Search to Big Data” http://www.forbes.com/sites/gilpress/2012/09/24/lucidworks-bringing-search-to-big-data/ What’s beyond?
  • 18. Search and Sensemaking “It is convenient to divide the entire information access process into two main components: information retrieval through searching and browsing, and analysis and synthesis of results. This broader process is often referred to in the literature as sensemaking. Sensemaking refers to an iterative process of formulating a conceptual representation of a large volume of information. Search plays only one part in this process.” -- Marti Hearst, 2009 http://searchuserinterfaces.com/
  • 19. Senseless Search New but old: Dumb and siloed
  • 25. Search Fronted Analysis & Discovery Fusions, Signals
  • 26. Toward Semantic Search Sensemaking
                    Old Search            Sensemaking
      Search on:    keywords              + identity, history & context
      Sources:      content/type silos    Unified
      Indexed:      terms                 + metadata (properties)
      Returned:     hit lists             Categories / clusters / answers first
      Relevance:    PageRank              (Inferred) intent
      Prevalence:   plenty of new Plenty of established platforms with old(ish) search with new(ish) search capabilities, also wannabes.
  • 27. The Back End Platforms and ecosystems. APIs and services. Text and content analytics -- Discerns and extracts features including relationships from source materials. Features = entities, key-value pairs, concepts, topics, events, sentiment, etc. Provide (for) BI on content-sourced data. Data integration, record linkage, data fusion.
  • 28. Text+ Technology Mashups Text/content analytics generates semantics to bridge search, BI, and applications, enabling next-generation information systems.
      [Diagram of overlapping circles for Search, BI, and Applications, with Text analytics as the inner circle. Overlap labels:]
      • Semantic search (search + text)
      • Information access (search + text + BI)
      • Search based applications (search + text + apps)
      • Integrated analytics (text + BI)
      • NextGen CRM, EFM, MR, marketing, …
  • 29. Analytical Assets (Open Source)
      >>> import nltk
      >>> sentence = """At eight o'clock on Thursday morning
      ... Arthur didn't feel very good."""
      >>> tokens = nltk.word_tokenize(sentence)
      >>> tokens
      ['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',
      'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
      >>> tagged = nltk.pos_tag(tokens)
      >>> tagged[0:6]
      [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
      ('Thursday', 'NNP'), ('morning', 'NN')]
      http://nltk.org/
      tm: Text Mining Package
      A framework for text mining applications within R.
  • 30. A Big Data Analytics Architecture http://hpccsystems.com/ (GNU Affero GPL) http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html
  • 32. Drivers and Trends Social media! … and personal-social-enterprise integration. Via-API cloud services. Big Data (even if you don’t like the term). Volume and velocity mean new analytical approaches. Variety: new types and a new fusion imperative. Sentiment: Mood, opinions, emotions, intent. Question answering.
  • 33. Text Tech Initiatives Now and near future.
      • Broader & deeper international language support.
      • Sentiment analysis, beyond polarity. Emotions, intent signals, etc.
      • Identity resolution & profile extraction. Online-social-enterprise data integration.
      • Semantic data integration, Complex Data.
      • Speech analytics.
      • Discourse analysis. Because isolated messages are not conversations.
      • Rich-media content analytics.
      • Augmented reality; new human-computer interfaces.
  • 35. A Focus on Information & Applications Now and near future.
      • Signal detection. Sentiment, emotion, identity, intent.
      • Semanticized applications. Linkable, mashable, enrichable.
      • Rich information. Context sensitive, situational.
      Σ = Sensemaking.
  • 38. Search, Signals & Sense: An Analytics Fueled Vision Seth Grimes @sethgrimes