Best Practices for Large Scale Text Mining Process
Ivelina Nikolova
Senior NLP Engineer
Oct 13, 2016
In this webinar you will learn …
• Industry applications that maximize the Return on Investment (ROI) of your text mining process
• How to describe your text mining problem
• How to define the output of the text mining
• How to select the appropriate text analysis techniques
• How to plan the prerequisites for a successful text mining solution
• DOs and DON'Ts in setting up a text mining process
Outline
• Business need for text mining solutions
• Introduction to NLP and information extraction
• How to tailor your text analysis process
• Applications and demonstrations
Semantic annotation/enrichment
• Links mentions in the text to knowledge base concepts
• Automatic, manual and semi-automatic
Business needs for text mining solutions
• Semantic annotation facilitates:
– data search
– data management
– data understanding
– and more abstract modeling of the textual content, like…
Business needs for text mining solutions
• Text summarization
• Content recommendation
• Document classification
• Topic extraction
• Document search and retrieval
• Question answering
• Sentiment analysis
Some of our customers
NLP and IE
• Computational Linguistics (CL)
• Natural Language Processing (NLP)
• Text Mining (TM)
• Information Extraction (IE)
• Named Entity Recognition (NER)
State of the art
• Named Entity Recognition
– 60% F1 [OKE-challenge@ESWC2015]
– 82.9% F1 [Leaman and Lu, 2016] in the biomedical domain
– above 90% for more specific tasks
Why is NLP so hard?
• Language and domain dependent
• The input is free text
“President Barack Obama labels Donald Trump comments as 'disturbing'”
“Barack Obama labels Donald Trump comments as 'disturbing'”
“President Obama labels Donald Trump comments as 'disturbing'”
• Natural language ambiguity
I cleaned the dishes in my pajamas.
I cleaned the dishes in the sink.
Georgia was happy with the meal her boyfriend cooked.
Maria is excited about her trip to Georgia next month.
Designing the text mining process
• Know your business problem
• Know your data
• Find appropriate samples
• Use common formats or formats that can easily be transformed into them
• Bring together domain experts, technical staff, NLP engineers and potential users
• Narrow the business problem down to an information extraction task
• Clarify the annotation types
• Clarify the annotation guidelines
• Apply the appropriate algorithm for IE
• Do iterations of evaluation and improvement
• Ensure continuous adaptation by curation and re-training
Clear problem definition
• Clearly define your business problem
– specific smart search
– content recommendation
– content enrichment
– content aggregation etc.
E.g. the system must do <A, B, C>
• Clearly define the text analysis problem
• Reduce the business problem to an information extraction problem
Business problem: faceted search by Persons, Organizations, Locations
Information extraction problem: extract mentions of Persons, Organizations, Locations and link them to the corresponding concepts in the knowledge base
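To make the reduction from "faceted search" to "extract mentions and link them to concepts" more concrete, here is a minimal sketch of dictionary-based mention extraction. It is not the approach used in the webinar: the gazetteer, the `Mention` type and the `http://example.org/kb/...` concept URIs are all hypothetical and stand in for a real knowledge base.

```python
import re
from typing import List, NamedTuple


class Mention(NamedTuple):
    text: str      # surface form as found in the input
    start: int     # character offset where the mention starts
    end: int       # character offset just past the mention
    type: str      # annotation type, e.g. Person or Location
    concept: str   # knowledge base concept the mention is linked to


# Hypothetical toy gazetteer: surface form -> (type, concept URI).
# Several surface forms may point to the same concept.
GAZETTEER = {
    "Barack Obama": ("Person", "http://example.org/kb/Barack_Obama"),
    "President Obama": ("Person", "http://example.org/kb/Barack_Obama"),
    "Donald Trump": ("Person", "http://example.org/kb/Donald_Trump"),
}


def extract_mentions(text: str) -> List[Mention]:
    """Find gazetteer surface forms in text and link them to concepts."""
    # Try longer surface forms first so that they win over shorter ones.
    forms = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(f) for f in forms) + r")\b")
    mentions = []
    for match in pattern.finditer(text):
        type_, concept = GAZETTEER[match.group(0)]
        mentions.append(Mention(match.group(0), match.start(), match.end(), type_, concept))
    return mentions


headline = "President Barack Obama labels Donald Trump comments as 'disturbing'"
for mention in extract_mentions(headline):
    print(mention)
```

A plain lookup like this cannot tell whether "Georgia" is a person or a place, which is why a real solution also needs a disambiguation step.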
Define the annotation types I
• Annotations – abstract descriptions of the mentions of concepts of interest
Named entities: Person, Location, Organization
Disease, Symptom, Chemical
SpaceObject, SpaceCraft
Relations: PersonHasRoleInOrganisation, Causation
Define the annotation types II
• Annotation types
– Person, Organization, Location
– Person, Organization, City
– Person, Organization, City, Country
• Annotation features
Location: string, GeoNames instance, latitude, longitude
Locations mentioned in Holocaust documents
Define the annotation types II
• Annotation features
Location: string, GeoNames instance, latitude, longitude
Chemical: string, InChI, SMILES, CAS
PersonHasRoleInOrganization: person instance, role instance, organization instance, timestamp
Example of a Location annotation:
string: the Gulf of Mexico
startOffset: 71
endOffset: 89
type: Location
inst: http://ontology.ontotext.com/resource/tsk7b61yf5ds
links: [http://sws.geonames.org/3523271/, http://dbpedia.org/resource/Gulf_of_Mexico]
latitude: 25.368611
longitude: -90.390556
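The annotation record above can be held in a small typed structure. The sketch below is one possible representation, not Ontotext's actual data model: the class name, the field names and the `is_consistent` check are assumptions, and the end offset is assumed to be exclusive.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    string: str        # the annotated text
    start_offset: int  # offset of the first character
    end_offset: int    # offset just past the last character (assumed exclusive)
    type: str          # annotation type, e.g. Location
    inst: str          # knowledge base instance the mention is linked to
    links: List[str] = field(default_factory=list)             # equivalent external resources
    features: Dict[str, float] = field(default_factory=dict)   # type-specific features

    def is_consistent(self) -> bool:
        """Check that the offsets span exactly as many characters as the string."""
        return self.end_offset - self.start_offset == len(self.string)


gulf = Annotation(
    string="the Gulf of Mexico",
    start_offset=71,
    end_offset=89,
    type="Location",
    inst="http://ontology.ontotext.com/resource/tsk7b61yf5ds",
    links=[
        "http://sws.geonames.org/3523271/",
        "http://dbpedia.org/resource/Gulf_of_Mexico",
    ],
    features={"latitude": 25.368611, "longitude": -90.390556},
)
print(gulf.is_consistent())
```

Keeping type-specific features in a dictionary lets the same structure carry Chemical or relation annotations without changing the class.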
Provide examples
• Realistic
• Demonstrating the desired output
• Positive and negative
– “It therefore increases insulin secretion and reduces POS[glucose] levels, especially postprandially.”
– “It acts by increasing POS[NEG[glucose]-induced insulin] release and by reducing glucagon secretion postprandially.”
• A representative and balanced set of the types of problems
• In an appropriate, commonly used format – XML, HTML, TXT, CSV, DOC, PDF
Domain model and knowledge
• Domain model/ontology - describes the types of objects in the problem area and the relations between them
Data
• Data sources - proprietary data, public data, professional data
• Data cleanup
• Data formats
• Data stores
– For metadata - GraphDB (http://ontotext.com/graphdb/)
– For content – MongoDB, MarkLogic etc.
• Data modeling is an inevitable part of the process of semantic data enrichment
– Start it as early as possible
– Keep to the common data formats
– Mistakes and underestimations are expensive because they influence the whole process of developing a text mining solution
Gold standard
• Gold standard – annotated data with superior quality
• Annotation guidelines - used as guidance for manually annotating the documents
POS[London] universities = universities located in London
NEG[London] City Council
NEG[London] Mayor
• Manual annotation tools – intuitive UI, visualization features, export formats
– MANT – Ontotext's in-house tool
– GATE – http://gate.ac.uk/ and https://gate.ac.uk/teamware/
– brat - http://brat.nlplab.org/
• Annotation approach
– Manual vs. semi-automatic
– Domain experts vs. crowd annotation
– E.g. Mechanical Turk - https://www.mturk.com/
• Inter-annotator agreement
• Train:Test ratio – 60:40, 70:30
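The slides do not say which agreement measure is used. Cohen's kappa is one common choice when two annotators label the same items, so the sketch below uses it as an assumption. The labels are invented: two annotators decide for four mentions of "London" whether each is a Location or not (`O`).

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical decisions of two annotators on four mentions of "London".
annotator_1 = ["Location", "O", "O", "Location"]
annotator_2 = ["Location", "O", "Location", "Location"]
print(cohen_kappa(annotator_1, annotator_2))
```

Low agreement usually points to unclear annotation guidelines rather than careless annotators, so it is worth measuring before the gold standard grows.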
Text analysis approach
• Rule-based approach
– a small number of clear patterns that change little or not at all over time
– high precision
– appropriate for domains where it is important to know how the decision to extract a given annotation was made – e.g. the biomedical domain
• Machine learning approach
– a larger number of patterns that do change over time
– requires annotated data
– allows for retraining over time
• Neural network approach
– Deep neural networks - getting closer to AI
– Recent advances promise true natural language understanding via complex neural networks
– Great results in speech recognition, image recognition and machine translation; a breakthrough is expected in NLP
– It is still unclear why and how it works, so it is difficult to optimize
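As an illustration of the rule-based approach, the sketch below applies a single hand-written pattern: a title followed by capitalized words is taken to be a Person. The rule, its name and the list of titles are assumptions made for this example. Each result records which rule fired, so the decision behind every annotation can be traced.

```python
import re
from typing import Dict, List

# Each rule: (rule name, annotation type, compiled pattern with one capture group).
# The single rule here is a hypothetical example, not a production grammar.
RULES = [
    (
        "title_then_name",
        "Person",
        re.compile(r"\b(?:President|Dr\.|Mr\.|Ms\.)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"),
    ),
]


def apply_rules(text: str) -> List[Dict[str, object]]:
    """Return one annotation per rule match, with the name of the rule that fired."""
    annotations = []
    for rule_name, type_, pattern in RULES:
        for match in pattern.finditer(text):
            annotations.append({
                "text": match.group(1),
                "start": match.start(1),
                "end": match.end(1),
                "type": type_,
                "rule": rule_name,  # keeps the extraction decision explainable
            })
    return annotations


print(apply_rules("President Barack Obama labels Donald Trump comments as 'disturbing'"))
print(apply_rules("Barack Obama labels Donald Trump comments as 'disturbing'"))
```

The rule finds "Barack Obama" only when the title is present and never finds "Donald Trump": precise when it fires, but with limited coverage.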
NER pipeline
• Preprocessing
• Keyphrase extraction
• Gazetteer based enrichment
• Named entity recognition and disambiguation
• Generic entity extraction
• Result consolidation
• Relation extraction
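A pipeline of this kind can be modeled as a list of stages, each taking a document and returning it enriched. The sketch below implements only three of the stages listed (preprocessing, gazetteer-based enrichment, result consolidation) in a deliberately simplified form. The stage functions, the toy gazetteer and the document layout are all assumptions, not the actual pipeline components.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Stage = Callable[[Document], Document]

# Hypothetical toy gazetteer: surface form -> annotation type.
TOY_GAZETTEER = {"Gulf of Mexico": "Location", "Mexico": "Location"}


def preprocess(doc: Document) -> Document:
    """Normalize whitespace; later offsets refer to the normalized text."""
    doc["text"] = " ".join(doc["text"].split())
    return doc


def gazetteer_enrich(doc: Document) -> Document:
    """Annotate every occurrence of every gazetteer surface form."""
    text = doc["text"]
    for form, type_ in TOY_GAZETTEER.items():
        start = text.find(form)
        while start != -1:
            doc["annotations"].append(
                {"start": start, "end": start + len(form), "type": type_}
            )
            start = text.find(form, start + 1)
    return doc


def consolidate(doc: Document) -> Document:
    """Among overlapping annotations keep only the longest one."""
    kept: List[Dict[str, Any]] = []
    longest_first = sorted(
        doc["annotations"], key=lambda a: (a["start"] - a["end"], a["start"])
    )
    for ann in longest_first:
        if all(ann["end"] <= k["start"] or ann["start"] >= k["end"] for k in kept):
            kept.append(ann)
    doc["annotations"] = sorted(kept, key=lambda a: a["start"])
    return doc


def run_pipeline(text: str, stages: List[Stage]) -> Document:
    """Run the stages in order over a fresh document."""
    doc: Document = {"text": text, "annotations": []}
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(
    "Oil  spill in the   Gulf of Mexico",
    [preprocess, gazetteer_enrich, consolidate],
)
print(result)
```

Because every stage has the same signature, stages such as disambiguation or relation extraction can be added or swapped without touching the others.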
Results curation / Error analysis
• Curation of results - domain experts manually assess the work of the text analysis components
• Testing interfaces
• Feedback
• Select a representative set of documents to evaluate manually
• Provide as full a description as possible of the results and the component used:
– <pipeline version>
– <input as sent for processing>
– <description of the wrong behavior>
– <description of the correct behavior>
• The earlier this happens, the sooner it triggers revision of the models and improvement of the annotations
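The four items of an error description map naturally onto a small record that curators can fill in and developers can store. The sketch below is one possible shape: the class, the field names and the sample values (including the version string) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CurationReport:
    pipeline_version: str  # version of the pipeline that produced the result
    input_text: str        # the input exactly as sent for processing
    wrong_behavior: str    # what the pipeline did
    correct_behavior: str  # what it should have done

    def to_json(self) -> str:
        """Serialize the report so it can be filed or attached to an issue."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


# Hypothetical report, based on the London annotation guideline example.
report = CurationReport(
    pipeline_version="1.4.2",
    input_text="London City Council approved the budget.",
    wrong_behavior="'London' is annotated as Location",
    correct_behavior="'London' should not be annotated as Location here",
)
print(report.to_json())
```

Requiring all four fields keeps each report reproducible: the same input can be replayed against the same pipeline version.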
Evaluation of the results
• Gold standard split train:test
– 70:30
– 80:20
• Decide which task you want to evaluate
– E.g. extraction at document level or inline annotation
• Evaluation metrics
– Information extraction tasks – precision, recall, F-measure
– Recommendations – A/B-testing
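For information extraction tasks, precision, recall and F-measure can be computed by comparing the predicted annotations with the gold standard. The sketch below assumes exact matching on `(start, end, type)` triples and the balanced F1 variant of the F-measure; other matching schemes, such as partial overlap, are also in use. The split helper, its defaults and the sample data are likewise assumptions.

```python
import random
from typing import Iterable, List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, annotation type)


def split_gold_standard(documents: Sequence[str], train_percent: int = 70,
                        seed: int = 13) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly and split into train and test parts, e.g. 70:30."""
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


def precision_recall_f1(gold: Iterable[Span],
                        predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over annotation spans."""
    gold_set, predicted_set = set(gold), set(predicted)
    true_positives = len(gold_set & predicted_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted annotations for one document.
gold = [(10, 22, "Person"), (30, 42, "Person")]
predicted = [(10, 22, "Person"), (0, 9, "Person")]
print(precision_recall_f1(gold, predicted))

train, test = split_gold_standard([f"doc{i}" for i in range(10)])
print(len(train), len(test))
```

Evaluating at document level instead of inline would only change what goes into the sets, for example `(document id, concept)` pairs instead of character spans.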
Continuous adaptation
Types of extracted information
• Document categorization
– post, political news, sport news, etc.
• Topic extraction
– important words and phrases in the text
• Named entity recognition
– People, Organization, Location, Time, Amounts of money, etc.
• Keyterm assignment from predefined hierarchies
• Concept extraction
– entities from a knowledge base
• Relation extraction
– relations between types of entities
Applications
• TAG (http://tag.ontotext.com)
• NOW (http://now.ontotext.com)
• Patient Insights (http://patient.ontotext.com/) - contact todor.primov@ontotext.com for credentials
Take-away messages
• A clearly defined business problem needs to be broken down into a clearly defined information extraction problem
• This requires combined efforts from business decision makers, domain experts, natural language processing experts and technical staff
• Data modeling is an inevitable part of the process; consider it as early as possible
• Create clear annotation guidelines based on real-world examples
• Start with a small initial set of balanced and representative documents
• Plan the evaluation of the results in advance
• Choose an appropriate manual annotation tool
• While annotating content, check how the quantity influences the performance
• Select the appropriate text analysis approach
• Plan iterations of curation by domain experts followed by revision of the text analysis approach
• Plan the aspects of continuous adaptation – document quantity, timing, temporality of the information fed into the model
Thank you very much for your attention!
You are welcome to try our demos at http://ontotext.com

 
Cooking up the Semantic Web
Cooking up the Semantic WebCooking up the Semantic Web
Cooking up the Semantic WebOntotext
 


Best Practices for Large Scale Text Mining Processing

  • 1. Oct 13, 2016 Ivelina Nikolova Senior NLP Engineer Best Practices for Large Scale Text Mining Process
  • 2. In this webinar you will learn:
      • Industry applications that maximize the return on investment (ROI) of your text mining process
      • To describe your text mining problem
      • To define the output of the text mining
      • To select the appropriate text analysis techniques
      • To plan the prerequisites for a successful text mining solution
      • DOs and DON’Ts in setting up a text mining process
  • 3. Outline
      • Business need for text mining solutions
      • Introduction to NLP and information extraction
      • How to tailor your text analysis process
      • Applications and demonstrations
  • 4. Semantic annotation/enrichment
      • Links mentions in the text to knowledge base concepts
      • Can be automatic, manual or semi-automatic
  • 5. Business needs for text mining solutions
      • Semantic annotation facilitates:
          – data search
          – data management
          – data understanding
          – and more abstract modeling of the textual content, like…
  • 6. Business needs for text mining solutions
      • Text summarization
      • Content recommendation
      • Document classification
      • Topic extraction
      • Document search and retrieval
      • Question answering
      • Sentiment analysis
  • 7. Some of our customers
  • 8. NLP and IE
      • Computational Linguistics (CL)
      • Natural Language Processing (NLP)
      • Text Mining (TM)
      • Information Extraction (IE)
      • Named Entity Recognition (NER)
  • 9. State of the art
      • Named Entity Recognition
          – 60% F1 [OKE-challenge@ESWC2015]
          – 82.9% F1 [Leaman and Lu, 2016] in the biomedical domain
          – above 90% for more specific tasks
  • 10. Why is NLP so hard?
      • Language and domain dependent
      • The input is free text:
          – “President Barack Obama labels Donald Trump comments as 'disturbing'”
          – “Barack Obama labels Donald Trump comments as 'disturbing'”
          – “President Obama labels Donald Trump comments as 'disturbing'”
      • Natural language ambiguity:
          – I cleaned the dishes in my pajamas. / I cleaned the dishes in the sink.
          – Georgia was happy with the meal her boyfriend cooked. / Maria is excited about her trip to Georgia next month.
  • 11. Designing the text mining process
      • Know your business problem
      • Know your data
      • Find appropriate samples
      • Use common formats, or formats which can easily be transformed into common ones
      • Bring together domain experts, technical staff, NLP engineers and potential users
      • Narrow the business problem down to an information extraction task
      • Clarify the annotation types
      • Clarify the annotation guidelines
      • Apply the appropriate algorithm for IE
      • Do iterations of evaluation and improvement
      • Ensure continuous adaptation by curation and re-training
  • 13. Clear problem definition
      • Define your business problem clearly, e.g. the system must do <A, B, C>:
          – specific smart search
          – content recommendation
          – content enrichment
          – content aggregation, etc.
      • Define the text analysis problem clearly
      • Reduce the business problem to an information extraction problem
          – Business problem: faceted search by Persons, Organizations, Locations
          – Information extraction problem: extract mentions of Persons, Organizations and Locations and link them to the corresponding concepts in the knowledge base
  • 14. Define the annotation types I
      • Annotations: abstract descriptions of the mentions of concepts of interest
          – Named entities: Person, Location, Organization; Disease, Symptom, Chemical; SpaceObject, SpaceCraft
          – Relations: PersonHasRoleInOrganisation, Causation
  • 15. Define the annotation types II
      • Annotation types
          – Person, Organization, Location
          – Person, Organization, City
          – Person, Organization, City, Country
      • Annotation features
          – Location: string, geonames instance, latitude, longitude
  • 16. Locations mentioned in Holocaust documents
  • 17. Define the annotation types II
      • Annotation types
          – Person, Organization, Location
          – Person, Organization, City
          – Person, Organization, City, Country
      • Annotation features
          – Location: string, geonames instance, latitude, longitude
          – Chemical: string, InChI, SMILES, CAS
          – PersonHasRoleInOrganization: person instance, role instance, organization instance, timestamp
      • Example annotation:
          string: the Gulf of Mexico
          startOffset: 71
          endOffset: 89
          type: Location
          inst: http://ontology.ontotext.com/resource/tsk7b61yf5ds
          links: [http://sws.geonames.org/3523271/ http://dbpedia.org/resource/Gulf_of_Mexico]
          latitude: 25.368611
          longitude: -90.390556
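The example annotation on this slide maps naturally onto a small record type. The following is a minimal sketch in Python, not Ontotext's actual schema: the class name `Annotation`, the snake_case field names and the `matches` helper are assumptions made for illustration, and the document text is a placeholder padded so that the offsets 71 and 89 line up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """One extracted mention; fields mirror the keys of the slide's example."""
    string: str
    start_offset: int
    end_offset: int
    type: str
    inst: Optional[str] = None            # knowledge base instance URI
    links: List[str] = field(default_factory=list)
    latitude: Optional[float] = None      # only meaningful for Location
    longitude: Optional[float] = None

    def matches(self, text: str) -> bool:
        """Check that the offsets really point at the mention string."""
        return text[self.start_offset:self.end_offset] == self.string


gulf = Annotation(
    string="the Gulf of Mexico",
    start_offset=71,
    end_offset=89,
    type="Location",
    inst="http://ontology.ontotext.com/resource/tsk7b61yf5ds",
    links=[
        "http://sws.geonames.org/3523271/",
        "http://dbpedia.org/resource/Gulf_of_Mexico",
    ],
    latitude=25.368611,
    longitude=-90.390556,
)

# Placeholder document (assumption): 71 filler characters, then the mention.
document = "." * 71 + "the Gulf of Mexico"
print(gulf.matches(document))
```

Keeping the offsets next to the surface string makes it cheap to detect annotations that have drifted out of sync with the text, for instance after a cleanup step changes whitespace.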
  • 18. Provide examples
      • Realistic
      • Demonstrating the desired output
      • Positive and negative:
          – “It therefore increases insulin secretion and reduces POS[glucose] levels, especially postprandially.”
          – “It acts by increasing POS[NEG[glucose]-induced insulin] release and by reducing glucagon secretion postprandially.”
      • A representative and balanced set of the types of problems
      • In an appropriate, commonly used format: XML, HTML, TXT, CSV, DOC, PDF
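The POS[...] and NEG[...] markers in these examples can be turned into plain text plus labelled spans with a few lines of code. This is a sketch under assumptions: the slides do not define the bracket notation formally, so the parser below simply treats `POS[` and `NEG[` as opening markers and the next unmatched `]` as the closing one, and the function name `parse_marked_example` is hypothetical.

```python
from typing import List, Tuple


def parse_marked_example(marked: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Strip POS[...]/NEG[...] markers and return (plain text, labelled mentions)."""
    plain: List[str] = []
    stack: List[Tuple[str, int]] = []        # (label, start offset in plain text)
    spans: List[Tuple[str, int, int]] = []
    i = 0
    while i < len(marked):
        if marked.startswith(("POS[", "NEG["), i):
            stack.append((marked[i:i + 3], len(plain)))
            i += 4                            # skip the label and the bracket
        elif marked[i] == "]" and stack:
            label, start = stack.pop()
            spans.append((label, start, len(plain)))
            i += 1
        else:
            plain.append(marked[i])           # ordinary character
            i += 1
    text = "".join(plain)
    return text, [(label, text[start:end]) for label, start, end in spans]


text, mentions = parse_marked_example(
    "It acts by increasing POS[NEG[glucose]-induced insulin] release"
)
print(text)
print(mentions)
```

Nested markers close innermost first, so the negative example inside the positive one is reported before the enclosing span.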
  • 19. Domain model and knowledge
      • Domain model/ontology: describes the types of objects in the problem area and the relations between them
  • 20. Data
      • Data sources: proprietary data, public data, professional data
      • Data cleanup
      • Data formats
      • Data stores
          – For metadata: GraphDB (http://ontotext.com/graphdb/)
          – For content: MongoDB, MarkLogic, etc.
      • Data modeling is an inevitable part of the process of semantic data enrichment
          – Start it as early as possible
          – Keep to the common data formats
          – Mistakes and underestimations are expensive because they influence the whole process of developing a text mining solution
  • 21. Gold standard
      • Gold standard: annotated data of superior quality
      • Annotation guidelines: used as guidance for manually annotating the documents
          – POS[London] universities = universities located in London
          – NEG[London] City Council
          – NEG[London] Mayor
      • Manual annotation tools: intuitive UI, visualization features, export formats
          – MANT: Ontotext's in-house tool
          – GATE: http://gate.ac.uk/ and https://gate.ac.uk/teamware/
          – brat: http://brat.nlplab.org/
      • Annotation approach
          – Manual vs. semi-automatic
          – Domain experts vs. crowd annotation, e.g. Mechanical Turk (https://www.mturk.com/)
      • Inter-annotator agreement
      • Train:test ratio: 60:40, 70:30
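Inter-annotator agreement is listed here without a specific metric. One common choice is Cohen's kappa, which corrects raw agreement for the agreement expected by chance; choosing it is an assumption of this sketch, not something the slides prescribe. The function name and the toy POS/NEG labels (two annotators judging four "London" mentions) are illustrative only.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Share of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data (assumption): is each "London" mention a Location (POS) or not (NEG)?
annotator_1 = ["POS", "NEG", "NEG", "POS"]
annotator_2 = ["POS", "NEG", "POS", "POS"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

A low kappa on a pilot batch is usually a sign that the annotation guidelines need more examples before the bulk of the gold standard is annotated.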
  • 22. Text analysis approach
      • Rule-based approach
          – suits a smaller number of clear patterns that do not change, or change only slightly, over time
          – high precision
          – appropriate for domains where it is important to know how the decision to extract a given annotation was taken, e.g. the biomedical domain
      • Machine learning approach
          – suits a larger number of patterns that do change over time
          – requires annotated data
          – allows for retraining over time
      • Neural network approach
          – Deep neural networks: getting closer to AI
          – Recent advances promise true natural language understanding via complex neural networks
          – Great results in speech recognition, image recognition and machine translation; a breakthrough is expected in NLP
          – Still unclear why and how it works, and thus difficult to optimize
  • 23. NER pipeline
      • Preprocessing
      • Keyphrase extraction
      • Gazetteer based enrichment
      • Named entity recognition and disambiguation
      • Generic entity extraction
      • Result consolidation
      • Relation extraction
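The idea of a pipeline of stages can be sketched as a list of functions that each take a document and return it with more annotations. This is a simplified illustration, not Ontotext's pipeline: only three of the listed stages are shown (preprocessing, gazetteer based enrichment, result consolidation), and the gazetteer contents, the dictionary layout of a document and all function names are assumptions.

```python
import re
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]              # {"text": str, "annotations": list of dicts}
Stage = Callable[[Doc], Doc]

# Hypothetical gazetteer: surface form -> annotation type.
GAZETTEER = {
    "Barack Obama": "Person",
    "Obama": "Person",
    "Donald Trump": "Person",
}


def preprocess(doc: Doc) -> Doc:
    """Normalise whitespace so that offsets are computed on clean text."""
    doc["text"] = re.sub(r"\s+", " ", doc["text"]).strip()
    return doc


def gazetteer_enrichment(doc: Doc) -> Doc:
    """Annotate every whole-word occurrence of a gazetteer entry."""
    for surface, ann_type in GAZETTEER.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, doc["text"]):
            doc["annotations"].append(
                {"start": match.start(), "end": match.end(),
                 "string": surface, "type": ann_type}
            )
    return doc


def consolidate(doc: Doc) -> Doc:
    """Resolve overlaps left to right: keep the first span (the longest one
    when several start at the same offset) and drop spans overlapping it."""
    ordered = sorted(doc["annotations"],
                     key=lambda a: (a["start"], -(a["end"] - a["start"])))
    kept: List[Dict[str, Any]] = []
    for ann in ordered:
        if not kept or ann["start"] >= kept[-1]["end"]:
            kept.append(ann)
    doc["annotations"] = kept
    return doc


def run_pipeline(text: str, stages: List[Stage]) -> Doc:
    """Feed the document through each stage in order."""
    doc: Doc = {"text": text, "annotations": []}
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(
    "President Barack Obama labels Donald Trump comments as 'disturbing'",
    [preprocess, gazetteer_enrichment, consolidate],
)
for ann in result["annotations"]:
    print(ann["string"], ann["type"], ann["start"], ann["end"])
```

Because every stage has the same signature, stages such as keyphrase extraction or relation extraction could be added to the list without changing `run_pipeline`.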
  • 24. NER pipeline
  • 25. NER pipeline
  • 26. NER pipeline
  • 27. Results curation / Error analysis
      • Curation of results: domain experts manually assess the work of the text analysis components
      • Testing interfaces
      • Feedback
          – Select a representative set of documents to evaluate manually
          – Describe the results and the component used as fully as possible:
              <pipeline version>
              <input as sent for processing>
              <description of the wrong behavior>
              <description of the correct behavior>
      • The earlier this happens, the sooner it triggers revision of the models and improvement of the annotation
  • 28. Evaluation of the results
      • Gold standard train:test split
          – 70:30
          – 80:20
      • Decide which task you want to evaluate, e.g. extraction at document level or inline annotation
      • Evaluation metrics
          – Information extraction tasks: precision, recall, F-measure
          – Recommendations: A/B testing
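The split and the information extraction metrics can be sketched in a few lines. The code below is an illustration under assumptions: it scores inline annotations by exact match (same offsets and same type), which is only one of several possible matching policies, and the function names, the seed and the toy spans are made up for the example.

```python
import random
from typing import List, Set, Tuple

Span = Tuple[int, int, str]       # (start offset, end offset, annotation type)


def split_gold_standard(doc_ids: List[str], train_share: float = 0.7,
                        seed: int = 13) -> Tuple[List[str], List[str]]:
    """Shuffle document ids reproducibly and cut them into train and test sets."""
    shuffled = list(doc_ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def precision_recall_f1(gold: Set[Span],
                        predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scoring: a prediction counts only if offsets and type agree."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (assumption): ten documents, one of them scored.
train_ids, test_ids = split_gold_standard([f"doc{i}" for i in range(10)])

gold = {(10, 22, "Person"), (30, 42, "Person"), (71, 89, "Location")}
predicted = {(10, 22, "Person"), (30, 42, "Organization"),
             (71, 89, "Location"), (95, 101, "Location")}
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(len(train_ids), len(test_ids))
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

In the toy data, the span with the right offsets but the wrong type counts both as a false positive and as a missed gold annotation, which is why exact-match scores are often reported alongside a more lenient overlap-based score.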
  • 30. Types of extracted information
      • Document categorization: post, political news, sport news, etc.
      • Topic extraction: important words and phrases in the text
      • Named entity recognition: People, Organization, Location, Time, Amounts of money, etc.
      • Keyterm assignment from predefined hierarchies
      • Concept extraction: entities from a knowledge base
      • Relation extraction: relations between types of entities
  • 31. Applications
      • TAG (http://tag.ontotext.com)
      • NOW (http://now.ontotext.com)
      • Patient Insights (http://patient.ontotext.com/): contact todor.primov@ontotext.com for credentials
  • 32. Take away messages
      • A clearly defined business problem needs to be broken down into a clearly defined information extraction problem
      • This requires combined efforts from business decision makers, domain experts, natural language processing experts and technical staff
      • Data modeling is an inevitable part of the process; consider it as early as possible
      • Create clear annotation guidelines based on real-world examples
      • Start with an initial small set of balanced and representative documents
      • Plan the evaluation of the results in advance
      • Choose an appropriate manual annotation tool
      • While annotating content, check how the quantity influences the performance
      • Select the appropriate text analysis approach
      • Plan iterations of curation by domain experts, followed by revision of the text analysis approach
      • Plan the aspects of continuous adaptation: document quantity, timing, temporality of the information fed into the model
  • 33. Thank you very much for your attention! You are welcome to try our demos at http://ontotext.com