Structured Search
Dan McCreary, President, Dan McCreary & Associates
[email_address] (952) 931-9198
Version 4
Presentation Description
After This Presentation Users Will Be Able To:
Background for Dan McCreary
How Many People…
Structured Search
But first a story…
Information Retrieval Textbook
http://nlp.stanford.edu/IR-book/information-retrieval-book.html
117 Citations in Computer Science
With 117 citations, the "Intro to IR" book is the second most cited computer science reference published in 2008.
Table 10.1
Table 10.1: RDB (relational database) search, unstructured information retrieval and structured information retrieval.
                      RDB search         unstructured retrieval    structured retrieval
objects               records            unstructured documents    trees with text at leaves
model                 relational model   vector space & others     ?
main data structure   table              inverted index            ?
queries               SQL                free text queries         ?
Excerpt from IR Book…
eXist Native XML Developers
eXist Meeting, Prague, March 12th, 2010
Presentation Outline
Table 10.1 - Revised
Table 10.1 (revised): RDB (relational database) search, unstructured information retrieval and structured information retrieval.
                      RDB search         unstructured retrieval    structured retrieval
objects               records            unstructured documents    trees with text at leaves
model                 relational model   vector space & others     XML hierarchy
main data structure   table              inverted index            trees with node-ids for document ids
queries               SQL                free text queries         XQuery fulltext
Relational DB Boolean Search
SELECT * FROM PERSON WHERE TITLE = 'manager' ORDER BY SALARY
Note that the ordering is not the quality of a match but simply another column in the table.
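A minimal sketch of the query above using Python's built-in sqlite3 module and an in-memory table. The PERSON rows and the NAME column are hypothetical, and the query selects two columns instead of *; the WHERE and ORDER BY clauses are the slide's. Each row either satisfies the condition or it does not, and the result order comes from the SALARY column, not from any match score.

```python
import sqlite3

def managers_by_salary():
    """Run the slide's Boolean query against a small, hypothetical PERSON table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PERSON (NAME TEXT, TITLE TEXT, SALARY INTEGER)")
    conn.executemany(
        "INSERT INTO PERSON VALUES (?, ?, ?)",
        [
            ("Alice", "manager", 95000),
            ("Bob", "engineer", 105000),
            ("Carol", "manager", 80000),
            ("Dave", "Manager", 90000),  # different case: SQLite's '=' is case-sensitive here
        ],
    )
    # A row either matches TITLE = 'manager' exactly or is excluded; there is no partial match.
    rows = conn.execute(
        "SELECT NAME, SALARY FROM PERSON WHERE TITLE = 'manager' ORDER BY SALARY"
    ).fetchall()
    conn.close()
    return rows

print(managers_by_salary())  # [('Carol', 80000), ('Alice', 95000)]
```

Sorting by SALARY says nothing about how relevant Carol's row is compared with Alice's; both match equally.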
Vector Model
[Figure: documents plotted as vectors on the axes Keyword 1 and Keyword 2; your search keywords in green, other documents in blue, and the search score, a distance measurement, in red.]
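As a minimal sketch of the vector model, the query and each document can be represented as term-count vectors over the same vocabulary and compared with cosine similarity, one common choice of score. Cosine similarity measures the angle between two vectors rather than a literal distance, so larger values mean closer. The two-keyword vectors below are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length term vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical term counts over the vocabulary [keyword 1, keyword 2].
query = [1, 1]
documents = {"doc-a": [2, 2], "doc-b": [3, 0], "doc-c": [1, 3]}

# Rank documents from most to least similar to the query.
ranked = sorted(documents, key=lambda d: cosine_similarity(query, documents[d]), reverse=True)
print(ranked)  # ['doc-a', 'doc-c', 'doc-b']
```

doc-a points in exactly the same direction as the query, so it ranks first even though its raw counts differ.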
Reverse Index
For each word, a reverse (inverted) index tells you which documents contain that word.
Word   Document IDs
hate   12344, 34235, 43513
love   12344, 34235, 43513, 22345, 12313, 42345, 12313, 13124
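A minimal sketch of building such an index, assuming plain whitespace tokenization and lower-casing (production engines such as Lucene run text through configurable analyzers instead). The document texts are hypothetical; only the style of the IDs follows the table.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lower-cased word to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

# Hypothetical documents keyed by numeric document ID.
docs = {
    12344: "love and hate",
    22345: "all you need is love",
    43513: "fear and hate",
}
index = build_inverted_index(docs)
print(index["hate"])  # [12344, 43513]
print(index["love"])  # [12344, 22345]
```

Answering a one-word query is then a single dictionary lookup instead of a scan over every document.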
Reverse Index in eXist 1.5
[Screenshot: index terms that start with "love"]
Sample Keyword Search
[Screenshots: the keyword search form, the resulting hits, and the XQuery code]
Calculating Score
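One classic way to calculate a relevance score is tf-idf: a query term contributes more the more often it occurs in a document and the rarer it is across the collection. This is a hedged illustration, not necessarily the exact formula eXist or Lucene applies; the documents are hypothetical and tokenization is plain whitespace splitting.

```python
import math

def tf_idf_scores(query_terms, docs):
    """Score each document by summing tf * idf over the query terms.

    tf  = number of times the term occurs in the document
    idf = log(N / df), where N is the number of documents and df the number containing the term
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, words in tokenized.items():
        score = 0.0
        for term in query_terms:
            df = sum(1 for doc_words in tokenized.values() if term in doc_words)
            if df == 0:
                continue  # a term absent from the whole collection contributes nothing
            score += words.count(term) * math.log(n_docs / df)
        scores[doc_id] = score
    return scores

docs = {"d1": "love love hate", "d2": "love", "d3": "fear"}
scores = tf_idf_scores(["love"], docs)
print(sorted(scores, key=scores.get, reverse=True))  # ['d1', 'd2', 'd3']
```

Unlike the SQL ORDER BY example, the ordering here is derived from how well each document matches the query.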
How Is "Structured Search" Different?
Two Models
[Figure: keywords such as 'love', 'hate', 'new' and 'fear' each pointing to a doc-id]
Keywords and Node IDs
[Figure: a document-id containing many node-ids, each node-id pointing to its own keywords]
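A minimal sketch of the difference, assuming documents are small XML strings parsed with Python's xml.etree.ElementTree: instead of mapping each word only to a document ID, each posting records the document ID plus a node ID for the element that holds the word. The dotted node IDs (for example "1.2.2") are an illustrative numbering chosen here; eXist has its own node-numbering scheme internally. The sample markup is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_nodes(doc_id, xml_text):
    """Map each word to a list of (doc_id, node_id, tag) postings.

    node_id is a dotted path of 1-based child positions, e.g. "1.2" is the
    second child of the root. Only an element's own text (not tail text) is indexed.
    """
    index = defaultdict(list)

    def walk(elem, node_id):
        for word in (elem.text or "").lower().split():
            index[word].append((doc_id, node_id, elem.tag))
        for pos, child in enumerate(elem, start=1):
            walk(child, f"{node_id}.{pos}")

    walk(ET.fromstring(xml_text), "1")
    return index

play = """<PLAY>
  <TITLE>A play about love</TITLE>
  <SPEECH><SPEAKER>ROMEO</SPEAKER><LINE>love is heavy</LINE></SPEECH>
  <SPEECH><SPEAKER>JULIET</SPEAKER><LINE>hate is near</LINE></SPEECH>
</PLAY>"""

index = index_nodes("doc-1", play)
print(index["love"])  # [('doc-1', '1.1', 'TITLE'), ('doc-1', '1.2.2', 'LINE')]
```

Because each hit carries a node ID, a search can tell a match in a TITLE apart from a match inside a SPEECH of the same document.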
Subdocuments
Books Have Structure
[Figure: a book with its Book Title and Book Metadata labeled]
Presentations Have Structure
Find all slides with the word "XML" in their title.
E-mail Has Structure
Sample of XML
Many Objects Have Structure
Find all forms with a label of "Zipcode".
Find all spreadsheets with a first-row cell that contains the word "SSN".
But What About Microsoft Office?
Office Open XML (informally known as OOXML or OpenXML) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word-processing documents. Files with the extensions .docx, .xlsx and .pptx are zip archives that contain XML files. The format is standardized as ECMA-376.
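A minimal sketch, using only Python's standard zipfile and xml.etree.ElementTree modules, of reading paragraph text out of a .docx file. It assumes the main body lives in the archive member word/document.xml, as is conventional, and uses the WordprocessingML namespace http://schemas.openxmlformats.org/wordprocessingml/2006/main. The in-memory archive built for the demo holds only that one member, so it is not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def docx_paragraphs(source):
    """Return the text of each paragraph (w:p) in a .docx path or file-like object."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join all w:t text nodes in order.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs

# Demo: a stripped-down archive containing only word/document.xml.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "word/document.xml",
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>XML</w:t></w:r></w:p>"
        "</w:body></w:document>",
    )
buf.seek(0)
demo = docx_paragraphs(buf)
print(demo)  # ['Hello XML']
```

Once the XML is extracted this way, the same structured queries used for other XML (for example, "paragraphs that contain SSN") apply to Office files.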
Open Document XML Formats
Benefits
[Figure: target documents, other documents, the actual search results, and a missed document]
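The figure's labels correspond to the two standard measures of search quality: precision (the fraction of actual search results that are target documents) and recall (the fraction of target documents that were returned; a missed document lowers recall). A minimal sketch with hypothetical document IDs:

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant (0.0 when a denominator is empty)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical: 4 target documents; the search returns 4 results, 3 of them targets.
targets = {"t1", "t2", "t3", "t4"}  # t4 is the missed document
results = {"t1", "t2", "t3", "o1"}  # o1 is one of the other documents
print(precision_recall(results, targets))  # (0.75, 0.75)
```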
Results from Studies
Source: INEX 2003/2004 ("bag-of-words" vs. "full structure")
Tibetan Buddhist Resource Center (TBRC)
[Rating scale: 1 (low) to 5 (high)]
Woodruff Library, Emory University
[Rating scale: 1 (low) to 5 (high)]
Challenges
Getting Data into XML
What to Return in a Hit
Allow searchers to specify the level of the element returned, using search options.
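A minimal sketch of letting the searcher choose the return level, assuming Python's xml.etree.ElementTree and hypothetical markup: each matching element is walked up to its nearest ancestor-or-self with the requested tag, and duplicates are dropped. The function name and the `level_tag` parameter are illustrative, not an eXist API.

```python
import xml.etree.ElementTree as ET

def hits_at_level(root, keyword, level_tag):
    """For every element whose own text contains `keyword` as a word, climb to the
    nearest ancestor-or-self tagged `level_tag`; return those elements without duplicates."""
    keyword = keyword.lower()
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent = {child: elem for elem in root.iter() for child in elem}
    results = []
    for elem in root.iter():
        if keyword in (elem.text or "").lower().split():
            node = elem
            while node is not None and node.tag != level_tag:
                node = parent.get(node)
            if node is not None and node not in results:
                results.append(node)
    return results

play = ET.fromstring(
    "<PLAY><TITLE>A play about love</TITLE>"
    "<SPEECH><SPEAKER>ROMEO</SPEAKER><LINE>love is heavy</LINE></SPEECH>"
    "<SPEECH><SPEAKER>JULIET</SPEAKER><LINE>hate is near</LINE></SPEECH></PLAY>"
)
speakers = [s.find("SPEAKER").text for s in hits_at_level(play, "love", "SPEECH")]
print(speakers)  # ['ROMEO']
print([e.tag for e in hits_at_level(play, "love", "PLAY")])  # ['PLAY']
```

Asking for SPEECH returns only the speech that holds the match; the TITLE hit has no SPEECH ancestor and is skipped. Asking for PLAY collapses both hits into the one whole document.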
Steps in Testing Structured Search
Steps in a Structured Search Project
Sample Queries
Find all "SPEECH" elements that contain the keyword 'love' (the predicate plays the role of a SQL "WHERE" clause).
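A plain-Python sketch of this query using xml.etree.ElementTree rather than eXist itself. In eXist the same idea is typically written as a predicate on //SPEECH using the Lucene module's ft:query() function, along the lines of //SPEECH[ft:query(., 'love')]; here a simple whole-word test over each speech's text stands in for the full-text index. The play markup is hypothetical.

```python
import xml.etree.ElementTree as ET

def speeches_containing(root, keyword):
    """Return the SPEECH elements whose full text (all descendants) contains `keyword` as a word."""
    keyword = keyword.lower()
    matches = []
    for speech in root.iter("SPEECH"):
        # Join with spaces so words in adjacent child elements do not run together.
        words = " ".join(speech.itertext()).lower().split()
        if keyword in words:  # this test plays the role of the predicate / WHERE clause
            matches.append(speech)
    return matches

play = ET.fromstring(
    "<PLAY><ACT><SCENE>"
    "<SPEECH><SPEAKER>ROMEO</SPEAKER><LINE>My love is deep</LINE></SPEECH>"
    "<SPEECH><SPEAKER>JULIET</SPEAKER><LINE>Love is near</LINE></SPEECH>"
    "<SPEECH><SPEAKER>NURSE</SPEAKER><LINE>The hour is late</LINE></SPEECH>"
    "</SCENE></ACT></PLAY>"
)
print([s.find("SPEAKER").text for s in speeches_containing(play, "love")])  # ['ROMEO', 'JULIET']
```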
Near Operator
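A near (proximity) operator matches two words that occur within a given number of word positions of each other. A minimal sketch, assuming order does not matter and distance is counted in whitespace-separated words; the function is hypothetical. Lucene's query syntax expresses a similar idea with phrase slop (for example "love smoke"~3), though its position arithmetic is not identical to this simple rule.

```python
def near(text, word_a, word_b, max_distance):
    """Return True if word_a and word_b occur within max_distance word positions of each other.

    Order does not matter. If word_a == word_b, a single occurrence matches itself.
    """
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)

line = "love is a smoke made with the fume of sighs"
print(near(line, "love", "smoke", 3))  # True  (positions 0 and 3)
print(near(line, "love", "sighs", 3))  # False (positions 0 and 9)
```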
Skillsets Needed for Pilot Project
Predictions
Steps to Run Examples
Sample Configuration File
Example of a boost on the title
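A field boost multiplies the weight of matches in that field, so a document whose title mentions the keyword ranks above one that mentions it only in the body. In eXist, boosts are typically declared in the Lucene section of a collection's configuration file (collection.xconf). The Python sketch below models only the effect on ranking, with a hypothetical boost of 2.0 on a title field and hypothetical documents.

```python
def boosted_score(doc, keyword, boosts):
    """Sum keyword occurrences per field, each multiplied by that field's boost (default 1.0)."""
    keyword = keyword.lower()
    total = 0.0
    for field, text in doc.items():
        count = text.lower().split().count(keyword)
        total += count * boosts.get(field, 1.0)
    return total

boosts = {"title": 2.0}  # hypothetical: a title hit counts twice as much as a body hit
docs = {
    "a": {"title": "xml search", "body": "an introduction"},
    "b": {"title": "an introduction", "body": "xml search"},
}
ranked = sorted(docs, key=lambda d: boosted_score(docs[d], "xml", boosts), reverse=True)
print(ranked)  # ['a', 'b']
```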
XQuery Fulltext
XQuery/Lucene Search Wikibook
Acknowledgements
References
Send e-mail to dan@danmccreary.com for an extended list of "getting started" resources.
Questions?

Slack Application Development 101 Slidespraypatel2
 

Último (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Structured Document Search and Retrieval

  • 1. Structured Search. Dan McCreary, President, Dan McCreary & Associates, [email_address], (952) 931-9198. Version 4.
  • 8. 117 Citations in Computer Science: with 117 citations, the "Intro to IR" book is the second most cited Computer Science reference published in 2008.
  • 9. Table 10.1 (XML and structured information retrieval): RDB (relational database) search, unstructured information retrieval, and structured information retrieval.
                             RDB search         unstructured retrieval    structured retrieval
        objects              records            unstructured documents    trees with text at leaves
        model                relational model   vector space & others     ?
        main data structure  table              inverted index            ?
        queries              SQL                free text queries         ?
  • 11. eXist Native XML Developers Meeting, Prague, March 12th, 2010
  • 13. Table 10.1, Revised (XML and structured information retrieval): RDB (relational database) search, unstructured information retrieval, and structured information retrieval.
                             RDB search         unstructured retrieval    structured retrieval
        objects              records            unstructured documents    trees with text at leaves
        model                relational model   vector space & others     XML hierarchy
        main data structure  table              inverted index            trees with node-ids for document ids
        queries              SQL                free text queries         XQuery fulltext
  • 16. Reverse Index: for each word, a reverse index (also called an inverted index) tells you which documents contain that word.
        Word    Document IDs
        hate    12344, 34235, 43513
        love    12344, 34235, 43513, 22345, 12313, 42345, 12313, 13124
  • 17. Reverse Index in eXist 1.5: terms that start with "love"
  • 24. Books Have Structure: book title, book metadata
  • 28. Many Objects Have Structure: spreadsheets. Find all forms with a label of "Zipcode". Find all spreadsheets with a first-row cell that contains the word "SSN".
  • 51. Questions? Dan McCreary, President, Dan McCreary & Associates, [email_address], (952) 931-9198
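The reverse (inverted) index described on slide 16 maps each word to the IDs of the documents that contain it. Below is a minimal Python sketch, assuming a toy corpus held in a dict keyed by illustrative integer document IDs and a naive lowercase tokenizer. It is not how eXist builds its indexes; it only shows the data structure and a boolean AND query over it.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Naive tokenizer (an assumption): lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each word to the sorted list of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    # Sorted posting lists keep results deterministic and make merges cheap.
    return {word: sorted(ids) for word, ids in index.items()}


def search_all(index: dict[str, list[int]], *words: str) -> list[int]:
    """Return IDs of documents containing every query word (boolean AND)."""
    postings = [set(index.get(w.lower(), [])) for w in words]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    docs = {
        12344: "I love XML but hate schemas",
        34235: "love and hate",
        22345: "love letters",
    }
    index = build_inverted_index(docs)
    print(index["hate"])                      # [12344, 34235]
    print(search_all(index, "love", "hate"))  # [12344, 34235]
```

The lookup cost depends on the length of the posting lists, not on the size of the documents, which is why full-text engines keep this structure instead of scanning text.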
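Slide 17 shows a lookup of index terms that start with "love". One common way to answer such prefix queries is to keep the term dictionary sorted, so all matching terms form one contiguous run. The following is a hedged sketch using Python's standard `bisect` module over an illustrative vocabulary; it is not eXist's implementation or API.

```python
import bisect


def terms_with_prefix(sorted_terms: list[str], prefix: str) -> list[str]:
    """Return every term in a sorted vocabulary that starts with prefix.

    In a sorted list, all terms sharing a prefix are adjacent, and the run
    begins at the first position where prefix itself could be inserted.
    """
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for i in range(start, len(sorted_terms)):
        term = sorted_terms[i]
        if not term.startswith(prefix):
            break  # we have walked past the contiguous run of matches
        matches.append(term)
    return matches


if __name__ == "__main__":
    vocabulary = sorted(["hate", "love", "loved", "lovely", "lover", "lowly", "xml"])
    print(terms_with_prefix(vocabulary, "love"))  # ['love', 'loved', 'lovely', 'lover']
```

The binary search finds the start of the run in logarithmic time, and the scan touches only the matching terms plus one.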
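The revised Table 10.1 on slide 13 lists "trees with node-ids for document ids" as the main data structure for structured retrieval, and slide 24 notes that books have structure (title, metadata). A rough Python sketch of that idea: each posting records where in the tree a word occurs, so a query can be restricted to one element such as a title. As a simplification, element paths stand in for real node IDs, and the `<book>`, `<title>`, `<metadata>`, and `<subject>` markup is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

Posting = tuple[str, str]  # (document id, element path); paths stand in for node IDs


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _index_element(elem, path, doc_id, index):
    # Index the element's own leading text under its path, then recurse.
    # Tail text (mixed content after a child element) is ignored for brevity.
    for word in tokenize(elem.text or ""):
        index[word].append((doc_id, path))
    for child in elem:
        _index_element(child, f"{path}/{child.tag}", doc_id, index)


def build_structured_index(docs: dict[str, str]) -> dict[str, list[Posting]]:
    """Map each word to the (doc_id, element_path) pairs where it occurs."""
    index: dict[str, list[Posting]] = defaultdict(list)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        _index_element(root, root.tag, doc_id, index)
    return dict(index)


def search_in(index: dict[str, list[Posting]], word: str, element: str) -> list[str]:
    """Doc IDs where word occurs directly inside an element with this tag name."""
    return sorted({doc for doc, path in index.get(word.lower(), [])
                   if path.split("/")[-1] == element})


if __name__ == "__main__":
    books = {
        "b1": "<book><title>Love in XML</title>"
              "<metadata><subject>databases</subject></metadata></book>",
        "b2": "<book><title>Query Languages</title>"
              "<metadata><subject>love of trees</subject></metadata></book>",
    }
    index = build_structured_index(books)
    print(index["love"])                      # [('b1', 'book/title'), ('b2', 'book/metadata/subject')]
    print(search_in(index, "love", "title"))    # ['b1']
    print(search_in(index, "love", "subject"))  # ['b2']
```

A flat inverted index would return both books for "love"; keeping the position in the tree is what lets the query distinguish a title match from a metadata match.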
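Slide 28's example queries ("forms with a label of Zipcode", "spreadsheets with a first-row cell containing SSN") depend on structure that a plain keyword search ignores. This is a hedged Python sketch using `xml.etree.ElementTree`, assuming a deliberately simple, hypothetical XML layout (`<sheet><row><cell>` and `<form><field><label>`). It is not SpreadsheetML, Office Open XML, or any real forms format.

```python
import re
import xml.etree.ElementTree as ET


def _words(text: str) -> list[str]:
    """Lowercase word tokens, so "SSN:" still matches the word "ssn"."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sheets_with_first_row_word(sheets: dict[str, str], word: str) -> list[str]:
    """IDs of sheets whose first <row> has a <cell> containing word."""
    hits = []
    for sheet_id, xml_text in sheets.items():
        first_row = ET.fromstring(xml_text).find("row")  # first <row> child only
        if first_row is None:
            continue
        if any(word.lower() in _words(cell.text or "")
               for cell in first_row.findall("cell")):
            hits.append(sheet_id)
    return sorted(hits)


def forms_with_label(forms: dict[str, str], label: str) -> list[str]:
    """IDs of forms containing a <label> element whose text equals label."""
    hits = []
    for form_id, xml_text in forms.items():
        root = ET.fromstring(xml_text)
        if any((el.text or "").strip() == label for el in root.iter("label")):
            hits.append(form_id)
    return sorted(hits)


if __name__ == "__main__":
    sheets = {
        "s1": "<sheet><row><cell>Name</cell><cell>SSN</cell></row>"
              "<row><cell>Ann</cell><cell>123</cell></row></sheet>",
        "s2": "<sheet><row><cell>Name</cell><cell>City</cell></row>"
              "<row><cell>SSN missing</cell></row></sheet>",
    }
    forms = {
        "f1": "<form><field><label>Zipcode</label><value>55401</value></field></form>",
        "f2": "<form><field><label>City</label></field></form>",
    }
    print(sheets_with_first_row_word(sheets, "SSN"))  # ['s1']
    print(forms_with_label(forms, "Zipcode"))         # ['f1']
```

Sheet "s2" also contains "SSN", but only in its second row, so a structure-aware query correctly excludes it where a plain keyword search would not.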