Challenges for industrial-strength
Information Retrieval on Databases
R. Cornacchia, M. Hildebrand, A.P. de Vries, F. Dorssers
KARS2017 - 21 March 2017, Venice, IT
About Spinque
○ Since 2010
○ Spin-off of CWI, Amsterdam
○ “Search by Strategy”
Outline
1. Search is everywhere
2. Tailored search is expected
3. Tailored search needs modelling
4. Search modelling by information specialists
5. Search modelling needs flexible IR & DB
6. IR on DB: it works
Search is everywhere
Real world scenarios
Technical
Desktop
Coding content assistant
Product recommendation
Personalised newsfeed
Let’s pick a simple one: autocompletion
iphone 7
iphone 5c
iphone 6s
ipho| “autocompletion is trivial”
.. not so fast!
Tailored search is expected
autocompletion
iphone 7
iphone 5c
iphone 6s
ipho|
Basic - products
○ Any matching term from the index
○ Suggest products
Tailored search is expected
autocompletion
iphone 7
iphone 5c
iphone 6 cases
ipho|
Basic - products & categories
○ Any matching term from the index
○ Suggest products & categories
Tailored search is expected
autocompletion
iphone 7
iphone 6 cases
iphone 6s
ipho|
Filtered
○ Any matching term from the index
○ “iPhone 5c” out of stock
Tailored search is expected
autocompletion
iphone 8
iphone 7
iphone 6 cases
ipho|
Filtered & ranked
○ “iPhone 5c” out of stock
○ “iPhone 8” the most requested
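The filtered and ranked variant can be sketched in a few lines. This is a minimal illustration, not Spinque's implementation: the `Suggestion` record, the `CATALOG` contents and the `requests` popularity counts are hypothetical, chosen only so that the result matches the suggestions shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    text: str
    in_stock: bool
    requests: int  # hypothetical popularity count, not from the slides


# Hypothetical catalogue; "iphone 5c" is out of stock, "iphone 8" is most requested.
CATALOG = [
    Suggestion("iphone 5c", in_stock=False, requests=40),
    Suggestion("iphone 6 cases", in_stock=True, requests=55),
    Suggestion("iphone 7", in_stock=True, requests=80),
    Suggestion("iphone 8", in_stock=True, requests=120),
    Suggestion("ipad mini", in_stock=True, requests=70),
]


def suggest(prefix, catalog=CATALOG, limit=3):
    """Return up to `limit` in-stock suggestions matching `prefix`, most requested first."""
    p = prefix.lower()
    # Filter: keep prefix matches that are in stock.
    candidates = [s for s in catalog if s.text.startswith(p) and s.in_stock]
    # Rank: order by the (assumed) request count, highest first.
    candidates.sort(key=lambda s: s.requests, reverse=True)
    return [s.text for s in candidates[:limit]]


print(suggest("ipho"))
```

Even this toy version needs both a text match and structured attributes (stock, popularity), which is the point the following slides build on.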
Tailored search is expected
autocompletion
iphone cases
iphone adapters
iphone 7
ipho|
Exploratory
○ First suggest categories..
○ .. then products
Tailored search is expected
autocompletion
iphone 7 cases
iphone 7 adapters
iphone 8
ipho|
Personalised
○ I already own an “iPhone 7”
○ Suggest compatible accessories
○ Suggest upgrade
Tailored search is expected
What if my search API isn’t enough?
Tailored search needs modelling
iphone 7 cases
iphone 7 adapters
iphone 8
ipho|
<your favourite autocompletion>
○ Out-of-the-box API may fall short
○ Build custom search API
○ Who? How?
http://localhost:8983/solr/suggest?q=ipho
How do we build custom search APIs?
Search modelling by information specialists
data modelling search modelling
Spinque: Empower the information specialist
Empowering the information specialist
data modelling search modelling
Search modelling by information specialists
Data modelling
Search modelling needs flexible IR & DB
business transactions social media
Search modelling
standard autocompletion custom autocompletion
Search modelling by information specialists
http://spinque/suggest?q=ipho http://spinque/suggest_ranked?q=ipho
The IR & DB challenge
Search modelling needs flexible IR & DB
○ IR & DB both needed even for trivial tasks
○ Different technologies / focus
○ How / where to integrate task results?
○ Do they stay black boxes?
○ Can we express them in the same platform,
and when does this make sense?
http://spinque/suggest_ranked?q=ipho
Text retrieval by strategy
Search modelling needs flexible IR & DB
text retrieval.. ..is just another DB query
○ strategy-driven “collection” and “documents”
○ on-demand indexing
○ it takes just standard SQL
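A rough sketch of what "text retrieval is just another DB query" can look like, using SQLite from Python. Everything here is an assumption made for illustration: the `term_doc` table, the whitespace tokeniser and the plain term-frequency score are stand-ins, not the indexing scheme or ranking model used by Spinque.

```python
import sqlite3


def build_index(conn, docs):
    """Index on demand: derive a term/document table from whatever the strategy calls 'documents'."""
    conn.execute("CREATE TABLE term_doc (term TEXT, doc_id INTEGER, tf INTEGER)")
    for doc_id, text in docs.items():
        counts = {}
        for token in text.lower().split():
            counts[token] = counts.get(token, 0) + 1
        conn.executemany(
            "INSERT INTO term_doc VALUES (?, ?, ?)",
            [(term, doc_id, tf) for term, tf in counts.items()],
        )


def search(conn, query):
    """Rank documents by summed term frequency of the query terms, in standard SQL."""
    terms = query.lower().split()
    if not terms:
        return []
    placeholders = ",".join("?" for _ in terms)
    sql = (
        "SELECT doc_id, SUM(tf) AS score FROM term_doc "
        f"WHERE term IN ({placeholders}) "
        "GROUP BY doc_id ORDER BY score DESC, doc_id ASC"
    )
    return conn.execute(sql, terms).fetchall()


# Hypothetical three-document collection.
conn = sqlite3.connect(":memory:")
build_index(conn, {1: "red pen", 2: "blue pen pen case", 3: "notebook"})
print(search(conn, "pen case"))
```

Because the index is an ordinary table, the ranked result can be joined with any other relation in the same query.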
Graph DB by strategy
Search modelling needs flexible IR & DB
Visual modelling Relational Algebra Graph
subject  property      object
123      name          pen
123      availability  in stock
123      price         9.99
Graph DB by strategy
Search modelling needs flexible IR & DB
we want DB & ranking
together & seamlessly
what if this.. ..could work on this?
subject  property      object    p
123      name          pen       1.0
123      availability  in stock  0.8
123      price         9.99      1.0
Rank. Everything. Always.
Search modelling needs flexible IR & DB
rank products.. ..get ranked orders and customers
Fuhr, Rölleke, 1997, A probabilistic relational algebra for the integration of IR and DB
SQL:
SELECT g.obj, (o.p * g.p) AS p
FROM graph g,
ranked_orders o
WHERE g.subj = o.id
AND g.rel = 'orderedBy';
PRA:
PROJECT [$3]
JOIN INDEPENDENT [$1=$1]
SELECT [$2='orderedBy'] (g)
ranked_orders
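The SQL form of the independent join can be tried directly on SQLite. The sketch below keeps the table and column names from the slide (`graph`, `ranked_orders`, `subj`, `rel`, `obj`, `p`); the rows, the probabilities and the added `ORDER BY` are hypothetical and only serve to make the result deterministic.

```python
import sqlite3


def make_db():
    """Create a tiny probabilistic graph and a ranked list of orders (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE graph (subj TEXT, rel TEXT, obj TEXT, p REAL)")
    conn.execute("CREATE TABLE ranked_orders (id TEXT, p REAL)")
    conn.executemany(
        "INSERT INTO graph VALUES (?, ?, ?, ?)",
        [
            ("order1", "orderedBy", "alice", 1.0),
            ("order2", "orderedBy", "bob", 0.8),
            ("order1", "contains", "pen", 1.0),
        ],
    )
    conn.executemany(
        "INSERT INTO ranked_orders VALUES (?, ?)",
        [("order1", 0.9), ("order2", 0.5)],
    )
    return conn


# Query from the slide, plus an ORDER BY on the second output column.
RANKED_CUSTOMERS = """
SELECT g.obj, (o.p * g.p) AS p
FROM graph g, ranked_orders o
WHERE g.subj = o.id
  AND g.rel = 'orderedBy'
ORDER BY 2 DESC
"""


def ranked_customers(conn):
    """Propagate order scores to customers by multiplying probabilities across the join."""
    return conn.execute(RANKED_CUSTOMERS).fetchall()


for customer, p in ranked_customers(make_db()):
    print(customer, round(p, 3))
```

In this toy data each customer has a single order, so no duplicate customers need to be merged after the join; a real strategy would also have to decide how to combine the scores of repeated tuples.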
What about efficiency?
IR on DB: it works
1.1M docs, 2.3GB
4-core i7-3770s, 16GB RAM, 256GB SSD
find documents: 20ms
8M lots, 25K auctions (10GB raw data)
VM (8 CPUs) on Xeon E5-2620, 16GB RAM, 256GB SSD
find lots: 150ms
What about efficiency?
IR on DB: it works
pre-compute what can be pre-computed.. ..but do it query-driven
○ Index on demand
○ Cache result of relational expressions
○ Algebraic analysis to determine cache
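As a much-simplified illustration of query-driven caching, the sketch below materialises the result of a relational expression the first time it is requested and reuses it afterwards. The `ExpressionCache` class and the `products` table are hypothetical, and matching on whitespace-normalised SQL text is only a stand-in for the algebraic analysis mentioned on the slide.

```python
import sqlite3


class ExpressionCache:
    """Materialise SELECT expressions on first use and reuse them on later requests."""

    def __init__(self, conn):
        self.conn = conn
        self._tables = {}  # normalised SQL text -> name of the materialised table
        self.hits = 0

    def materialize(self, sql):
        # Naive equivalence check: identical text after collapsing whitespace.
        key = " ".join(sql.split())
        if key in self._tables:
            self.hits += 1
            return self._tables[key]
        name = f"cache_{len(self._tables)}"
        # Nothing is pre-computed until a query actually asks for it.
        self.conn.execute(f"CREATE TEMP TABLE {name} AS {sql}")
        self._tables[key] = name
        return name


# Hypothetical product table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, in_stock INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "iphone 7", 1), (2, "iphone 5c", 0), (3, "iphone 8", 1)],
)

cache = ExpressionCache(conn)
first = cache.materialize("SELECT id, name FROM products WHERE in_stock = 1")
second = cache.materialize("SELECT id, name   FROM products WHERE in_stock = 1")
names = conn.execute(f"SELECT name FROM {first} ORDER BY id").fetchall()
print(first, second, cache.hits, names)
```

The sketch never invalidates its cached tables, so it would return stale results once the underlying data changes.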
What about efficiency?
IR on DB: it works
choose it carefully.. ..then enjoy
○ Main benefits of IR on DB
○ IR as a DB optimisation problem
○ No custom extensions, no vendor lock-in
○ Column-store, CPU-friendly DB engine
Hey, we made our join 20% faster.
You are welcome.
IR on DB: when does it make sense?
IR on DB: it works
○ If you just need text retrieval on documents
○ a Lucene-like engine will serve you well
○ Information needs tend to be more complex
○ Solving them at application level is common and painful
○ A one-platform approach pays off
Conclusions
1. Search is everywhere
○ In the real world..
2. Tailored search is expected
○ ..there is no search like another.
3. Tailored search needs modelling
○ Someone will put effort into it..
4. Search modelling by information specialists
○ ..who better than the right person for the job?
5. Search modelling needs flexible IR & DB
○ Who takes care of the low-level details then?
6. IR on DB: it works
○ The right tools. The right architecture.
Challenges ahead
○ Live updates
○ ACID transactions overhead
○ Scale out
○ It’s more than “just an inverted file” to be distributed
○ Even better support for information specialists
○ Strategy auto-tuning
supporting information specialists
Don’t program search engines,
design them

原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
 
20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 

Challenges for Industrial-strength Information Retrieval on Databases

  • 1. Challenges for industrial-strength Information Retrieval on Databases R. Cornacchia, M. Hildebrand, A.P. de Vries, F. Dorssers KARS2017 - 21 March 2017, Venice, IT
  • 2. ○ Since 2010 ○ Spin-off of CWI, Amsterdam ○ “Search by Strategy” About Spinque
  • 3. Outline 1. Search is everywhere 2. Tailored search is expected 3. Tailored search needs modelling 4. Search modelling by information specialists 5. Search modelling needs flexible IR & DB 6. IR on DB: it works
  • 4. Search is everywhere Real world scenarios Technical Desktop Coding content assistant Product recommendation Personalised newsfeed
  • 5. Let’s pick a simple one: autocompletion iphone 7 iphone 5c iphone 6s ipho| “autocompletion is trivial” .. not so fast! Tailored search is expected
  • 6. autocompletion iphone 7 iphone 5c iphone 6s ipho| Basic - products ○ Any matching term from the index ○ Suggest products Tailored search is expected
  • 7. autocompletion iphone 7 iphone 5c iphone 6 cases ipho| Basic - products & categories ○ Any matching term from the index ○ Suggest products & categories Tailored search is expected
  • 8. autocompletion iphone 7 iphone 6 cases iphone 6s ipho| Filtered ○ Any matching term from the index ○ “iPhone 5c” out of stock Tailored search is expected
  • 9. autocompletion iphone 8 iphone 7 iphone 6 cases ipho| Filtered & ranked ○ “iPhone 5c” out of stock ○ “iPhone 8” the most requested Tailored search is expected
  • 10. autocompletion iphone cases iphone adapters iphone 7 ipho| Exploratory ○ First suggest categories.. ○ .. then products Tailored search is expected
  • 11. autocompletion iphone 7 cases iphone 7 adapters iphone 8 ipho| Personalised ○ I already own an “iPhone 7” ○ Suggest compatible accessories ○ Suggest upgrade Tailored search is expected
  • 12. What if my search API isn’t enough? Tailored search needs modelling iphone 7 cases iphone 7 adapters iphone 8 ipho| <your favourite autocompletion> ○ Out-of-the-box API may fall short ○ Build custom search API ○ Who? How? http://localhost:8983/solr/suggest?q=ipho
  • 13. How do we build custom search APIs? Search modelling by information specialists data modelling search modelling Spinque: Empower the information specialist
  • 14. Empowering the information specialist data modelling search modelling Search modelling by information specialists
  • 15. Data modelling Search modelling needs flexible IR & DB business transactions social media
  • 16. Search modelling standard autocompletion custom autocompletion Search modelling by information specialists http://spinque/suggest?q=ipho http://spinque/suggest_ranked?q=ipho
  • 17. The IR & DB challenge Search modelling needs flexible IR & DB ○ IR & DB both needed even for trivial tasks ○ Different technologies / focus ○ How / where to integrate task results? ○ Do they stay black boxes? ○ Can we express them in the same platform, and when does this make sense? http://spinque/suggest_ranked?q=ipho
  • 18. Text retrieval by strategy Search modelling needs flexible IR & DB text retrieval.. ..is just another DB query ○ strategy-driven “collection” and “documents” ○ on-demand indexing ○ it takes just standard SQL
  • 19. Graph DB by strategy Search modelling needs flexible IR & DB Visual modelling Relational Algebra Graph
      subject | property     | object
      123     | name         | pen
      123     | availability | in stock
      123     | price        | 9.99
  • 20. Graph DB by strategy Search modelling needs flexible IR & DB we want DB & ranking together & seamlessly what if this.. ..could work on this?
      subject | property     | object   | p
      123     | name         | pen      | 1.0
      123     | availability | in stock | 0.8
      123     | price        | 9.99     | 1.0
  • 21. Rank. Everything. Always. Search modelling needs flexible IR & DB rank products.. ..get ranked orders and customers Fuhr, Rölleke, 1997, A probabilistic relational algebra for the integration of IR and DB
      SQL: SELECT g.obj, (o.p * g.p) AS p FROM graph g, ranked_orders o WHERE g.subj = o.id AND g.rel = 'orderedBy';
      PRA: PROJECT [$3] (JOIN INDEPENDENT [$1=$1] (SELECT [$2='orderedBy'] (g), ranked_orders))
  • 22. What about efficiency? IR on DB: it works ○ 1.1M docs, 2.3GB; 4-core i7-3770s, 16GB RAM, 256GB SSD; find documents: 20ms ○ 8M lots, 25K auctions (10GB raw data); VM (8 CPUs) on Xeon E5-2620, 16GB RAM, 256GB SSD; find lots: 150ms
  • 23. What about efficiency? IR on DB: it works pre-compute what can be pre-computed.. ..but do it query-driven ○ Index on demand ○ Cache result of relational expressions ○ Algebraic analysis to determine cache
  • 24. What about efficiency? IR on DB: it works choose it carefully.. ..then enjoy ○ Main benefits of IR on DB ○ IR as a DB optimisation problem ○ No custom extensions, no vendor lock-in ○ Column-store, CPU-friendly DB engine Hey, we made our join 20% faster. You are welcome.
  • 25. IR on DB: when does it make sense? IR on DB: it works ○ If you just need text retrieval on documents ○ Lucene-like will serve you well ○ Information needs tend to be more complex ○ Solve at application-level: common and painful ○ A one-platform approach pays off
  • 26. Conclusions 1. Search is everywhere ○ In the real world.. 2. Tailored search is expected ○ ..there is no search like another. 3. Tailored search needs modelling ○ Someone will put effort in it.. 4. Search modelling by information specialists ○ ..who better than the right person for the job? 5. Search modelling needs flexible IR & DB ○ Who takes care of the low-level details then? 6. IR on DB: it works ○ The right tools. The right architecture.
  • 27. ○ Live updates ○ ACID transactions overhead ○ Scale out ○ It’s more than “just an inverted file” to be distributed ○ Even better support for information specialists ○ Strategy auto-tuning Challenges ahead
  • 28. supporting information specialists Don’t program search engines, design them
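
The probabilistic join on slide 21 can be tried out with a minimal sketch in Python's built-in `sqlite3` module. The SQL is the query from the slide. The table contents (`order1`, `alice`, the probabilities) are invented for illustration and are not from the talk, and SQLite is only a stand-in for whatever engine the authors use. The idea shown is that each tuple carries a probability column `p`, and an independent join multiplies the probabilities of the tuples it combines, so a ranking over orders propagates to a ranking over customers.

```python
import sqlite3

# In-memory database; SQLite is an assumption, any SQL engine would do.
conn = sqlite3.connect(":memory:")

# Hypothetical data: a triple table with a probability column `p`
# (as on slide 20) and a ranked list of orders.
conn.executescript("""
CREATE TABLE graph (subj TEXT, rel TEXT, obj TEXT, p REAL);
CREATE TABLE ranked_orders (id TEXT, p REAL);

INSERT INTO graph VALUES
  ('order1', 'orderedBy', 'alice', 1.0),
  ('order2', 'orderedBy', 'bob',   0.8),
  ('order2', 'contains',  'pen',   1.0);

INSERT INTO ranked_orders VALUES
  ('order1', 0.5),
  ('order2', 0.9);
""")

# The query from slide 21, plus an ORDER BY on the second output column
# (the combined probability) so that the result comes back ranked.
rows = conn.execute("""
SELECT g.obj, (o.p * g.p) AS p
FROM graph g, ranked_orders o
WHERE g.subj = o.id AND g.rel = 'orderedBy'
ORDER BY 2 DESC
""").fetchall()

for customer, score in rows:
    print(customer, round(score, 2))
```

Multiplying `o.p * g.p` treats the two tuples as independent events, which is what `JOIN INDEPENDENT` expresses in the PRA version. Because the probability is an ordinary column, the ranked result can feed the next relational step without leaving the database.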