How to build the next 1000
    search engines?!

         Arjen P. de Vries
          arjen@acm.org
      Centrum Wiskunde & Informatica
       Delft University of Technology
                Spinque B.V.
Search is everywhere
 Yet it only works well on the web…
Complications
 Heterogeneous data sources
    WWW, Wikipedia, news, e-mail, patents, Twitter, personal information, …
 Varying result types
    “Documents”, tweets, courses, people, experts,
    gene expressions, temperatures, …
 Multiple dimensions of relevance
   Topicality, recency, reading level, …
Complications
 Many search tasks require a mix within
  these dimensions:
   News and patents
   Companies and their CEOs
   Recent and on topic
 Many search tasks also require a mix
  across these dimensions:
   Patents assigned to our top 3 competitors in
    market segments mentioned in the recent
    press releases issued by our top 10 clients
 System's internal information representation
   Linguistic annotations
      Named entities, sentiment, dependencies, …
   Knowledge resources
       Wikipedia, Freebase, ICD9, IPTC, …
   Links to related documents
      Citations, urls
 Anchors that describe the URI
   Anchor text
 Queries that lead to clicks on the URI
   Session, user, dwell-time, …
 Tweets that mention the URI
   Time, location, user, …
 Other social media that describe the URI
   User, rating
    Tag, organisation of 'folksonomy'
     + UNCERTAINTY ALL OVER!
What goes in the black box?
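   [Diagram: a user, with context, queries a black box over a document
    collection (anchors, entity types, sentiment, tweets, cited documents, …);
    the box returns a ranked list of answers.]
   Candidate contents of the box: BM25, BM25F, LM, RM, VSM, DFR, QIR?, learning to rank?
   ECIR / CIKM / SIGIR / ICTIR / WSDM papers!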
Rarely & scarcely addressed…

    Student: How do I build it?
    Professor: Who will build it for me?

    Last session of the conference…
Search System
Parameterised Search System




        Cornacchia, De Vries, ECIR 2007
        A Parameterised Search System
Parameterised Search System

    Can we not ‘remove’ this IR engineer (or scientist!) from the loop,
    like DBMS software removes the data engineer from the loop?




And three (four?) children, a startup and 5 years later, a PhD defense!
Search by Strategy
 Visually construct search strategies by
  connecting building blocks
 Each block describes either data or actions
  upon that data
   Connection points (“pins”) are typed:
    doc / sec / term / ne (named entity) / tuple
    Actions are expressed as scripts (more on this later)
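The typed connection points can be pictured with a small sketch. Only the five pin types come from the slide. The `Block` and `Strategy` classes, the block names and the `connect` method are hypothetical illustrations, not Spinque's actual API.

```python
from dataclasses import dataclass, field

# Pin types as listed on the slide; everything else below is an assumption.
PIN_TYPES = {"doc", "sec", "term", "ne", "tuple"}


@dataclass
class Block:
    """A building block: named, with typed input pins and one typed output pin."""
    name: str
    inputs: dict   # pin name -> pin type
    output: str    # type of the output pin

    def __post_init__(self):
        for pin_type in [*self.inputs.values(), self.output]:
            if pin_type not in PIN_TYPES:
                raise ValueError(f"unknown pin type: {pin_type}")


@dataclass
class Strategy:
    """A strategy is a set of type-checked connections between blocks."""
    edges: list = field(default_factory=list)

    def connect(self, source: Block, target: Block, pin: str) -> None:
        expected = target.inputs[pin]
        if source.output != expected:
            raise TypeError(
                f"{source.name} emits '{source.output}' "
                f"but {target.name}.{pin} expects '{expected}'"
            )
        self.edges.append((source.name, target.name, pin))


# Hypothetical blocks for a "from patent to inventor" style strategy.
patents = Block("patents", {}, "doc")
rank = Block("rank_by_text", {"docs": "doc", "query": "term"}, "doc")
to_inventor = Block("doc_to_inventor", {"docs": "doc"}, "ne")

strategy = Strategy()
strategy.connect(patents, rank, "docs")
strategy.connect(rank, to_inventor, "docs")
```

Connecting `to_inventor` (which emits `ne`) to a `doc` pin raises a `TypeError`, which is the kind of mistake a visual editor with typed pins can rule out while the strategy is being drawn.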
Strategy Builder
From Patent to Inventor
Reports
[Figure: report chart, axis label “Visits”]
Generate Search Engine!




Or, really, generate a REST API from the strategy specification!
Demo
(Showed demo of children's search engine)
How Strategies Help
 Strategies improve communication between
  search intermediary and user
   Encapsulate domain expert knowledge
   Abstract representation of search expert knowledge
   Analyze information seeking process at any stage
 Strategies facilitate knowledge management
   Store / share / publish / refine
 Strategies mix exact (DB) and ranked (IR)
  searches
   Avoid the need for “human (probabilistic) joins”
Search Intermediaries
 Travel agency
 Real estate agents
 Recruiters
 Librarians
 Archivists
 Digital forensics detectives
 Patent information specialists
[Figure: arrow alongside the list, labelled “Task complexity”]
Exploratory Search
 Search & (Faceted) Browsing
   Help discover schema, ontology, etc.
   Help discover the relevant sources
     Within-collection (by year/location, by type, …)
     Across multiple collections (by source)
Probabilistic faceted browsing
 Both variants show the same facets
    Price: 100K - 200K, 200K - 300K, 300K - 400K
    Rooms: 3, 4, 5
    Size: 100 - 150 m², 150 - 200 m², 200 - 250 m²
 Traditional (boolean filters)
    Good when user knows exactly which filters to apply
    Will see perfect-match results
    Won’t see “interesting” results
 Probabilistic
    Good for exploratory search
    Will see perfect-match results
    Will also see “interesting” results
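The difference between the two columns can be sketched in a few lines. This is an illustration only: the linear fall-off in `soft_match`, the tolerances and the three example houses are assumptions made for the sketch, not the scoring used in the demo.

```python
def soft_match(value, low, high, tolerance):
    """Return 1.0 inside [low, high], falling linearly to 0.0 at `tolerance` outside."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return max(0.0, 1.0 - distance / tolerance)


# Hypothetical listings.
houses = [
    {"id": "A", "price": 250_000, "rooms": 4, "size": 160},
    {"id": "B", "price": 310_000, "rooms": 4, "size": 190},  # slightly over budget
    {"id": "C", "price": 280_000, "rooms": 2, "size": 90},
]

# facet -> (low, high, tolerance); the tolerance is an assumed tuning knob.
wanted = {
    "price": (200_000, 300_000, 50_000),
    "rooms": (4, 4, 1),
    "size": (150, 200, 50),
}


def boolean_filter(items, wanted):
    """Traditional facets: keep only items inside every selected range."""
    return [h["id"] for h in items
            if all(low <= h[f] <= high for f, (low, high, _) in wanted.items())]


def probabilistic_rank(items, wanted):
    """Probabilistic facets: multiply per-facet scores and rank by the product."""
    scored = []
    for h in items:
        p = 1.0
        for f, (low, high, tol) in wanted.items():
            p *= soft_match(h[f], low, high, tol)
        if p > 0:
            scored.append((h["id"], p))
    return sorted(scored, key=lambda pair: -pair[1])


exact = boolean_filter(houses, wanted)
ranked = probabilistic_rank(houses, wanted)
```

The boolean filter returns only house A. The ranked list also starts with A (the perfect match) but keeps B, the "interesting" result that is just over budget, at a lower score.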
Dynamic facets
 Both variants show the same facets
    Price: 100K - 200K, 200K - 300K, 300K - 400K
    Rooms: 3, 4, 5
    Size: 100 - 150 m², 150 - 200 m², 200 - 250 m²
 Pre-indexed
    Pre-defined ad-hoc indices intersected with result set
    Challenge: many indices to maintain
 Dynamic
    Facets decided from result set
    Challenge: dynamically adapt granularity
       Different price ranges for villa/garage!
    Challenge: heavy concurrent queries to DB
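One way to picture "facets decided from result set" is to derive the price ranges from the prices actually present in the current results. The equal-count bucketing below is an assumed technique chosen for illustration, and the villa and garage prices are made up.

```python
def dynamic_price_facets(prices, buckets=3):
    """Split the prices of the current result set into ranges of roughly equal size.

    Returns (lowest, highest, count) per range, so the granularity follows the data.
    """
    ordered = sorted(prices)
    n = len(ordered)
    facets = []
    for i in range(buckets):
        chunk = ordered[i * n // buckets:(i + 1) * n // buckets]
        if chunk:
            facets.append((chunk[0], chunk[-1], len(chunk)))
    return facets


# Hypothetical result sets for two very different queries.
villas = [450_000, 520_000, 610_000, 700_000, 850_000, 990_000]
garages = [8_000, 12_000, 15_000, 21_000, 30_000, 45_000]

villa_facets = dynamic_price_facets(villas)
garage_facets = dynamic_price_facets(garages)
```

The same function yields ranges in the hundreds of thousands for villas and in the tens of thousands for garages, which a single pre-indexed price facet cannot do. The cost is that the ranges are computed per query, which is where the concurrency challenge comes from.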
Demo
(Showed Spinque's real-estate search demo)
Limitations Search & Browse
 Faceted exploration does not include joins
   Cannot construct new data sources from
    existing ones!
   Only the pre-defined paths through the
    information space can actually be traversed
Who needs a Join?
 You!!!
  … whenever 'relevance cues' are typed:
   People (e.g., inventors)
   Companies (e.g., assignees)
   Categories (e.g., IPTC)
   Time (e.g., expiry date)
   Location (e.g., country)
 … or whenever multiple sources are to be
 combined
   E.g., patents & news, patents & Wikipedia, …
Patents on X by Y(y)

Interactive Information Access

 Feedback:
   Interaction improves information
    representation
 Faceted Browsing:
   Interaction can let user take over where
    machine would fail
 Search by Strategy:
   Interaction can let user take over where
    system designer would fail
Conclusion
 “No idealized one-shot search engine”
 Empower the user!
Under the Hood
From Strategies to DB Queries
 Strategy (Spinque: strategy)
    Data flow between building blocks
       BB1(in1, in2, in3, u1, u2) → out
       BB2(in1) → out
 Query: strategy made operational (Spinque: PRA)
       CREATE VIEW a AS
       SELECT ..
       CREATE VIEW b AS
       SELECT ..
       CREATE VIEW c AS
       SELECT ..
 Database (Spinque: RDBMS (MonetDB))
    Relational DB
Probabilistic Relational Algebra
 Strategy
 PRA: probabilistic relational algebra
  (Fuhr and Roelleke, TOIS 2001)

       x = Project DISTINCT [$1,$3](y);

 SQL: explicit probabilities

       CREATE VIEW x AS
       SELECT a1, a3,
              1-prod(1-prob) AS prob
       FROM y
       GROUP BY a1, a3;

 Relational DB
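The projection on this slide can be tried out with a minimal sketch. It uses SQLite in place of MonetDB, purely as an assumption for convenience. SQLite has no built-in `prod` aggregate, so the sketch registers one. The rows in `y` are made up. The formula `1-prod(1-prob)` treats duplicate tuples as independent evidence for the same projected tuple.

```python
import sqlite3


class Prod:
    """User-defined aggregate that multiplies its inputs (stand-in for prod())."""

    def __init__(self):
        self.value = 1.0

    def step(self, x):
        self.value *= x

    def finalize(self):
        return self.value


conn = sqlite3.connect(":memory:")
conn.create_aggregate("prod", 1, Prod)

# Hypothetical probabilistic relation y(a1, a2, a3, prob).
conn.execute("CREATE TABLE y (a1 TEXT, a2 TEXT, a3 TEXT, prob REAL)")
conn.executemany(
    "INSERT INTO y VALUES (?, ?, ?, ?)",
    [
        ("d1", "x", "e1", 0.5),
        ("d1", "z", "e1", 0.5),  # same (a1, a3) as the row above
        ("d2", "x", "e2", 0.8),
    ],
)

# x = Project DISTINCT [$1,$3](y), written as on the slide.
conn.execute(
    """
    CREATE VIEW x AS
    SELECT a1, a3, 1 - prod(1 - prob) AS prob
    FROM y
    GROUP BY a1, a3
    """
)

rows = conn.execute("SELECT a1, a3, prob FROM x ORDER BY a1").fetchall()
```

The two `d1` rows collapse into one tuple whose probability is 1 - (0.5 × 0.5), higher than either row alone, while the single `d2` row keeps its probability.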
What's in the DB?
 Text-based ranking
    term-doc-freq relations (inverted file)
       One per language, stemming, section
    Domain-independent, click and index

       T     D     f
       t0    d3    3
       t0    d5    10
       t1    d2    4

 Entity ranking
    Probabilistic triples
    Domain-aware
       Needs supervised indexing

       subj      pred/attr    obj/value    p
       Arjen     speaks_to    you          0.95
       you       follow       Arjen        0.5
       speech    minutes      45           0.8

 Content-based (MM) retrieval
    Feature vectors, click and index

       Img_id    f1      …    fN
       0         0.12    …    0.84
       1         0.54    …    0.31
       2         0.23    …    0.1
VIEWS and TABLES
   CREATE VIEW / TABLE  a  AS SELECT … FROM term-doc … ;          (term-doc: stored relation)
   CREATE VIEW          b  AS SELECT … FROM a WHERE a.x = u1 ;   (u1: user parameter)
   CREATE VIEW / TABLE  c  AS SELECT … FROM a WHERE a.x = 42 ;   (no user parameter: pre-computable relation)
   CREATE VIEW          d  AS SELECT … FROM b … ;
 BB content: sequence of VIEW definitions                           relation
 A VIEW is pre-computable when
    All the relations addressed are pre-computable / stored
    No dependency on user parameters
 Pre-computable VIEWs can become TABLEs (or MATERIALIZED
  VIEWs)
    Query-independent computations are performed only once, then
     read from TABLEs at each query
    Recognition of these patterns is fully automatic
    Extends MonetDB's per-session caching to across-sessions caching
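The pre-computability rule can be sketched as a small dependency analysis. The dictionary layout and the function name are assumptions made for illustration. The four views mirror a, b, c and d on the slide, with `term_doc` standing in for the stored term-doc relation.

```python
# Each view lists the relations it reads and the user parameters it depends on.
views = {
    "a": {"reads": ["term_doc"], "params": []},
    "b": {"reads": ["a"], "params": ["u1"]},   # depends on user parameter u1
    "c": {"reads": ["a"], "params": []},       # constant 42, no user parameter
    "d": {"reads": ["b"], "params": []},       # reads b, which is query-dependent
}
stored = {"term_doc"}


def precomputable(views, stored):
    """Return the views that can be materialised once, ahead of any query.

    A view qualifies when it has no user parameters and every relation it
    reads is either stored or itself pre-computable.
    """
    ready = set(stored)
    changed = True
    while changed:  # iterate to a fixed point, so definition order does not matter
        changed = False
        for name, view in views.items():
            if name in ready:
                continue
            if not view["params"] and all(r in ready for r in view["reads"]):
                ready.add(name)
                changed = True
    return ready - set(stored)


tables = precomputable(views, stored)
```

Here `a` and `c` come out as pre-computable, while `b` (user parameter) and `d` (reads `b`) stay views that are evaluated at query time.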
What Next?
Current Situation
 index ;              Schema definition
 repeat {
      specify ;
      retrieve        Search & explore
 } until 
Traditional Indexing




 Preprocessing determines to a large extent how
  search requests will be processed
   Especially regarding tokenization, stemming, etc.
 Fast and scalable, but inflexible
   E.g., entity search hard-coded on top of engine,
    advertisements matched on different data, etc.
Search by Strategy




 Flexible: generate arbitrary engine on the fly
 Not as fast as highly optimized and very well
  engineered inverted file based systems
Desirable Situation
 repeat {
      index ;        Mixed initiative:
      specify ;      schema definition and
      retrieve       search & explore
 } until 
Non-Indexed Search




 Grep
   Very flexible
       I use it all the time on my MH mail folders when Gmail
        fails me!
   Not scalable, little or no structure
Minimal Indexing




 How can we reduce the pre-processing necessary to
  create a search engine over a new collection?
    Can we do without a keyword index?
    Can we avoid hardwired decisions for tokenization,
     language detection, stemming, …?
Suffix Array
 Pros:
    Provides many core search functions: term
     statistics, keyword search, phrase search
    No upfront tokenization needed (access at
     character level)
    No upfront language detection needed
 Cons:
    Difficult to build for large corpora
    Expensive w.r.t. disk space
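A toy suffix array shows why no tokenization is needed. The construction below is deliberately naive (it sorts all suffixes directly), which also illustrates the first con: it would not scale to large corpora. The function names and the sample text are assumptions for this sketch.

```python
def build_suffix_array(text):
    """Naive construction: sort all suffix start positions by the suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_range(text, sa, pattern):
    """Binary search for the block of suffixes that start with `pattern`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def count(text, sa, pattern):
    """Term or phrase frequency: size of the matching block."""
    start, end = find_range(text, sa, pattern)
    return end - start


def positions(text, sa, pattern):
    """Character offsets of every occurrence, in document order."""
    start, end = find_range(text, sa, pattern)
    return sorted(sa[start:end])


text = "search engines search patents; patent search engines"
sa = build_suffix_array(text)

term_hits = count(text, sa, "search")
phrase_hits = positions(text, sa, "search engines")
prefix_hits = count(text, sa, "patent")
```

The same index answers a keyword count, a phrase search and a prefix match ("patent" also finds "patents") at character level, without any decision about tokens, language or stemming at indexing time.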
Demo
(Showed patent search demo)
“Real Code”
Patents on X by Y(y)

PRA
s__STRATEGY___filter_DOC_with_NE_nes =
Project [$2,$3](
 Join [$1 = $2](
    s__STRATEGY___clef_ip_patents_DATA_result,
    Project [$1,$3](
       Select [$2 = "ipcr-classification"](
          s__STRATEGY___clef_ip_patents_DATA_ne_doc
       )
    )
 )
);
CREATE TABLE s__STRATEGY___filter_DOC_with_NE_nes AS
  SELECT
    tmp_1814091754.a2 AS a1,
    tmp_1814091754.a3 AS a2,
    tmp_1814091754.prob AS prob
  FROM
    (
      SELECT
        s__STRATEGY___clef_ip_patents_DATA_result.a1 AS a1,
        tmp__1652836708.a1 AS a2,
        tmp__1652836708.a2 AS a3,
        s__STRATEGY___clef_ip_patents_DATA_result.prob
          * tmp__1652836708.prob AS prob
      FROM
        s__STRATEGY___clef_ip_patents_DATA_result,
        (
          SELECT
            tmp_1444787941.a1 AS a1,
            tmp_1444787941.a3 AS a2,
            tmp_1444787941.prob AS prob
          FROM
            (
              SELECT
                s__STRATEGY___clef_ip_patents_DATA_ne_doc.a1 AS a1,
                s__STRATEGY___clef_ip_patents_DATA_ne_doc.a2 AS a2,
                s__STRATEGY___clef_ip_patents_DATA_ne_doc.a3 AS a3,
                s__STRATEGY___clef_ip_patents_DATA_ne_doc.prob AS prob
              FROM
                s__STRATEGY___clef_ip_patents_DATA_ne_doc
              WHERE
                s__STRATEGY___clef_ip_patents_DATA_ne_doc.a2
                  = 'ipcr-classification'
            ) AS tmp_1444787941
        ) AS tmp__1652836708
      WHERE
        s__STRATEGY___clef_ip_patents_DATA_result.a1
          = tmp__1652836708.a2
    ) AS tmp_1814091754
  ORDER BY a1
  WITH DATA;
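The PRA expression and its generated SQL can be followed step by step with a minimal sketch. The three operator functions are hypothetical stand-ins written for this illustration, not Spinque's implementation, and the patent data is made up. Columns are 0-based here, whereas PRA counts from $1.

```python
# A probabilistic relation is a list of (tuple, probability) pairs.

def select(rel, predicate):
    """Select: keep matching tuples, probabilities unchanged."""
    return [(t, p) for t, p in rel if predicate(t)]


def project(rel, cols):
    """Project without DISTINCT: keep the chosen columns and the probability."""
    return [(tuple(t[c] for c in cols), p) for t, p in rel]


def join(left, right, lcol, rcol):
    """Join: concatenate matching tuples and multiply their probabilities."""
    return [(lt + rt, lp * rp)
            for lt, lp in left
            for rt, rp in right
            if lt[lcol] == rt[rcol]]


# Hypothetical data: ranked patents, and (entity, type, document) annotations.
result = [(("p1",), 0.9), (("p2",), 0.4)]
ne_doc = [
    (("H04L", "ipcr-classification", "p1"), 1.0),
    (("G06F", "ipcr-classification", "p2"), 0.5),
    (("Acme", "assignee", "p1"), 1.0),
]

# Project [$1,$3](Select [$2 = "ipcr-classification"](ne_doc))
classes = project(select(ne_doc, lambda t: t[1] == "ipcr-classification"), [0, 2])

# Project [$2,$3](Join [$1 = $2](result, classes))
nes = project(join(result, classes, 0, 1), [1, 2])
```

Each classification inherits the score of the patent it annotates, weighted by the confidence of the annotation. This is the multiplication `result.prob * tmp.prob` in the generated SQL.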
info@spinque.com
    www.spinque.com
facebook.com/spinque

 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 

How to build the next 1000 search engines?!

  • 1. How to build the next 1000 search engines?! Arjen P. de Vries (arjen@acm.org); Centrum Wiskunde & Informatica, Delft University of Technology, Spinque B.V.
  • 3. Search is everywhere
      ◦ Yet it only works well on the web…
  • 4. Complications
      ◦ Heterogeneous data sources: WWW, Wikipedia, news, e-mail, patents, Twitter, personal information, …
      ◦ Varying result types: “documents”, tweets, courses, people, experts, gene expressions, temperatures, …
      ◦ Multiple dimensions of relevance: topicality, recency, reading level, …
  • 5. Complications
      ◦ Many search tasks require a mix within these dimensions:
          - News and patents
          - Companies and their CEOs
          - Recent and on topic
      ◦ Many search tasks also require a mix across these dimensions:
          - Patents assigned to our top 3 competitors in market segments mentioned in the recent press releases issued by our top 10 clients
  • 6.
      ◦ System's internal information representation
          - Linguistic annotations: named entities, sentiment, dependencies, …
          - Knowledge resources: Wikipedia, Freebase, IDC9, IPTC, …
          - Links to related documents: citations, URLs
      ◦ Anchors that describe the URI: anchor text
      ◦ Queries that lead to clicks on the URI: session, user, dwell-time, …
      ◦ Tweets that mention the URI: time, location, user, …
      ◦ Other social media that describe the URI: user, rating; tag, organisation of “folksonomy”
      ◦ + UNCERTAINTY ALL OVER!
  • 7. What goes in the black box? [Diagram]
      ◦ Document collection: anchors, entity types, sentiment, tweets, cited documents, …
      ◦ User; context
      ◦ Black box: BM25, BM25F, LM, RM, VSM, DFR, QIR? Learning to rank? ECIR / CIKM / SIGIR / ICTIR / WSDM papers!
      ◦ Output: ranked list of answers
  • 8. Rarely & scarcely addressed…
      ◦ Student: How do I build it?
      ◦ Professor: Who will build it for me?
      ◦ Last session of the conference…
  • 10. Parameterised Search System (Cornacchia, De Vries, “A Parameterised Search System”, ECIR 2007)
  • 11. Parameterised Search System (Cornacchia, De Vries, “A Parameterised Search System”, ECIR 2007)
      ◦ Can we not “remove” this IR engineer (or scientist!) from the loop, like DBMS software removes the data engineer from the loop?
      ◦ And three (four?) children, a startup and 5 years later, a PhD defense!
  • 12. Search by Strategy
      ◦ Visually construct search strategies by connecting building blocks
  • 13.
  • 14. Search by Strategy
      ◦ Visually construct search strategies by connecting building blocks
      ◦ Each block describes either data or actions upon that data
      ◦ Connection points (“pins”) are typed: doc / sec / term / ne (named entity) / tuple
      ◦ Actions are expressed as scripts (more on this later)
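To make the idea of typed pins concrete, here is a minimal sketch of how a connection between two building blocks could be type-checked. Only the five pin types come from the slide; the `Block` class, the `connect` function and the block names are hypothetical illustrations, not Spinque's actual design.

```python
from dataclasses import dataclass, field

# Pin types named on the slide; everything else here is a hypothetical illustration.
PIN_TYPES = {"doc", "sec", "term", "ne", "tuple"}


@dataclass
class Block:
    name: str
    inputs: dict   # input pin name -> pin type
    output: str    # type of the single output pin
    sources: dict = field(default_factory=dict)  # input pin name -> upstream Block

    def __post_init__(self):
        # Reject pin types that are not part of the fixed vocabulary.
        for pin_type in list(self.inputs.values()) + [self.output]:
            if pin_type not in PIN_TYPES:
                raise ValueError(f"unknown pin type: {pin_type}")


def connect(upstream: Block, downstream: Block, pin: str) -> None:
    """Wire upstream's output to one input pin of downstream, if the types match."""
    expected = downstream.inputs[pin]
    if upstream.output != expected:
        raise TypeError(
            f"{upstream.name} outputs '{upstream.output}' but "
            f"{downstream.name}.{pin} expects '{expected}'"
        )
    downstream.sources[pin] = upstream


# A data block, a query block and an action block (all names are made up).
patents = Block("patents", inputs={}, output="doc")
query = Block("query_terms", inputs={}, output="term")
rank = Block("rank_by_text", inputs={"docs": "doc", "terms": "term"}, output="doc")

connect(patents, rank, "docs")
connect(query, rank, "terms")
```

Connecting `query` to the `docs` pin would raise a `TypeError`, which is the kind of mistake a visual editor with typed pins can rule out while the strategy is being drawn.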
  • 16. From Patent to Inventor
  • 17. Reports Visits
  • 18. Generate Search Engine! Or, really, generate a REST API from the strategy specification!
  • 19. Demo (Showed demo of children's search engine)
  • 20. How Strategies Help
      ◦ Strategies improve communication between search intermediary and user
          - Encapsulate domain expert knowledge
          - Abstract representation of search expert knowledge
          - Analyze information seeking process at any stage
      ◦ Strategies facilitate knowledge management
          - Store / share / publish / refine
      ◦ Strategies mix exact (DB) and ranked (IR) searches
          - Avoid the need for “human (probabilistic) joins”
  • 21.
  • 22. Search Intermediaries [axis label: task complexity]
      ◦ Travel agency
      ◦ Real estate agents
      ◦ Recruiters
      ◦ Librarians
      ◦ Archivists
      ◦ Digital forensics detectives
      ◦ Patent information specialists
  • 23. Exploratory Search
      ◦ Search & (Faceted) Browsing
      ◦ Help discover schema, ontology, etc.
      ◦ Help discover the relevant sources
          - Within-collection (by year/location, by type, …)
          - Across multiple collections (by source)
  • 24. Probabilistic faceted browsing
      Facets shown in both panels: Price (100K - 200K, 200K - 300K, 300K - 400K); Rooms (3, 4, 5); Size (100 - 150 m², 150 - 200 m², 200 - 250 m²)
      ◦ Traditional (boolean filters)
          - Good when user knows exactly which filters to apply
          - Will see perfect-match results
          - Won't see “interesting” results
      ◦ Probabilistic
          - Good for exploratory search
          - Will see perfect-match results
          - Will also see “interesting” results
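The contrast between the two panels can be sketched in a few lines. The boolean version keeps only items inside every selected range; the probabilistic version turns each facet into a score and ranks by the product. The linear decay in `in_range_score`, the `softness` values and the example houses are assumptions made purely for illustration; the slide does not say how the probabilities are derived.

```python
def in_range_score(value, low, high, softness):
    """1.0 inside [low, high]; decays linearly to 0 over `softness` outside it (assumed model)."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return max(0.0, 1.0 - distance / softness)


# facet -> (low, high, softness); the softness values are illustrative assumptions
FACETS = {
    "price": (200_000, 300_000, 100_000),
    "rooms": (4, 4, 2),
    "size": (150, 200, 50),
}

HOUSES = [
    {"id": "A", "price": 250_000, "rooms": 4, "size": 180},  # perfect match
    {"id": "B", "price": 310_000, "rooms": 4, "size": 190},  # slightly too expensive
    {"id": "C", "price": 450_000, "rooms": 2, "size": 90},   # far off on every facet
]


def boolean_filter(houses):
    """Traditional facets: keep only houses inside every selected range."""
    return [h["id"] for h in houses
            if all(low <= h[f] <= high for f, (low, high, _) in FACETS.items())]


def probabilistic_rank(houses):
    """Probabilistic facets: multiply per-facet scores and rank by the result."""
    scored = []
    for h in houses:
        score = 1.0
        for f, (low, high, softness) in FACETS.items():
            score *= in_range_score(h[f], low, high, softness)
        if score > 0:
            scored.append((h["id"], score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With this data the boolean filter returns only house A, while the ranked list also contains house B just below it: the "interesting" near-miss the slide refers to.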
  • 25. Dynamic facets
      Facets shown in both panels: Price (100K - 200K, 200K - 300K, 300K - 400K); Rooms (3, 4, 5); Size (100 - 150 m², 150 - 200 m², 200 - 250 m²)
      ◦ Pre-indexed
          - Pre-defined ad-hoc indices intersected with result set
          - Challenge: many indices to maintain
      ◦ Dynamic
          - Facets decided from result set
          - Challenge: dynamically adapt granularity (different price ranges for villa/garage!)
      ◦ Challenge: heavy concurrent queries to DB
  • 27. Limitations Search & Browse
      ◦ Faceted exploration does not include joins
      ◦ Cannot construct new data sources from existing ones!
      ◦ Only the pre-defined paths through the information space can actually be traversed
  • 28. Who needs a Join?
      ◦ You!!!
      ◦ … whenever “relevance cues” are typed:
          - People (e.g., inventors)
          - Companies (e.g., assignees)
          - Categories (e.g., IPTC)
          - Time (e.g., expiry date)
          - Location (e.g., country)
      ◦ … or whenever multiple sources are to be combined
          - E.g., patents & news, patents & Wikipedia, …
  • 29. Patents on X by Y(y) by Y(y)
  • 30. Interactive Information Access
      ◦ Feedback: interaction improves information representation
      ◦ Faceted Browsing: interaction can let user take over where machine would fail
      ◦ Search by Strategy: interaction can let user take over where system designer would fail
  • 31. Conclusion
      ◦ “No idealized one-shot search engine”
      ◦ Empower the user!
  • 33. From Strategies to DB Queries [diagram of three layers]
      ◦ Strategy (data flow): building blocks BB1(in1, in2, in3, u1, u2) and BB2(in1), each with an out pin. Spinque: strategy
      ◦ Query (strategy made operational): CREATE VIEW a AS SELECT ..; CREATE VIEW b AS SELECT ..; CREATE VIEW c AS SELECT .. Spinque: PRA
      ◦ Database: relational DB. Spinque: RDBMS (MonetDB)
  • 34. Probabilistic Relational Algebra
      ◦ Strategy
      ◦ PRA: probabilistic relational algebra (Fuhr and Roelleke, TOIS 2001)
            x = Project DISTINCT [$1,$3](y);
      ◦ SQL: explicit probabilities
            CREATE VIEW x AS
              SELECT a1, a3, 1-prod(1-prob) AS prob
              FROM y
              GROUP BY a1, a3;
      ◦ Relational DB
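The aggregate `1-prod(1-prob)` in the SQL above combines duplicate tuples as a disjunction of events that are assumed independent: the projected tuple holds unless every contributing tuple fails. A minimal Python sketch of the same computation follows; the function name `project_distinct` and the rows in `y` are hypothetical, and the sketch is not the generated query itself.

```python
from collections import defaultdict


def project_distinct(rows, keep):
    """rows: iterable of (attrs_tuple, prob); keep: 0-based attribute positions to retain.

    Duplicates after projection are merged with 1 - prod(1 - prob),
    which assumes the contributing tuples are independent events.
    """
    none_true = defaultdict(lambda: 1.0)  # running product of (1 - prob) per group
    for attrs, prob in rows:
        key = tuple(attrs[i] for i in keep)
        none_true[key] *= (1.0 - prob)
    return {key: 1.0 - p for key, p in none_true.items()}


# Hypothetical relation y(a1, a2, a3) with a probability per tuple.
y = [
    (("d1", "x", "nl"), 0.5),
    (("d1", "y", "nl"), 0.5),
    (("d2", "x", "de"), 0.8),
]

# PRA: x = Project DISTINCT [$1,$3](y);  positions are 1-based in PRA, 0-based here.
x = project_distinct(y, keep=(0, 2))
```

The two `d1` rows collapse into one tuple with probability 1 - (0.5 × 0.5) = 0.75, while the single `d2` row keeps its 0.8.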
  • 35. What's in the DB?
      ◦ Text-based ranking
          - term-doc-freq relations (inverted file)
          - One per language, stemming, section
          - Domain-independent, click and index
              T    D    f
              t0   d3   3
              t0   d5   10
              t1   d2   4
      ◦ Entity ranking
          - Probabilistic triples
          - Domain-aware
          - Needs supervised indexing
              subj     pred/attr   obj/value   p
              Arjen    speaks_to   you         0.95
              you      follow      Arjen       0.5
              speech   minutes     45          0.8
      ◦ Content-based (MM) retrieval
          - Feature vectors, click and index
              Img_id   f1     …   fN
              0        0.12       0.84
              1        0.54   …   0.31
              2        0.23   …   0.1
  • 36. VIEWS and TABLES
          CREATE [VIEW → TABLE] a AS SELECT … FROM term-doc … ;         (term-doc: stored relation)
          CREATE VIEW b AS SELECT … FROM a WHERE a.x = u1 ;             (u1: user parameter)
          CREATE [VIEW → TABLE] c AS SELECT … FROM a WHERE a.x = 42 ;   (no user parameter: pre-computable relation)
          CREATE VIEW d AS SELECT … FROM b … ;
      ◦ BB content: sequence of VIEW definitions
      ◦ A VIEW is pre-computable when
          - All the relations addressed are pre-computable / stored
          - No dependency on user parameters
      ◦ Pre-computable VIEWs can become TABLEs (or MATERIALIZED VIEWs)
      ◦ Query-independent computations are performed only once, then read from TABLEs at each query
      ◦ Recognition of these patterns is fully automatic
      ◦ Extends MonetDB's per-session caching to across-sessions caching
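The two conditions for a pre-computable VIEW translate directly into a single pass over the view definitions in order. The sketch below is an assumption about how such recognition could work, not the actual Spinque or MonetDB code: each view is reduced to the set of names it reads (`term_doc` stands in for the slide's term-doc relation), and the function name `precomputable` is made up.

```python
def precomputable(views, stored, user_params):
    """Return the views that can be materialised ahead of query time.

    views: dict mapping view name -> set of names it reads, in definition order
           (dicts preserve insertion order, matching a sequence of VIEW definitions).
    stored: names of stored relations (the index).
    user_params: names of query-time user parameters.
    """
    ok = set(stored)
    for name, deps in views.items():
        # Condition 1: no dependency on user parameters.
        # Condition 2: every relation addressed is stored or already pre-computable.
        if deps.isdisjoint(user_params) and deps <= ok:
            ok.add(name)
    return ok - set(stored)


# The four definitions from the slide, reduced to what each one reads.
views = {
    "a": {"term_doc"},   # reads a stored relation only
    "b": {"a", "u1"},    # depends on user parameter u1
    "c": {"a"},          # constant predicate (a.x = 42), no user parameter
    "d": {"b"},          # depends on b, which is not pre-computable
}

result = precomputable(views, stored={"term_doc"}, user_params={"u1"})
```

For these definitions the function returns `a` and `c`, the same two views that the slide turns into TABLEs.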
  • 38. Current Situation
          index ;                  (schema definition)
          repeat {
            specify ;
            retrieve               (search & explore)
          } until [unreadable symbol]
  • 39. Traditional Indexing
      ◦ Preprocessing determines to a large extent how a search request will be processed
          - Especially regarding tokenization, stemming, etc.
      ◦ Fast and scalable, but inflexible
          - E.g., entity search hard-coded on top of engine, advertisements matched on different data, etc.
  • 40. Search by Strategy
      ◦ Flexible: generate arbitrary engine on the fly
      ◦ Not as fast as highly optimized and very well engineered inverted-file-based systems
  • 41. Desirable Situation
          repeat {                 (mixed initiative: schema definition, search & explore)
            index ;
            specify ;
            retrieve
          } until [unreadable symbol]
  • 42. Non-Indexed Search
      ◦ Grep
          - Very flexible
          - Use it all the time on my mh mail folders when Gmail fails me!
          - Not scalable, little or no structure
  • 43. Minimal Indexing
      ◦ How to reduce the pre-processing necessary to create a search engine over a new collection?
      ◦ Can we do without a keyword index?
      ◦ Can we avoid hardwired decisions for tokenization, language detection, stemming, …?
  • 44. Suffix Array
      ◦ Pros:
          - Provides many core search functions: term statistics, keyword search, phrase search
          - No upfront tokenization needed (access at character level)
          - No upfront language detection needed
      ◦ Cons:
          - Difficult to build for large corpora
          - Expensive w.r.t. disk space
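A toy suffix array shows why keyword search, phrase search and occurrence counts all come from one character-level structure with no tokenizer involved. This is a deliberately naive sketch: the function names are made up, the construction sorts full suffixes, and the lookup materialises a list of prefixes, so it illustrates the idea rather than a scalable implementation.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Start offsets of all suffixes, sorted lexicographically.

    Naive construction that compares whole suffixes; fine for a demo,
    not for large corpora (the difficulty the slide lists under cons).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find(text, sa, pattern):
    """Start offsets of every occurrence of `pattern` (a term or a whole phrase)."""
    # Truncating sorted suffixes to the pattern length keeps them sorted,
    # so all matches form one contiguous block that binary search can locate.
    prefixes = [text[i:i + len(pattern)] for i in sa]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    return sorted(sa[lo:hi])


text = "to be or not to be"
sa = build_suffix_array(text)

phrase_hits = find(text, sa, "to be")    # phrase search
term_count = len(find(text, sa, "be"))   # character-level occurrence count
```

Here `phrase_hits` holds the two offsets where "to be" starts and `term_count` is 2. Because matching happens at character level, "be" would also match inside a longer word such as "bed", which is the price of skipping tokenization.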
  • 45.
  • 48. Patents on X by Y(y) by Y(y)
  • 49. PRA
          s__STRATEGY___filter_DOC_with_NE_nes =
            Project [$2,$3](
              Join [$1 = $2](
                s__STRATEGY___clef_ip_patents_DATA_result,
                Project [$1,$3](
                  Select [$2 = "ipcr-classification"](
                    s__STRATEGY___clef_ip_patents_DATA_ne_doc
                  )
                )
              )
            );
  • 50. CREATE TABLE s__STRATEGY___filter_DOC_with_NE_nes AS
          SELECT tmp_1814091754.a2 AS a1,
                 tmp_1814091754.a3 AS a2,
                 tmp_1814091754.prob AS prob
          FROM (
            SELECT s__STRATEGY___clef_ip_patents_DATA_result.a1 AS a1,
                   tmp__1652836708.a1 AS a2,
                   tmp__1652836708.a2 AS a3,
                   s__STRATEGY___clef_ip_patents_DATA_result.prob * tmp__1652836708.prob AS prob
            FROM s__STRATEGY___clef_ip_patents_DATA_result,
                 (
                   SELECT tmp_1444787941.a1 AS a1,
                          tmp_1444787941.a3 AS a2,
                          tmp_1444787941.prob AS prob
                   FROM (
                     SELECT s__STRATEGY___clef_ip_patents_DATA_ne_doc.a1 AS a1,
                            s__STRATEGY___clef_ip_patents_DATA_ne_doc.a2 AS a2,
                            s__STRATEGY___clef_ip_patents_DATA_ne_doc.a3 AS a3,
                            s__STRATEGY___clef_ip_patents_DATA_ne_doc.prob AS prob
                     FROM s__STRATEGY___clef_ip_patents_DATA_ne_doc
                     WHERE s__STRATEGY___clef_ip_patents_DATA_ne_doc.a2 = 'ipcr-classification'
                   ) AS tmp_1444787941
                 ) AS tmp__1652836708
            WHERE s__STRATEGY___clef_ip_patents_DATA_result.a1 = tmp__1652836708.a2
          ) AS tmp_1814091754
          ORDER BY a1
          WITH DATA;
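The select, project and join steps above can be mimicked with three small functions over (attributes, probability) pairs, where the join multiplies probabilities just as the generated SQL does with `prob * prob`. Everything in the sketch is illustrative: the relation contents are invented, and the column meanings (`ne_doc` as entity, type, document) are inferred from the PRA expression rather than stated on the slides.

```python
def select(rel, pos, value):
    """Keep tuples whose attribute at 0-based position `pos` equals `value`."""
    return [(attrs, p) for attrs, p in rel if attrs[pos] == value]


def project(rel, keep):
    """Keep the attributes at the given 0-based positions (no duplicate merging)."""
    return [(tuple(attrs[i] for i in keep), p) for attrs, p in rel]


def join(left, right, lpos, rpos):
    """Equi-join: concatenate attributes and multiply the two probabilities."""
    return [(la + ra, lp * rp)
            for la, lp in left
            for ra, rp in right
            if la[lpos] == ra[rpos]]


# Hypothetical data. result(doc) holds ranked patents; ne_doc(ne, type, doc)
# links named entities to documents (column order inferred from the PRA).
result = [(("p1",), 0.9), (("p2",), 0.4)]
ne_doc = [
    (("H04L", "ipcr-classification", "p1"), 1.0),
    (("G06F", "ipcr-classification", "p2"), 0.5),
    (("Acme", "assignee", "p1"), 1.0),
]

# Mirrors: Project [$2,$3]( Join [$1 = $2]( result,
#            Project [$1,$3]( Select [$2 = "ipcr-classification"]( ne_doc ) ) ) )
nes = project(
    join(result,
         project(select(ne_doc, 1, "ipcr-classification"), (0, 2)),
         0, 1),
    (1, 2),
)
```

Each output tuple pairs a classification code with a patent, and its probability is the patent's ranking score times the confidence of the entity link. The assignee row is dropped by the selection.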
  • 51. info@spinque.com · www.spinque.com · facebook.com/spinque

Editor's Notes

  1. Does “Entity-based ranking” make sense?
  2. NOTE: MATERIALIZED VIEWs, where supported (not in MonetDB), can be used instead of TABLEs when stored relations (index) are expected to get updates.