Search in the Biblical Domain
    Brian Seagraves (Bible.org)
What is “Search”?
•   Information/Document Retrieval
•   Basic Definition:
    •   Finding previously seen documents that are
        related to some user-supplied terms.
•   Advanced Definition:
    •   Finding relevant content for some query by
        understanding the contextual meaning of
        terms in the search index and query.
    •   Semantic Search
Types and Sources of Content

• The Bible and its verses
• Articles, Journals, and other extra-biblical
  content
• The web
Information Retrieval Engines
• Sphinx - http://sphinxsearch.com
• Lucene - http://lucene.apache.org/
 • Solr - http://lucene.apache.org/solr/
• MySQL Full-Text Search (to a limited extent)
Solr
• Open Source
• Full-text search
• Hit Highlighting
• Facets
• Java
• REST-like HTTP/XML and JSON APIs
Solr Documents

• A document represents a distinct piece of
  content that can be stored/retrieved
 • Bible Verse
 • Journal Article
 • Commentary Chapter/Section
 • Web Page
Solr Documents
•   Documents have one or more Fields
•   Fields have types
     •   Integer
     •   Float
     •   String
     •   Text
     •   Date
     •   and more
Solr Fields

• Field Types can have (as part of an analyzer):
 • Tokenizers
    • Split content into chunks/tokens
 • Filters
    • Remove or transform tokens (e.g. drop stop words, lowercase, stem)
Solr Fields
• The “String” Field Type
• <fieldType
  name="string"
  class="solr.StrField" />
• No Filter; No Tokenizer
 • Field content won’t be split or changed
Sample Schema
<fieldtype name="html_text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <!-- SynonymFilterFactory needs a synonyms file; "synonyms.txt" is an assumed name -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>
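To make the index-time chain above concrete, here is a minimal Python sketch of roughly what the `html_text` index analyzer does to a verse. It is an approximation under stated assumptions: the stop-word list is a tiny illustrative subset, splitting on punctuation is a rough stand-in for `WordDelimiterFilterFactory`, `crude_stem` is a toy suffix stripper rather than the Porter algorithm, and `RemoveDuplicatesTokenFilterFactory` is omitted because it only drops duplicates stacked at the same token position.

```python
import re
from typing import List

# Illustrative stop-word subset (assumption), not Solr's built-in list.
STOP_WORDS = {"a", "an", "and", "in", "of", "the"}


def strip_html(text: str) -> str:
    """Rough stand-in for the HTML-stripping part of the tokenizer."""
    return re.sub(r"<[^>]+>", " ", text)


def crude_stem(token: str) -> str:
    """Toy suffix stripper standing in for EnglishPorterFilterFactory."""
    for suffix in ("ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> List[str]:
    tokens = strip_html(text).split()  # whitespace tokenizer
    # Stop filter runs before lowercasing, so matching here is case-sensitive.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Split on punctuation (word-delimiter stand-in).
    tokens = [part for t in tokens for part in re.findall(r"[A-Za-z0-9]+", t)]
    tokens = [t.lower() for t in tokens]  # lowercase filter
    return [crude_stem(t) for t in tokens]  # stemmer stand-in


print(analyze("<p>In the beginning God created the heavens and the earth.</p>"))
# ['in', 'beginning', 'god', 'creat', 'heaven', 'earth']
```

Note that "In" survives: because the stop filter sits before the lowercase filter, and assuming Solr's default case-sensitive stop matching (`ignoreCase="false"`), capitalized stop words slip through. Placing `LowerCaseFilterFactory` first, or setting `ignoreCase="true"`, would avoid that.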
Sample Schema (cont.)
<fieldtype
 name="sint"
 class="solr.SortableIntField"
 omitNorms="true" />
<fieldtype
 name="string"
 class="solr.StrField"
 sortMissingLast="true"
 omitNorms="true"/>
Sample Schema (cont.)
<fields>
  <field name="id" type="sint" indexed="true" stored="true" multiValued="false" />
  <field name="abbr" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="name" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="book" type="sint" indexed="true" stored="true" multiValued="false" />
  <field name="chapter" type="sint" indexed="true" stored="true" multiValued="false" />
  <field name="verse" type="sint" indexed="true" stored="true" multiValued="false" />
  <field name="ot_nt" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="net" type="text" indexed="false" stored="true" multiValued="false" />
  <!-- multiValued so that more than one copyField source can feed it -->
  <field name="all_index" type="html_text" indexed="true" stored="false" multiValued="true" />
</fields>

<copyField source="net" dest="all_index" />
<uniqueKey>id</uniqueKey>
<defaultSearchField>all_index</defaultSearchField>
<solrQueryParser defaultOperator="OR" />
Put Data in Solr
• Remember, Solr communicates using XML
  over HTTP
• No concept of updating part of a document:
  re-add the whole document (it replaces the one
  with the same uniqueKey)
• To add, POST XML to update handler
 • http://localhost:8080/solr/bible/update
Add XML
<add>
  <doc>
    <field name="id">1</field>
    <field name="net">In the beginning God created the heavens and the earth.</field>
  </doc>
</add>
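The same `<add>` message can be generated rather than written by hand, which also takes care of escaping characters such as `&` and `<`. A minimal Python sketch using only the standard library, assuming the update URL shown above (`http://localhost:8080/solr/bible/update`) and one plain dict per document; `build_add_xml` and `post_xml` are hypothetical helper names. Solr only makes new documents visible to searches after a commit, so the sketch sends `<commit/>` afterwards.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Endpoint from the slide; adjust host/port/core for your install.
UPDATE_URL = "http://localhost:8080/solr/bible/update"


def build_add_xml(docs):
    """Build an <add> message; each doc is a dict of field name -> value."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")


def post_xml(xml_body, url=UPDATE_URL):
    """POST an XML update message to Solr and return the raw response text."""
    req = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    body = build_add_xml([{"id": 1, "net": "In the beginning God created the heavens and the earth."}])
    print(body)
    # Needs a running Solr instance:
    # post_xml(body)
    # post_xml("<commit/>")
```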
PHP API
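• No hand-written XML!
• // Connection options: host, port and path values are assumptions
  $options = array('hostname' => 'localhost', 'port' => 8080, 'path' => 'solr/bible');
  $client = new SolrClient($options);
  $doc = new SolrInputDocument();
  $doc->addField('id', 1); // "id" is an integer (sint) field in the schema
  $doc->addField('net', 'In the beginning God created the heavens and the earth.');
  $client->addDocument($doc);
  $client->commit(); // make the new document visible to searches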
Querying Solr

• HTTP GET Request
• http://localhost:8080/solr/bible3/select?q=god
  • Path to Solr: http://localhost:8080/solr; Core: bible3; Handler: select; Query: q=god
•   Returns XML by default

•   Can return JSON and more (e.g. wt=json)
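A small Python sketch of the same request, assuming the host, port and core names from the slide. `select_url` (a hypothetical helper) adds `wt=json` so the response comes back as JSON, and `parse_hits` reads the `response.numFound` and `response.docs` keys of Solr's standard JSON response. The sketch only builds and parses; fetching the URL is left to `urllib.request.urlopen` or any other HTTP client.

```python
import json
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8080/solr"  # path to Solr, as on the slide


def select_url(core, q, **params):
    """Build a select-handler URL; wt=json asks Solr for JSON instead of XML."""
    query = {"q": q, "wt": "json"}
    query.update(params)
    return f"{SOLR_BASE}/{core}/select?{urlencode(query)}"


def parse_hits(raw_json):
    """Return (numFound, docs) from a standard Solr JSON response."""
    body = json.loads(raw_json)
    return body["response"]["numFound"], body["response"]["docs"]


print(select_url("bible3", "god"))
# http://localhost:8080/solr/bible3/select?q=god&wt=json
```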
Querying Solr

•   Queries the defaultSearchField by default

    •   <defaultSearchField>all_index</defaultSearchField>

•   Can query other fields by using the syntax field:value

    •   http://localhost:8080/solr/bible3/select?q=id:27974

•   Multiple queries / Booleans
    •   http://localhost:8080/solr/bible3/select?q=god%20AND%20book:40
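Spaces inside the `q` parameter have to be URL-encoded, which is easy to get wrong by hand. A minimal Python sketch that assembles a fielded Boolean query and encodes it. `fielded_query` is a hypothetical helper: it joins every clause with an explicit `AND` (since the schema's `defaultOperator="OR"` would otherwise OR them), quotes values that contain spaces so they stay one phrase, and does not escape other query-syntax characters.

```python
from urllib.parse import quote


def fielded_query(text=None, **fields):
    """Combine free text (default field) with field:value clauses using AND."""
    clauses = [text] if text else []
    for name, value in fields.items():
        value = str(value)
        # Quote values containing spaces so they stay a single phrase.
        if " " in value:
            value = '"' + value + '"'
        clauses.append(f"{name}:{value}")
    return " AND ".join(clauses)


q = fielded_query("god", book=40)
print(q)                   # god AND book:40
print(quote(q, safe=":"))  # god%20AND%20book:40
```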
Search Multiple Translations (Fields)
•   Let’s add some fields: kjv and kjv_index

•   Add some copy field directives:
    <copyField source="kjv" dest="all_index" />
    <copyField source="kjv" dest="kjv_index" />

•   Query: “Shew Thyself”

    •   0 Results in the NET
        http://localhost:8080/solr/bible3/select?q=shew%20theyself
    •   360 Results in the Combined index/field
        http://localhost:8080/solr/bible4/select?q=shew%20theyself
Search Multiple Translations
• + Quasi-synonym term/phrase injection
• + Less variation across translations leads to stronger
  possible matches
• + Matches verses when the source translation isn’t
  known
• - No control over which translation gets more weight
• - No control over scoring of matches
Search Multiple Translations
•   Another way: Dismax
•   Can score a document (verse) match based on scores/matches
    from multiple fields.
•   net_index^1 kjv_index^1
    •   Not exponents: weights (boosts)
    •   We’re searching the net_index and kjv_index fields, each with
        a boost/weight of 1.
•   net_index^6 kjv_index^.5
•   http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^1%20kjv_index^1&fl=score

•   http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^6%20kjv_index^.5&fl=score
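The `tie` parameter controls how per-field scores are combined. In a dismax query each term is scored against every `qf` field, and Lucene's `DisjunctionMaxQuery` gives the term the best field score plus `tie` times the sum of the other field scores: `tie=0` keeps only the best field, `tie=1` sums them all. A Python sketch of that rule with made-up per-field scores; it ignores query normalization and coord, so it shows the shape of the combination rather than Solr's exact numbers.

```python
def dismax_term_score(field_scores, tie):
    """Best field score plus `tie` times the sum of the other field scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


def dismax_score(per_term_field_scores, boosts, tie):
    """Sum the dismax score of each query term after applying qf boosts.

    per_term_field_scores: one dict per query term, field name -> raw score
    boosts: field name -> weight, e.g. {"net_index": 6, "kjv_index": 0.5}
    """
    total = 0.0
    for field_scores in per_term_field_scores:
        boosted = [field_scores.get(f, 0.0) * w for f, w in boosts.items()]
        total += dismax_term_score(boosted, tie)
    return total


# Hypothetical raw scores for one term: the NET field matches better than the KJV one.
scores = [{"net_index": 2.0, "kjv_index": 1.0}]
print(dismax_score(scores, {"net_index": 1, "kjv_index": 1}, 0.1))
print(dismax_score(scores, {"net_index": 6, "kjv_index": 0.5}, 0.1))
```

With equal weights the KJV match adds a tenth of its score; with `^6`/`^.5` the NET field dominates almost completely.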
Scoring
•   $$\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot\sum_{t \in q}\left(\mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{norm}(t,d)\right)$$
•   Basic Factors
    •   Term frequency in the document (↑ is better)
    •   Number of documents in the corpus containing the term (↓ is better)
    •   Length of the matching document (↓ is better)
        •   “Jesus Wept” - John 11:35
        •   http://localhost:8080/solr/bible3/select?q=wept
•   http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
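The formula above can be sketched in Python using the classic `DefaultSimilarity` definitions: tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1/√(number of terms in the field), coord = matched query terms / query terms, queryNorm = 1/√(sum of squared idf). This is a simplification under stated assumptions: all boosts are 1 and Lucene's lossy one-byte norm encoding is ignored. The corpus statistics and the longer sentence are made up; with one occurrence of "wept" in each, the two-word verse outscores the longer one purely because of the length norm.

```python
import math
from typing import Dict, List


def classic_score(query: List[str], doc: List[str], doc_freq: Dict[str, int], num_docs: int) -> float:
    """Sketch of Lucene's classic score(q, d); all boosts assumed to be 1."""
    if not query or not doc:
        return 0.0

    def idf(term: str) -> float:
        return 1.0 + math.log(num_docs / (doc_freq.get(term, 0) + 1))

    query_norm = 1.0 / math.sqrt(sum(idf(t) ** 2 for t in query))
    norm = 1.0 / math.sqrt(len(doc))  # lengthNorm: shorter fields score higher
    total, matched = 0.0, 0
    for term in query:
        freq = doc.count(term)
        if freq:
            matched += 1
            total += math.sqrt(freq) * idf(term) ** 2 * norm
    coord = matched / len(query)  # fraction of query terms found in the doc
    return coord * query_norm * total


jesus_wept = ["jesus", "wept"]
longer = "and when he drew near he saw the city and wept over it".split()
df = {"wept": 2}  # hypothetical corpus statistics
print(classic_score(["wept"], jesus_wept, df, 100) > classic_score(["wept"], longer, df, 100))  # True
```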
Topic Tagging
• Use a topically tagged Bible/concordance to mark
  up each verse, or just key verses
• Helpful for “theme”-based queries
 • “Social Justice”: no good text matches
 • “Satan”: many names
   • Name tagging in general can be very helpful
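Name tagging can be sketched as mapping every surface form of a name to one canonical tag that is indexed in its own field, so a query on the tag finds a verse whichever name the verse uses. The alias table and the `name_tags` helper below are hypothetical; a real table would come from a tagged Bible or concordance, and multi-word or ambiguous aliases would need more care than a word-boundary regex.

```python
import re
from typing import Dict, Set

# Hypothetical alias table: each surface form maps to one canonical name tag.
NAME_ALIASES: Dict[str, str] = {
    "satan": "SATAN",
    "the devil": "SATAN",
    "the adversary": "SATAN",
}


def name_tags(verse_text: str) -> Set[str]:
    """Return the canonical name tags whose aliases appear in the verse."""
    text = verse_text.lower()
    return {
        tag
        for alias, tag in NAME_ALIASES.items()
        if re.search(r"\b" + re.escape(alias) + r"\b", text)
    }


print(name_tags("He was tempted by the devil."))  # {'SATAN'}
```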
Searching Strong’s

• Add a field for Strong’s numbers: strongs_index
• Example field value (one verse):
  1473 1510 2316 11 2316 2464 2532 2316 2384 1510 3756
  2316 3498 235 2198

• Most of the benefits of text searching
 • “Word” frequency
 • Document vs. corpus frequency of search terms
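Treating each Strong's number as a term is what lets the usual tf/idf machinery apply. A Python sketch, assuming the field is split on whitespace with no stemming: in the example value, 2316 occurs four times, which raises that verse's term frequency for a query on 2316, while `document_frequency` is the corpus-side count that idf is based on. The helper names and the second field value used in the check are hypothetical.

```python
from collections import Counter

# Strong's numbers from the example field value above.
verse_strongs = "1473 1510 2316 11 2316 2464 2532 2316 2384 1510 3756 2316 3498 235 2198"


def term_frequencies(field_value):
    """Split the field on whitespace (one Strong's number per term) and count terms."""
    return Counter(field_value.split())


def document_frequency(term, corpus):
    """Number of documents (verses) whose Strong's field contains the term."""
    return sum(1 for value in corpus if term in value.split())


print(term_frequencies(verse_strongs)["2316"])  # 4
```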
Searching Articles
• Similar approach to text-based queries
 • Stem words
 • Use Synonyms
 • Remove Stop Words
• Out of the box, there’s no way to index/search
  by Bible reference without manually tagging content
Searching Articles

• Article contains reference: “John 3”
• User searches for “John 3:16” or “John 2-4”
• Result: no meaningful matches (at best, articles
  that happen to match the term “John”)
Searching Articles
• Solr-based Solutions:
 • Identify and index references and their
    composite verses using a grammar.
 • John 1:1-3 -> John 1:1; John 1:2; John 1:3
 • Store in a multivalued field - each
    reference is a “term”
 • Must also parse and expand references in
    queries in order to match
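A minimal Python sketch of the reference expansion step, using a regular expression as a stand-in for a full reference grammar. Assumptions: only `Book C`, `Book C-C`, `Book C:V` and `Book C:V-V` forms are handled (no cross-chapter ranges, abbreviations or lists), and `VERSE_COUNTS` is a tiny excerpt of a chapter-length table that a real system would load from its Bible data. The same function would run over article text at index time and over the user's query, so both sides end up as the same single-verse terms.

```python
import re
from typing import Dict, List, Tuple

# Verses per chapter; only the chapters used in the examples are listed.
VERSE_COUNTS: Dict[Tuple[str, int], int] = {("John", 2): 25, ("John", 3): 36, ("John", 4): 54}

REF = re.compile(
    r"^(?P<book>(?:[1-3]\s)?[A-Za-z]+)\s+(?P<ch>\d+)(?::(?P<v>\d+))?(?:-(?P<end>\d+))?$"
)


def expand_reference(ref: str) -> List[str]:
    """Expand 'Book C', 'Book C-C', 'Book C:V' or 'Book C:V-V' into single-verse terms."""
    m = REF.match(ref.strip())
    if not m:
        raise ValueError(f"unsupported reference: {ref!r}")
    book, ch = m["book"], int(m["ch"])
    end = int(m["end"]) if m["end"] else None
    if m["v"]:  # single verse or verse range within one chapter
        first = int(m["v"])
        return [f"{book} {ch}:{v}" for v in range(first, (end or first) + 1)]
    chapters = range(ch, (end or ch) + 1)  # whole chapter(s)
    return [f"{book} {c}:{v}" for c in chapters for v in range(1, VERSE_COUNTS[(book, c)] + 1)]


print(expand_reference("John 1:1-3"))  # ['John 1:1', 'John 1:2', 'John 1:3']
# An article indexed with "John 3" now shares a term with the query "John 3:16":
print(set(expand_reference("John 3:16")) & set(expand_reference("John 3")))  # {'John 3:16'}
```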
Searching Articles
•   Relational database-based solution:
    •   Assign an id to every verse
    •   Store: id, articleId, verseId
    •   Parse user query to ids.
    •   SELECT articleId, COUNT(id) AS hits
        FROM article_verse  -- table name is illustrative
        WHERE verseId IN (ID_LIST)
        GROUP BY articleId
        ORDER BY hits DESC
        •   Higher count -> the article is more likely to be
            about that reference than articles with a
            lower count
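The query above can be tried end to end with SQLite from Python's standard library. A minimal sketch: the `article_verse` table and sample rows are hypothetical, the verse ids are assumed to come from a reference parser, and the `IN` list is built from `?` placeholders rather than pasted into the SQL string.

```python
import sqlite3

# In-memory demo; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article_verse (id INTEGER PRIMARY KEY, articleId INTEGER, verseId INTEGER)")
rows = [  # (articleId, verseId): one row per verse occurrence in an article
    (1, 100), (1, 101), (1, 102),
    (2, 101),
    (3, 100), (3, 100),
]
conn.executemany("INSERT INTO article_verse (articleId, verseId) VALUES (?, ?)", rows)


def rank_articles(verse_ids):
    """Count matching verse occurrences per article, highest count first."""
    placeholders = ",".join("?" * len(verse_ids))
    sql = (
        "SELECT articleId, COUNT(id) AS hits FROM article_verse "
        f"WHERE verseId IN ({placeholders}) "
        "GROUP BY articleId ORDER BY hits DESC, articleId"
    )
    return conn.execute(sql, list(verse_ids)).fetchall()


print(rank_articles([100, 101, 102]))  # [(1, 3), (3, 2), (2, 1)]
```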
Searching Articles
• Relational database-based solution:
 • Large number of rows
 • 15,000 journal articles have > 9,000,000 rows
     (verse occurrences)
 •   Can store id, articleId, verseId, count instead
     • Then SUM() the counts for each articleId
     • Only negligibly faster
     • Only approx. 3,000,000 rows
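The pre-aggregated variant can be sketched the same way: collapse the per-occurrence rows into one row per (article, verse) pair holding a count, then `SUM()` those counts at query time. Table names and sample data are hypothetical; the ranking matches counting raw rows, and only the row count shrinks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE article_verse (id INTEGER PRIMARY KEY, articleId INTEGER, verseId INTEGER);
    INSERT INTO article_verse (articleId, verseId) VALUES
        (1, 100), (1, 100), (1, 101), (2, 101), (3, 100), (3, 100), (3, 100);
    -- One row per (article, verse) pair, holding how many times the verse occurs.
    CREATE TABLE article_verse_count AS
        SELECT articleId, verseId, COUNT(*) AS cnt
        FROM article_verse GROUP BY articleId, verseId;
    """
)


def rank_articles(verse_ids):
    """Sum the stored per-verse counts for each article, highest total first."""
    placeholders = ",".join("?" * len(verse_ids))
    sql = (
        "SELECT articleId, SUM(cnt) AS hits FROM article_verse_count "
        f"WHERE verseId IN ({placeholders}) "
        "GROUP BY articleId ORDER BY hits DESC, articleId"
    )
    return conn.execute(sql, list(verse_ids)).fetchall()


raw = conn.execute("SELECT COUNT(*) FROM article_verse").fetchone()[0]
agg = conn.execute("SELECT COUNT(*) FROM article_verse_count").fetchone()[0]
print(raw, agg)                   # 7 4
print(rank_articles([100, 101]))  # [(1, 3), (3, 3), (2, 1)]
```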
Heterogeneous Indexes
•   Not all content is created equal.
•   Content quality, and its effect on the quality of
    your results, becomes a factor when you move
    from one resource to more than one
    •   One Bible, one website, one journal
•   Apply a field or document boost to help
    normalize results
•   Some content gets bumped up and some down
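A minimal Python sketch of a per-source document boost, applied at index time through the `boost` attribute on `<doc>` that the XML update format of Solr versions from that period accepted (index-time boosts were later removed, in Solr 7, in favor of query-time boosting). The `SOURCE_BOOSTS` weights and the `source` field are hypothetical; sensible values would come from judging result quality for each resource.

```python
import xml.etree.ElementTree as ET

# Hypothetical quality weights per source: >1 bumps content up, <1 bumps it down.
SOURCE_BOOSTS = {"bible": 1.5, "journal": 1.2, "website": 0.8}


def boosted_add_xml(docs):
    """Build an <add> message with an index-time document boost per source."""
    add = ET.Element("add")
    for d in docs:
        boost = SOURCE_BOOSTS.get(d["source"], 1.0)  # unknown sources stay neutral
        doc = ET.SubElement(add, "doc", boost=str(boost))
        for name, value in d.items():
            ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")


print(boosted_add_xml([{"id": 7, "source": "website"}]))
# <add><doc boost="0.8"><field name="id">7</field><field name="source">website</field></doc></add>
```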
Search in the Biblical Domain - BibleTech: 2011
Enterprise Search Using Apache Solr
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with Lucene
 
Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)
 

Último

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Último (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Search in the Biblical Domain - BibleTech: 2011

  • 1. Search in the Biblical Domain Brian Seagraves (Bible.org)
  • 8. What is “Search”?
    • Information/Document Retrieval
    • Basic Definition:
      • Finding previously seen documents that are related to some user-supplied terms.
    • Advanced Definition:
      • Finding relevant content for a query by understanding the contextual meaning of terms in the search index and in the query.
      • Semantic Search
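The basic definition (finding previously seen documents that contain user-supplied terms) is usually built on an inverted index that maps each term to the documents containing it. A minimal Python sketch, assuming naive lowercase whitespace tokenization and OR semantics; the document ids and the `tokenize`/`build_index`/`search` helpers are illustrative, not part of any search engine's API:

```python
from collections import defaultdict


def tokenize(text):
    # Naive tokenizer: lowercase, split on whitespace, strip surrounding punctuation.
    return [w.strip(".,;:!?\"'") for w in text.lower().split()]


def build_index(docs):
    """Inverted index: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            if term:
                index[term].add(doc_id)
    return index


def search(index, query):
    """OR semantics: ids of documents containing any of the query terms."""
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return sorted(hits)


docs = {
    1: "In the beginning God created the heavens and the earth.",
    2: "Jesus wept.",
}
index = build_index(docs)
print(search(index, "wept"))  # [2]
print(search(index, "God"))   # [1]
```

This only matches exact words: a query for "heaven" finds nothing here because the text says "heavens". Closing that gap (stemming, synonyms, context) is what moves search toward the advanced definition.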
  • 12. Types and Sources of Content
    • The Bible and its verses
    • Articles, journals, and other extra-biblical content
    • The web
  • 17. Information Retrieval Engines
    • Sphinx - http://sphinxsearch.com
    • Lucene - http://lucene.apache.org/
    • Solr - http://lucene.apache.org/solr/
    • MySQL Full-Text Search (to a limited degree)
  • 18. Solr
  • 24. Solr
    • Open source
    • Full-text search
    • Hit highlighting
    • Facets
    • Java
    • REST-like HTTP/XML and JSON APIs
  • 30. Solr Documents
    • A document represents a distinct piece of content that can be stored and retrieved, for example:
      • Bible verse
      • Journal article
      • Commentary chapter/section
      • Web page
  • 39. Solr Documents
    • Documents have one or more fields
    • Fields have types:
      • Integer
      • Float
      • String
      • Text
      • Date
      • and more!
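  • 45. Solr Fields
    • Field types can have:
      • Tokenizers: split content into chunks/tokens
      • Filters: remove or change parts of the content (e.g. drop stop words, lowercase, stem)
    • An analyzer runs one tokenizer, then its filters in order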
  • 50. Solr Fields
    • The “String” field type
      • <fieldType name="string" class="solr.StrField" />
      • No filter; no tokenizer
      • Field content won’t be split or changed
  • 51. Sample Schema
    <fieldtype name="html_text" class="solr.TextField" >
      <analyzer type="index">
        <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" />
        <filter class="solr.StopFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>
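To see what an index-time analyzer chain like `html_text` does to a verse, here is a rough Python imitation of the same order of steps: strip HTML and split on whitespace, drop stop words, split on punctuation, lowercase, stem. It is a sketch under loud assumptions: the stop list is a tiny illustrative set (not Solr's default list), the stop-word check is case-insensitive for simplicity (Solr's StopFilterFactory is case-sensitive unless `ignoreCase="true"` is set), `toy_stem` is NOT the Porter stemmer, and the duplicate-removal filter is left out.

```python
import re

# Illustrative stop list only; not the list Solr ships with.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to"}


def strip_html_and_split(text):
    # Stand-in for HTMLStripWhitespaceTokenizerFactory: drop tags, split on whitespace.
    return re.sub(r"<[^>]+>", " ", text).split()


def stop_filter(tokens):
    # Simplification: compares case-insensitively.
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def word_delimiter(tokens):
    # Rough stand-in for WordDelimiterFilterFactory: split on non-alphanumerics.
    return [part for t in tokens for part in re.split(r"[^0-9A-Za-z]+", t) if part]


def lowercase(tokens):
    return [t.lower() for t in tokens]


def toy_stem(tokens):
    # NOT Porter stemming: only strips a trailing "s" from words longer than 3 letters.
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def analyze(text):
    tokens = strip_html_and_split(text)
    for step in (stop_filter, word_delimiter, lowercase, toy_stem):
        tokens = step(tokens)
    return tokens


print(analyze("<p>In the beginning God created the heavens and the earth.</p>"))
# ['beginning', 'god', 'created', 'heaven', 'earth']
```

Because index and query text pass through comparable chains, "heavens" in an indexed verse and "heaven" in a query reduce to the same token and can match.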
  • 52. Sample Schema (cont.)
    <fieldtype name="sint" class="solr.SortableIntField" omitNorms="true" />
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  • 53. Sample Schema (cont.)
    <fields>
      <field name="id" type="sint" indexed="true" stored="true" multiValued="false" />
      <field name="abbr" type="string" indexed="true" stored="true" multiValued="false" />
      <field name="name" type="string" indexed="true" stored="true" multiValued="false" />
      <field name="book" type="sint" indexed="true" stored="true" multiValued="false" />
      <field name="chapter" type="sint" indexed="true" stored="true" multiValued="false" />
      <field name="verse" type="sint" indexed="true" stored="true" multiValued="false" />
      <field name="ot_nt" type="string" indexed="true" stored="true" multiValued="false" />
      <field name="net" type="text" indexed="false" stored="true" multiValued="false" />
      <field name="all_index" type="html_text" indexed="true" stored="false" />
    </fields>
    <copyField source="net" dest="all_index" />
    <uniqueKey>id</uniqueKey>
    <defaultSearchField>all_index</defaultSearchField>
    <solrQueryParser defaultOperator="OR" />
  • 58. Put Data in Solr
    • Remember, Solr communicates using XML over HTTP
    • No in-place updates of a document: adding a document whose uniqueKey already exists deletes the old version, then adds the new one
    • To add documents, POST XML to the update handler
      • http://localhost:8080/solr/bible/update
  • 59. Add XML
    <add>
      <doc>
        <field name="id">1</field>
        <field name="net">In the beginning God created the heavens and the earth.</field>
      </doc>
    </add>
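  • 60. PHP API (PECL Solr extension)
    • No XML!
      $client = new SolrClient($options); // $options: connection settings (hostname, port, path)
      $doc = new SolrInputDocument();
      $doc->addField('id', 1); // id is an sint field in the schema, so use an integer value
      $doc->addField('net', 'In the beginning God created the heavens and the earth.');
      $client->addDocument($doc);
      $client->commit(); // make the new document visible to searches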
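For clients without a Solr library, the same add can be done by building the XML update message yourself and POSTing it to the update handler. A minimal Python sketch using only the standard library; the `build_add_xml`/`post_xml` helpers are hypothetical names, and the URL is the one from the slide (adjust it for your own core). The POST calls are shown but not executed, since they need a running Solr:

```python
import urllib.request
import xml.etree.ElementTree as ET

UPDATE_URL = "http://localhost:8080/solr/bible/update"  # from the slide


def build_add_xml(docs):
    """Build a Solr XML <add> message from a list of {field name: value} dicts."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > for us
    return ET.tostring(add, encoding="unicode")


def post_xml(url, body):
    """POST an XML message to a Solr update handler; returns the HTTP status."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


message = build_add_xml(
    [{"id": 1, "net": "In the beginning God created the heavens and the earth."}]
)
print(message)
# Against a running server:
# post_xml(UPDATE_URL, message)
# post_xml(UPDATE_URL, "<commit/>")  # added documents become searchable after a commit
```

Building the message with an XML library rather than string concatenation keeps verse text containing `&` or `<` from producing malformed updates.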
  • 66. Querying Solr
    • HTTP GET request
      • http://localhost:8080/solr/bible3/select?q=god
      • http://localhost:8080/solr = path to Solr; bible3 = core; select = handler; q=god = query
    • Returns XML by default
    • Can return JSON and more (wt parameter)
  • 73. Querying Solr
    • Queries the defaultSearchField by default
      • <defaultSearchField>all_index</defaultSearchField>
    • Can query other fields using the syntax field:value
      • http://localhost:8080/solr/bible3/select?q=id:27974
    • Multiple queries / Booleans
      • http://localhost:8080/solr/bible3/select?q=god%20AND%20book:40
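Query values containing spaces or colons must be URL-encoded before they go into a select URL. A small Python sketch, assuming the slide's host, port, and core names; `select_url` and `count_hits` are hypothetical helpers, and `count_hits` (which needs a running server) is defined but not called:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8080/solr"  # path to Solr, as on the slides


def select_url(core, query, **params):
    """Build a /select URL for a core; extra params (e.g. wt="json") are appended."""
    return f"{SOLR_BASE}/{core}/select?" + urlencode({"q": query, **params})


def count_hits(core, query):
    """Ask Solr for JSON (wt=json) and return numFound from the response block."""
    with urlopen(select_url(core, query, wt="json"), timeout=10) as resp:
        return json.load(resp)["response"]["numFound"]


print(select_url("bible3", "god AND book:40", wt="json"))
# http://localhost:8080/solr/bible3/select?q=god+AND+book%3A40&wt=json
```

`urlencode` writes spaces as `+`, which Solr decodes back to spaces, so this URL is equivalent to the hand-written `%20` form on the slide.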
  • 79. Search Multiple Translations (Fields)
    • Let’s add some fields: kjv and kjv_index
    • Add some copyField directives:
      <copyField source="kjv" dest="all_index" />
      <copyField source="kjv" dest="kjv_index" />
    • Query: “Shew Thyself”
      • 0 results in the NET: http://localhost:8080/solr/bible3/select?q=shew%20theyself
      • 360 results in the combined index/field: http://localhost:8080/solr/bible4/select?q=shew%20theyself
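The effect of `copyField` can be simulated in a few lines: every source field's text is appended to the destination field, so a search on `all_index` sees the wording of every translation. A Python sketch, assuming exact-word matching with no stemming; `apply_copy_fields` and `search_field` are hypothetical helpers. The verse texts are Genesis 1:1 in the NET (from the slide) and the KJV:

```python
import re


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def apply_copy_fields(doc, copy_fields):
    """Simulate <copyField>: append each source field's text to its dest field."""
    out = dict(doc)
    for source, dest in copy_fields:
        if source in doc:
            out.setdefault(dest, []).append(doc[source])
    return out


def search_field(docs, field, term):
    """Ids of documents whose field contains the exact word."""
    return [d["id"] for d in docs if any(term in tokens(v) for v in d.get(field, []))]


verse = {
    "id": 1,
    "net": "In the beginning God created the heavens and the earth.",
    "kjv": "In the beginning God created the heaven and the earth.",
}
net_only = [apply_copy_fields(verse, [("net", "all_index")])]
combined = [apply_copy_fields(verse, [("net", "all_index"), ("kjv", "all_index")])]

print(search_field(net_only, "all_index", "heaven"))  # []
print(search_field(combined, "all_index", "heaven"))  # [1]
```

This mirrors the bible3 vs. bible4 comparison above: the KJV wording acts as a quasi-synonym for the NET wording in the combined field.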
  • 85. Search Multiple Translations
    • + Quasi-synonym term/phrase injection
    • + Less variation across translations leads to stronger possible matches
    • + Matches verses when the source translation isn’t known
    • - No control over which translation gets more weight
    • - No control over scoring of matches
  • 86. Search Multiple Translations
    • Another way: Dismax
      • Can score a document (verse) match based on scores/matches from multiple fields.
      • net_index^1 kjv_index^1
        • Not exponents - weights
        • We’re searching the net_index and kjv_index fields, each with a boost/weight of 1.
      • net_index^6 kjv_index^.5
      • http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^1%20kjv_index^1&fl=score
      • http://localhost:8080/solr/bible4/select?q=respect%20for%20god&defType=dismax&tie=.1&qf=net_index^6%20kjv_index^.5&fl=score
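For a single query term, a DisMax ("disjunction max") clause scores a document by its best field match plus `tie` times the other fields' matches, with each field's score multiplied by its `qf` boost. A simplified Python sketch of that combination; the per-field raw scores are made-up numbers, and real Lucene scores also pass the boosts through query normalization, so actual values will differ:

```python
def dismax_score(field_scores, boosts, tie):
    """Best boosted field score plus tie * the sum of the other boosted scores."""
    boosted = [
        field_scores[f] * boost
        for f, boost in boosts.items()
        if field_scores.get(f, 0.0) > 0.0  # only fields that actually matched
    ]
    if not boosted:
        return 0.0
    best = max(boosted)
    return best + tie * (sum(boosted) - best)


scores = {"net_index": 2.0, "kjv_index": 1.0}  # hypothetical per-field scores

print(dismax_score(scores, {"net_index": 1, "kjv_index": 1}, tie=0.1))    # about 2.1
print(dismax_score(scores, {"net_index": 6, "kjv_index": 0.5}, tie=0.1))  # about 12.05
```

With `tie=0` only the best field counts; with `tie=1` the fields are simply summed. A small tie such as .1 lets the preferred translation dominate while still rewarding verses that match in both.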
  • 95. Scoring
    • $\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \left( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \right)$
    • Basic factors
      • Term frequency in the document (↑ is better)
      • Number of documents in the corpus containing the term (↓ is better)
      • Length of the matching document (↓ is better)
        • “Jesus Wept” - John 11:35
        • http://localhost:8080/solr/bible3/select?q=wept
    • http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html
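The formula can be evaluated by hand with the classic Lucene DefaultSimilarity factors: tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1/√(number of terms in the field), queryNorm = 1/√(sum of squared idf weights), coord = matched terms / query terms. A Python sketch, assuming all boosts are 1 and ignoring Lucene's lossy one-byte norm encoding; apart from "jesus wept", the token lists are a toy corpus, not real verses:

```python
import math


def tf(freq):
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    return 1.0 / math.sqrt(num_terms)


def score(query_terms, doc, corpus):
    """Classic Lucene-style score(q, d) with every boost = 1."""
    n = len(corpus)
    idfs = {t: idf(sum(1 for d in corpus if t in d), n) for t in query_terms}
    query_norm = 1.0 / math.sqrt(sum(w * w for w in idfs.values()))
    matched = [t for t in query_terms if t in doc]
    coord = len(matched) / len(query_terms)
    total = sum(tf(doc.count(t)) * idfs[t] ** 2 * length_norm(len(doc)) for t in matched)
    return coord * query_norm * total


corpus = [
    ["jesus", "wept"],
    ["the", "people", "wept", "loudly", "at", "the", "gate"],
    ["the", "king", "spoke"],
]
scores = [score(["wept"], doc, corpus) for doc in corpus]
print([round(s, 3) for s in scores])  # [0.707, 0.378, 0.0]
```

Both matching documents contain "wept" once, so the shorter one wins purely on the length norm, which is why John 11:35 ranks so well for q=wept.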
  • 110. Topic Tagging
    • Use a topically tagged Bible/concordance to mark up each verse, or just key verses
    • Helpful for “theme”-based queries:
      • “Social Justice” - no good text matches
      • “Satan” - many names
    • Name tagging in general can be very helpful
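One way to tag names is an alias table that maps each canonical tag to the names used for it, then stores the resulting tags in a separate (e.g. multivalued) field so that a query for the tag finds verses using any of the names. A Python sketch; the alias table, the `topic_tags` helper, and the sample sentences are illustrative assumptions, and only single-word names are handled (titles such as multi-word phrases would need phrase matching):

```python
import re

# Hypothetical alias table: canonical tag -> single-word names to look for.
ALIASES = {
    "satan": ["satan", "devil", "beelzebub"],
}


def topic_tags(text, aliases=ALIASES):
    """Return the sorted canonical tags whose aliases occur as words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(tag for tag, names in aliases.items() if words & set(names))


print(topic_tags("Then the devil left him."))  # ['satan']
print(topic_tags("Jesus wept."))               # []
```

A curated, topically tagged Bible or concordance, as the slide suggests, would replace the hand-written alias table.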
  • 116. Searching Strong’s
    • Add a field for Strong’s numbers: strongs_index
      • 1473 1510 2316 11 2316 2464 2532 2316 2384 1510 3756 2316 3498 235 2198
    • Most of the benefits of text searching:
      • “Word” frequency
      • Document vs. corpus frequency of search terms
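Treating each whitespace-separated Strong’s number as a token is what lets the usual text statistics apply. The field type for `strongs_index` is not shown on the slides; this Python sketch assumes plain whitespace tokenization with no stemming (e.g. a type built on `solr.WhitespaceTokenizerFactory`). The first string is the slide's example; the second is a toy value:

```python
from collections import Counter


def strongs_terms(field_value):
    # Each whitespace-separated Strong's number is one "word".
    return field_value.split()


def doc_freq(corpus, number):
    """How many documents (verses) contain the Strong's number at least once."""
    return sum(1 for value in corpus if number in strongs_terms(value))


verse = "1473 1510 2316 11 2316 2464 2532 2316 2384 1510 3756 2316 3498 235 2198"
term_freq = Counter(strongs_terms(verse))
print(term_freq["2316"], term_freq["1510"])  # 4 2

corpus = [verse, "1473 1510"]  # second entry is a toy value
print(doc_freq(corpus, "2316"), doc_freq(corpus, "1510"))  # 1 2
```

These are exactly the in-document and corpus-wide counts that feed tf and idf in the scoring formula.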
  • 122. Searching Articles
    • Similar approach to text-based queries:
      • Stem words
      • Use synonyms
      • Remove stop words
    • Without manual tagging, there’s no automatic way to index/search by Bible reference
  • 126. Searching Articles
    • Article contains the reference “John 3”
    • User searches for “John 3:16” or “John 2-4”
    • Result: no meaningful matches (at best, documents that merely match the word “John”)
  • 132. Searching Articles
    • Solr-based solution:
      • Identify references and index their component verses using a grammar
        • John 1:1-3 -> John 1:1; John 1:2; John 1:3
      • Store them in a multivalued field (e.g. of type string), so each reference is a single “term”
      • References in queries must also be parsed and expanded in order to match
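A small grammar can turn references into single-verse terms on both the indexing and the query side. A Python sketch handling "Book C", "Book C:V", "Book C:V-W", and "Book C-D" (chapter ranges, as in “John 2-4”); cross-chapter ranges such as "John 1:1-2:3" are not handled and raise `ValueError`. The regex, the helper names, and the verse-count table are assumptions; check the table against your versification:

```python
import re

# Sample verse counts (book -> chapter -> number of verses).
VERSE_COUNTS = {"John": {1: 51, 2: 25, 3: 36, 4: 54}}

REF = re.compile(
    r"^(?P<book>(?:[1-3]\s+)?[A-Za-z]+)\s+(?P<ch>\d+)(?::(?P<v>\d+))?(?:-(?P<end>\d+))?$"
)


def expand_reference(ref, verse_counts=VERSE_COUNTS):
    """Expand a reference into a list of 'Book C:V' terms."""
    m = REF.match(ref.strip())
    if m is None:
        raise ValueError(f"unrecognized reference: {ref!r}")
    book, ch, v, end = m["book"], int(m["ch"]), m["v"], m["end"]
    if v is not None:  # single verse or verse range within one chapter
        first = int(v)
        last = int(end) if end else first
        return [f"{book} {ch}:{n}" for n in range(first, last + 1)]
    last_ch = int(end) if end else ch  # whole chapter or chapter range
    return [
        f"{book} {c}:{n}"
        for c in range(ch, last_ch + 1)
        for n in range(1, verse_counts[book][c] + 1)
    ]


def article_matches(article_refs, query_ref):
    """True if any verse of the query reference is covered by the article's references."""
    indexed = {term for r in article_refs for term in expand_reference(r)}
    return any(term in indexed for term in expand_reference(query_ref))


print(expand_reference("John 1:1-3"))            # ['John 1:1', 'John 1:2', 'John 1:3']
print(article_matches(["John 3"], "John 3:16"))  # True
print(article_matches(["John 3"], "John 2-4"))   # True
```

In Solr, the `indexed` set would become the values of the multivalued field, and the expanded query terms would be ORed together.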
  • 139. Searching Articles • Relational database-based solution: • Assign an id to every verse • Store: id, articleId, verseId • Parse user query to verse ids. • SELECT articleId, COUNT(id) FROM ARTICLE_VERSES WHERE verseId IN (ID_LIST) GROUP BY articleId ORDER BY COUNT(id) DESC • Higher count -> article is more likely to be about that reference than articles with a lower count
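The relational approach can be tried end to end with Python's built-in `sqlite3`. The following are all assumptions made for the sketch:

- the table name `article_verses`
- the `verse_id` numbering scheme: book × 1,000,000 + chapter × 1,000 + verse, with John as book 43
- the sample rows

The query is the slide's `COUNT ... GROUP BY articleId` query, with placeholders filled in for the verse ids.

```python
import sqlite3


def verse_id(book: int, chapter: int, verse: int) -> int:
    """Hypothetical BBCCCVVV numbering: John (book 43) 3:16 -> 43003016."""
    return book * 1_000_000 + chapter * 1_000 + verse


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_verses ("
    " id INTEGER PRIMARY KEY, articleId INTEGER NOT NULL, verseId INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX idx_article_verses_verse ON article_verses (verseId)")

# Made-up sample data: one row per verse occurrence in an article.
rows = [
    (1, verse_id(43, 3, 16)), (1, verse_id(43, 3, 16)), (1, verse_id(43, 3, 17)),
    (2, verse_id(43, 3, 16)),
    (3, verse_id(43, 1, 1)),
]
conn.executemany("INSERT INTO article_verses (articleId, verseId) VALUES (?, ?)", rows)


def rank_articles(conn: sqlite3.Connection, verse_ids: list[int]) -> list[tuple[int, int]]:
    """Articles ordered by how many of their verse occurrences hit the query."""
    if not verse_ids:
        return []
    placeholders = ",".join("?" * len(verse_ids))  # "?,?,..." one per id
    sql = (
        "SELECT articleId, COUNT(id) AS hits FROM article_verses"
        f" WHERE verseId IN ({placeholders})"
        " GROUP BY articleId ORDER BY hits DESC, articleId"
    )
    return conn.execute(sql, verse_ids).fetchall()


# "John 3:16-17", already parsed into verse ids.
query = [verse_id(43, 3, v) for v in range(16, 18)]
print(rank_articles(conn, query))  # [(1, 3), (2, 1)]
```

Article 1 ranks first because three of its verse occurrences fall in John 3:16-17. Article 3 cites only John 1:1, so it is not returned.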
  • 147. Searching Articles • Relational database-based solution: • Large number of rows. • 15,000 Journal articles have > 9,000,000 rows (verse occurrences) • Alternatively, store id, articleId, verseId, count • Then SUM() the counts for each articleId. • Queries are only negligibly faster. • But only approx. 3,000,000 rows (down from > 9,000,000)
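The pre-aggregated variant can be sketched the same way. Occurrences are collapsed into one row per article and verse, carrying a `count`, and the ranking query switches from `COUNT(id)` to `SUM(count)`. Table names and sample data are hypothetical. Verse ids follow an assumed BBCCCVVV scheme.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article_verses (
    id INTEGER PRIMARY KEY, articleId INTEGER NOT NULL, verseId INTEGER NOT NULL
);
-- One row per (article, verse) pair, carrying the number of occurrences.
CREATE TABLE article_verse_counts (
    id INTEGER PRIMARY KEY,
    articleId INTEGER NOT NULL,
    verseId INTEGER NOT NULL,
    count INTEGER NOT NULL,
    UNIQUE (articleId, verseId)
);
CREATE INDEX idx_counts_verse ON article_verse_counts (verseId);
""")

# Made-up occurrences (assumed BBCCCVVV ids, e.g. 43003016 = John 3:16).
occurrences = [(1, 43003016), (1, 43003016), (1, 43003017), (2, 43003016)]
conn.executemany("INSERT INTO article_verses (articleId, verseId) VALUES (?, ?)", occurrences)

# Collapse repeated occurrences into a single row with a count.
conn.execute(
    "INSERT INTO article_verse_counts (articleId, verseId, count)"
    " SELECT articleId, verseId, COUNT(*) FROM article_verses GROUP BY articleId, verseId"
)


def rank_articles(conn: sqlite3.Connection, verse_ids: list[int]) -> list[tuple[int, int]]:
    """Articles ordered by the summed occurrence counts of the queried verses."""
    if not verse_ids:
        return []
    placeholders = ",".join("?" * len(verse_ids))
    sql = (
        "SELECT articleId, SUM(count) AS hits FROM article_verse_counts"
        f" WHERE verseId IN ({placeholders})"
        " GROUP BY articleId ORDER BY hits DESC, articleId"
    )
    return conn.execute(sql, list(verse_ids)).fetchall()


print(conn.execute("SELECT COUNT(*) FROM article_verse_counts").fetchone()[0])  # 3
print(rank_articles(conn, [43003016, 43003017]))  # [(1, 3), (2, 1)]
```

Here four occurrence rows become three count rows. On the journal corpus described above, the same collapse takes the table from over 9,000,000 rows to roughly 3,000,000.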
  • 153. Heterogeneous Indexes • Not all content is created equal. • Content quality and its effect on the quality of your results becomes a factor when you move from one resource to more than one • One Bible, one website, one journal • Apply a field or document boost to help normalize results • Some content gets bumped up and some down
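The sketch below shows only the arithmetic of a per-source boost: it multiplies each hit's raw relevance score by a factor for the resource it came from, then re-sorts. The source names and boost values are hypothetical and untuned. Lucene and Solr 7.0 and later no longer support index-time boosts, so in practice this is usually done at query time, for example with eDisMax's `boost` or `bf` parameters.

```python
from dataclasses import dataclass

# Hypothetical per-source boosts; the values are illustrative, not tuned.
SOURCE_BOOST = {"bible": 1.5, "journal": 1.2, "website": 0.8}


@dataclass
class Hit:
    doc_id: str
    source: str   # which resource the document came from
    score: float  # raw relevance score returned by the engine


def boosted_score(hit: Hit, boosts: dict[str, float] = SOURCE_BOOST) -> float:
    """Multiply the raw score by the boost for the hit's source (1.0 if unknown)."""
    return hit.score * boosts.get(hit.source, 1.0)


def rerank(hits: list[Hit]) -> list[Hit]:
    """Order hits by boosted score, highest first."""
    return sorted(hits, key=boosted_score, reverse=True)


hits = [
    Hit("web-17", "website", 9.0),
    Hit("jrnl-4", "journal", 7.0),
    Hit("john-3-16", "bible", 5.0),
]
print([h.doc_id for h in rerank(hits)])  # ['jrnl-4', 'john-3-16', 'web-17']
```

On raw scores the web page would rank first. After boosting, the journal article (8.4) and the verse (7.5) move above it (7.2).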
