Multi-faceted responsive search,
autocomplete, feeds engine and logging
Remi Mikalsen
Search Engineer, utdanning.no
Introduction
Remi Mikalsen
Search engineer, utdanning.no
«Utdanning.no is the official Norwegian national education and
career portal, and includes an overview of education in Norway
and more than 500 career descriptions» - utdanning.no
« [...] Our main goals are to improve the quality of education and
to improve learning outcomes and learning for children, pupils
and students through use of ICT in education» - iktsenteret.no
utdanning.no
Drupal 7 & Solr 3.6
~3 million visitors / year
~12,000 documents
~18,000,000 terms
~260 fields
~1 QPS (~9M searches / year)
~8 ms latency
Data integration in the CMS
Universities, colleges and
community colleges
~30 different endpoints
~3500 documents
Folk high schools
(non-academic)
1 national endpoint
~650 documents
Secondary schools
1 national endpoint
~1100 documents
Higher education admissions
(Samordna opptak)
1 national endpoint
~1500 documents
Secondary schools
metadata (Grep)
1 national endpoint
~650 documents
Higher education
metadata (NUS)
1 national endpoint
~3500 documents
Transform &
normalize
Drupal 7
ER-model
Added value
Editorial staff
Professions, interviews,
education summaries, etc.
~1500 documents
Professions metadata
(STYRK)
2 national endpoints
~1000 documents
Fetch data
Solr 3.6
De-normalized
Searchable
Indexing
Drupal 7
Apache Solr Search
Integration 7.x-1.1
Customized
business logic
Solr 3.6
Pros
Basic Drupal integration
Track document changes
Some facet support
Easily extendable
Cons
Lacks deep introspection
Little de-normalization
Hacky hierarchies (Drupal)
Note
Custom config files!
schema.xml
(mainly dynamic fields)
solrconfig.xml
(mainly a drupal request handler)
We added
Deep introspection
Data de-normalization
Solid hierarchy support
Pivot facet support
Atomization
Manual partial re-index
schema.xml
- field types (auto)
- various copy fields
- better spell
- bucket fields
- autocomplete
Organization
(school)
Study program
Study program
Study program
Organization
(school)
+
all its
Study programs
Drupal DB Solr documents
Study program
+
Organization
<doc>
<str name="id">394353</str>
<bool name="bs_mainsearch">true</bool>
<str name="bundle">org</str>
<str name="bundle_name">Organization</str>
<str name="label">ACME University</str>
<str name="atom">[XML]</str>
<arr name="related_nodes">
<str>ACME Rocket Science</str>
<str>Study program 2</str>
<str>Study program N</str>
</arr>
<arr name="sm_geography_hierarchy">
<str>1>California</str>
<str>2>California>San Diego</str>
<str>3>California>San Diego>Gaslamp Quarter</str>
</arr>
<str name="ss_menu_1">orgmenu</str>
<str name="ss_menu_2">org</str>
</doc>
<doc>
<str name="id">394354</str>
<bool name="bs_mainsearch">true</bool>
<str name="bundle">he</str>
<str name="bundle_name">Higher Education</str>
<str name="label">ACME Rocket Science</str>
<str name="atom">[XML]</str>
<arr name="sm_offered_by">
<str>ACME University</str>
</arr>
<arr name="sm_study_area">
<str>Engineering</str>
<str>Science</str>
</arr>
<long name="its_field_semesters">8</long>
<str name="ss_menu_1">edumenu</str>
<str name="ss_menu_2">he</str>
</doc>
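The two documents above can be read as the output of a de-normalization step. The following is a minimal sketch of that idea, not the actual indexing code: the helper names `hierarchyTokens` and `denormalize` and the shape of the input objects are assumptions made for illustration. Only the Solr field names and the depth-prefixed token format are taken from the documents above.

```javascript
// Hypothetical helper: turn an ordered path into depth-prefixed tokens
// such as "2>California>San Diego", as seen in sm_geography_hierarchy.
function hierarchyTokens(path, separator = '>') {
  return path.map((_, i) => [i + 1, ...path.slice(0, i + 1)].join(separator));
}

// Hypothetical de-normalization step: copy the labels of an organization's
// study programs onto the organization document, and the organization's
// label onto each study program document.
function denormalize(org, programs) {
  const orgDoc = {
    id: org.id,
    bundle: 'org',
    label: org.label,
    related_nodes: programs.map((p) => p.label),
    sm_geography_hierarchy: hierarchyTokens(org.geography),
  };
  const programDocs = programs.map((p) => ({
    id: p.id,
    bundle: 'he',
    label: p.label,
    sm_offered_by: [org.label],
  }));
  return [orgDoc, ...programDocs];
}

const docs = denormalize(
  {
    id: '394353',
    label: 'ACME University',
    geography: ['California', 'San Diego', 'Gaslamp Quarter'],
  },
  [{ id: '394354', label: 'ACME Rocket Science' }]
);
console.log(JSON.stringify(docs, null, 2));
```

Storing one token per depth level lets a single multi-valued field answer both "which documents are under this node" and "which nodes exist at the next level".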
Searching
- Site search
- Embedded search
- Feeds engine
Site search
Our goal
Students, counselors and teachers must find what they are looking for
How?
- Interaction design (IxD) vs graphical design
- User testing, user testing and user testing (and experience)
- Resulting in a GUI specification we must implement
Ajax-Solr is our JS framework:
https://github.com/evolvingweb/ajax-solr/wiki/reuters-tutorial
- manages all querying
- widgets for interaction with and displaying results
- events fire search requests, which update the widgets
We extended it heavily
- Developed all our widgets (10+)
- Added logging (async, via ajax, local and GA)
- Distributed configuration (server + client)
- Simplified initialization script
But it also works out of the box!
Logger
~200 lines
JS library
~1700 lines
Solr 3.6
Our Website
Solr proxy
~85 lines
ajax-solr
evolvingweb
SolrPhpClient
r60
Default config
Initialize
(config)
JS library
(copy)
Search
ACME Engineering
Lorum sollicitudin nunc id nibh
blandit pellentesque ipsum.
ACME Law
Cras nunc id nibh blandit
pellentesque sollicitudin.
ACME Med
Ipsum ollicitudin nunc id blandit
nibh pellentesque nibh.
- Include JS library
- Initialize
- Set up HTML
- Search! (and log)
Site search – widgets & faceting
Ajax Solr allows defining N widgets
«Everything» is a widget
A facet is an instance of a FacetWidget
Interaction with widgets may fire query
All facet selections are combined into a single query
All widgets are updated after Solr response
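The request cycle described above can be sketched with toy stand-ins. The classes `ToyManager` and `ToyFacetWidget` and the `fakeSearch` backend are hypothetical and are not the AjaxSolr API; they only illustrate that a widget interaction fires one combined query and that every widget is refreshed from the same response.

```javascript
// Toy stand-ins for the manager/widget roles; NOT the AjaxSolr API.
class ToyManager {
  constructor(search) {
    this.search = search; // injected function: (params) => response
    this.widgets = [];
  }
  addWidget(widget) {
    widget.manager = this;
    this.widgets.push(widget);
  }
  doRequest() {
    // Every widget contributes its filters to ONE query.
    const fq = this.widgets.flatMap((w) => w.filters());
    const response = this.search({ fq });
    // Every widget is refreshed from the same response.
    this.widgets.forEach((w) => w.afterRequest(response));
    return response;
  }
}

class ToyFacetWidget {
  constructor(field) {
    this.field = field;
    this.selected = [];
    this.counts = {};
  }
  filters() {
    return this.selected.map((v) => `${this.field}:${v}`);
  }
  click(value) {
    this.selected.push(value);
    return this.manager.doRequest(); // interaction fires a new search
  }
  afterRequest(response) {
    this.counts = response.facets[this.field] || {};
  }
}

// Fake search backend so the sketch runs without Solr.
const fakeSearch = ({ fq }) => ({
  fq,
  facets: { bundle: { org: 1, he: 2 }, sm_study_area: { Science: 1 } },
});

const manager = new ToyManager(fakeSearch);
const bundleWidget = new ToyFacetWidget('bundle');
const areaWidget = new ToyFacetWidget('sm_study_area');
manager.addWidget(bundleWidget);
manager.addWidget(areaWidget);
bundleWidget.click('he');
const lastResponse = areaWidget.click('Science');
console.log(lastResponse.fq);
```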
Some facet widgets we have developed
- Plain
Facet values and facet counts in a list
Multiple (AND) or single choice
- Hierarchical
Facet values and facet counts in a list
Clicking on a facet value drills down into the hierarchy; facet.prefix + fq
- Dropdown
Displays facet values in a dropdown list
Useful for mobile devices in our responsive theme
- Tagcloud
Facet values in a tagcloud
- Pivot facet
Our menu system
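The hierarchical drill-down (facet.prefix + fq) can be sketched as follows. This is an illustration under assumptions, not the widget's real code: the function name `drillDownParams` is hypothetical, the tokens are assumed to use the depth-prefixed format of sm_geography_hierarchy, and the per-field form `f.<field>.facet.prefix` is used so that other facet fields are left alone.

```javascript
// Build the facet parameters for one level of a hierarchical facet.
// `selected` is the token the user clicked (e.g. "1>California"),
// or null when nothing is selected yet.
function drillDownParams(field, selected, separator = '>') {
  const params = new URLSearchParams();
  params.append('facet', 'true');
  params.append('facet.field', field);
  const prefixKey = `f.${field}.facet.prefix`;
  if (selected === null) {
    // Top level: only list tokens of depth 1.
    params.append(prefixKey, '1' + separator);
    return params;
  }
  const [depth, ...path] = selected.split(separator);
  // Restrict the result set to the selected node ...
  params.append('fq', `${field}:"${selected}"`);
  // ... and only list facet values one level below it.
  params.append(prefixKey, [Number(depth) + 1, ...path].join(separator) + separator);
  return params;
}

const level2 = drillDownParams('sm_geography_hierarchy', '1>California');
console.log(level2.get('fq'));
console.log(level2.get('f.sm_geography_hierarchy.facet.prefix'));
```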
Adding facets
Config
var facets = {}; // facet definitions, keyed by facet name
facets['interests'] = new facetobject('tagcloud', 'field_interests', '#interests');
facets['ispublic'] = new facetobject('plain', 'field_ispublic', '#ispublic');
config['facets'] = facets;
HTML
<ul id="interests"></ul>
<ul id="ispublic"></ul>
INITIALIZE
Manager.addFacets(config);
Example widget code
AjaxSolr.PlainFacetWidget = AjaxSolr.AbstractFacetWidget.extend({
multivalue: true,
target: null, // HTML target id
field: null, // Solr-field
facet_display_limit: 5, // Max facets to display before «See more»
facet_field_sort: null, // Optional facet sort
dependencies: null, // Conditional display of facet
facet_display_more: 'See more',
facet_display_less: 'See less',
...
init: function() { ... },
beforeRequest: function() { ... },
afterRequest: function() { ... }
});
Site search – pivot facet
Pivot faceting allows you to facet within the results of the parent facet
- http://wiki.apache.org/solr/SimpleFacetParameters
Slight problem; we don't run Solr 4.x!
Problem
Menu facets must not filter each other's counts, but must still filter the search results and all other facets
Our solution
Solr document 1
<str name="ss_menu_1">orgmenu</str>
<str name="ss_menu_2">org</str>
Solr document 2
<str name="ss_menu_1">edumenu</str>
<str name="ss_menu_2">higher_ed</str>
Solr document 3
<str name="ss_menu_1">edumenu</str>
<str name="ss_menu_2">secondary</str>
Solr query when a top level menu tab is selected
fq={!tag=ss_menu_1}ss_menu_1:edumenu&
facet.field={!ex=ss_menu_1}ss_menu_1
Solr query when a sub-level menu tab is selected
fq={!tag=ss_menu_1}ss_menu_1:edumenu&
fq={!tag=ss_menu_1,ss_menu_2}ss_menu_2:higher_ed&
facet.field={!ex=ss_menu_1}ss_menu_1&
facet.field={!ex=ss_menu_2}ss_menu_2
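The two queries above follow one pattern, sketched here as a small builder. The function name `menuParams` is hypothetical; the tags, exclusions and field names are copied from the queries above.

```javascript
// Build the menu filter and facet parameters for a selected top-level tab
// and an optional sub-level tab. Each filter is tagged so that the matching
// menu facet can exclude it and keep showing all of its values.
function menuParams(top, sub = null) {
  const fq = [`{!tag=ss_menu_1}ss_menu_1:${top}`];
  const facetFields = ['{!ex=ss_menu_1}ss_menu_1'];
  if (sub !== null) {
    // The sub-level filter carries both tags, so both menu facets ignore it.
    fq.push(`{!tag=ss_menu_1,ss_menu_2}ss_menu_2:${sub}`);
    facetFields.push('{!ex=ss_menu_2}ss_menu_2');
  }
  const pairs = [
    ...fq.map((v) => ['fq', v]),
    ...facetFields.map((v) => ['facet.field', v]),
  ];
  return pairs.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join('&');
}

console.log(decodeURIComponent(menuParams('edumenu')));
console.log(decodeURIComponent(menuParams('edumenu', 'higher_ed')));
```

Other facets and the result list carry no exclusion, so they are still narrowed by the selected menu tabs.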
Drawbacks
- Can be VERY slow on large indexes with many unique terms in the facet
Why do we do it?
- Small index; 18M terms, 12K documents
- Pivot facet fields have very few distinct values (5-8)!
Site search - autocomplete
Our goal
Give our users the feeling that we've implemented a mind-reader
How?
With relevant, grouped suggestions* as they type in a search query
Do we succeed?
50% of our «clicks to content» from searches come from autocomplete
Implementing autocomplete is «easy»
1) Ajax
2) Detect keystrokes
3) Send one request per keystroke
4) Receive results, populate result list
Techniques we employ
- Minimal payload (reduced fl)
- But same boosts and qf as «normal» queries
- group=true, group.field=, group.limit=
- start_label^1.5 wild_label^1 wild_other^0.25
- Caching (jsonp, cache=true)
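The techniques above translate into a small set of request parameters. This is a sketch under assumptions: the function name `autocompleteParams`, the `fl` value, the use of `defType=dismax`, and the default group limit are illustrative. The slide leaves `group.field` and `group.limit` unspecified, so both are passed in by the caller.

```javascript
// Build the Solr parameters for one autocomplete request.
function autocompleteParams(term, groupField, groupLimit = 3) {
  const params = new URLSearchParams();
  params.set('q', term);
  params.set('defType', 'dismax'); // assumption: qf needs a (e)dismax parser
  // Boosts from the slide: prefix match on the label weighs most.
  params.set('qf', 'start_label^1.5 wild_label^1 wild_other^0.25');
  params.set('fl', 'id,label'); // assumption: minimal payload
  params.set('group', 'true');
  params.set('group.field', groupField);
  params.set('group.limit', String(groupLimit));
  params.set('wt', 'json');
  return params;
}

// 'bundle_name' is a hypothetical choice of grouping field.
const suggest = autocompleteParams('roc', 'bundle_name');
console.log(suggest.toString());
```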
Define field type
<fieldType name="startsWith" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
</analyzer>
</fieldType>
Define fields
<field name="start_label" type="startsWith" indexed="true" stored="false" multiValued="false"/>
Copy fields
<copyField source="label" dest="start_label"/>
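What the startsWith analyzers do to a label and to a typed query can be simulated in a few lines. This is a simplified model written for illustration, not Solr code: the function names are hypothetical, and the model only mirrors the filter order and gram sizes of the field type above.

```javascript
// Lowercase, then drop every character that is not a-z
// (mirrors LowerCaseFilter + PatternReplaceFilter).
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z]/g, '');
}

// Front edge n-grams of the single keyword token (index side only).
function edgeNGrams(token, minGram = 2, maxGram = 25) {
  const grams = [];
  for (let n = minGram; n <= Math.min(maxGram, token.length); n++) {
    grams.push(token.slice(0, n));
  }
  return grams;
}

function indexTerms(label) {
  return edgeNGrams(normalize(label));
}

// The query side is normalized but not n-grammed, so a query matches
// only when it equals one of the indexed prefixes.
function matches(label, query) {
  return indexTerms(label).includes(normalize(query));
}

console.log(indexTerms('ACME Law'));
console.log(matches('ACME Rocket Science', 'acme ro')); // prefix of the label
console.log(matches('ACME Rocket Science', 'rocket'));  // not a prefix
```

Because the pattern keeps only a-z, spaces, digits and letters such as æ, ø and å are removed on both the index and the query side.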
Define field type
<fieldType name="wildCardType" class="solr.TextField" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="front"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="back"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="false"/>
<filter class="solr.NorwegianLightStemFilterFactory"/>
</analyzer>
</fieldType>
Define fields
<field name="wild_label" type="wildCardType" indexed="true" stored="false" multiValued="false"/>
<field name="wild_other" type="wildCardType" indexed="true" stored="false" multiValued="true"/>
Copy fields
<copyField source="label" dest="wild_label"/>
<copyField source="teaser" dest="wild_other"/>
<copyField source="body" dest="wild_other"/>
<copyField source="searchwords" dest="wild_other"/>
<copyField source="related_nodes" dest="wild_other"/>
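Chaining a front and a back edge n-gram filter at index time yields every substring of the value with at least two characters, which is what makes infix matches possible. The sketch below is a simplified model of that chain, with hypothetical function names; it ignores token positions and does not model the query-side synonym and stemming filters.

```javascript
// Edge n-grams taken from the front or the back of a token.
function sidedEdgeNGrams(token, side, minGram = 2, maxGram = 70) {
  const grams = [];
  for (let n = minGram; n <= Math.min(maxGram, token.length); n++) {
    grams.push(side === 'front' ? token.slice(0, n) : token.slice(token.length - n));
  }
  return grams;
}

// Index-side chain: keyword token -> lowercase -> front grams -> back grams.
function wildcardIndexTerms(text) {
  const token = text.toLowerCase();
  const terms = new Set();
  for (const prefix of sidedEdgeNGrams(token, 'front')) {
    for (const gram of sidedEdgeNGrams(prefix, 'back')) {
      terms.add(gram);
    }
  }
  return terms;
}

const terms = wildcardIndexTerms('ACME Law');
console.log(terms.has('law'));  // infix at the end of the label
console.log(terms.has('me l')); // infix spanning the space
console.log(terms.size);
```

The number of terms grows quadratically with the length of the value, which is one reason the approach suits a small index.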
Embedded search
Our goal
Let other sites search our data
How?
The exact same way we do ourselves
Do we succeed?
Two external sites are up and running and a third is on its way
Logger
~200 lines
JS library
~1700 lines
Solr 3.6
ACME Website
Solr proxy
~85 lines
ajax-solr
evolvingweb
ACME config
SolrPhpClient
r60
Default config
Config
(override)
JS library
(copy)
Search
ACME Engineering
Lorum sollicitudin nunc id nibh
blandit pellentesque ipsum.
ACME Law
Cras nunc id nibh blandit
pellentesque sollicitudin.
ACME Med
Ipsum ollicitudin nunc id blandit
nibh pellentesque nibh.
- Register with us
- Include our JS library
- Set up config
- Set up HTML
- Search! (and log)
<html>
<head>
<title>ACME Website</title>
<!-- utdanning.no search framework -->
<script src="/js/jquery.js"></script>
<script src="http://example.com/solrservice/js-min/solr-search-full-min.js"></script>
<script src="/js/search-init.js"></script>
</head>
<body>
<!-- Search form -->
<form>
<input id="query" name="query" type="search" />
<input type="submit" value="Search" />
</form>
<!-- Search results -->
<div><ul class="hits" id="hits"></ul></div>
</body>
</html>
<script type="text/javascript">
// ACME mockup init-script
var Manager; // Search manager object (assumed to be initialized by the included search library)
var uno_config = loadConfig('http://example.com/solrservice/.../acme.config');
// Fully customizable search configuration, e.g.:
uno_config['server']['qf'] = 'label^1.8 content^1.2';
// Search box widget
Manager.addPlainSearch(uno_config);
// Result list widget
Manager.addResults(uno_config);
Manager.finalizeConfig(uno_config);
Manager.doRequest(); // Optional
</script>
Site owners have full control
Add, edit and configure widgets
Query fields, boosts, etc.
Faceting
Styling
Pre-limit search to parts of our index
Because we eat our own dog food!
Feeds engine
Our goal
Deliver data in bulk to partner organizations
How?
RESTful, searchable data endpoint that returns XML (Atom++)
Do we succeed?
Beta-partner up and running with stunning performance
Consumer
Query
Default config
Feeds engine
~300 lines
Solr proxy
~85 lines
Solr 3.6
Logger
~200 lines
SolrPhpClient
r60
Feeds engine
- Parses incoming query
- Loads config (filters, weights, ...)
- Transforms incoming + config to Solr URL
- Sends to Solr proxy
Solr Proxy
- Loads Solr PHP Client library
- Sends search request and parses response
- Returns results to Feeds engine
Feeds engine
- Loads logger and logs results
- Picks out the Atom XML from the response
- Wraps the results in an Atom frame
- Displays the feed
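The steps above can be sketched end to end. The real feeds engine is server-side PHP; this JavaScript version is only an illustration of the flow. The `feedConfig` object, its keys and values, and both function names are hypothetical, and the frame produced here is not a complete, valid Atom feed.

```javascript
// Hypothetical per-feed configuration (names and values are illustrative).
const feedConfig = {
  organizations: { fq: ['bundle:org'], fl: 'atom', rows: 10 },
};

// Merge the incoming query string with the feed's configured filters.
function buildSolrQuery(feedName, incoming, config = feedConfig) {
  const feed = config[feedName];
  if (!feed) throw new Error(`Unknown feed: ${feedName}`);
  const params = new URLSearchParams();
  params.set('q', incoming.get('q') || '*:*');
  for (const fq of [...feed.fq, ...incoming.getAll('fq')]) {
    params.append('fq', fq);
  }
  params.set('fl', feed.fl); // only the pre-built Atom fragment is needed
  params.set('rows', String(feed.rows));
  return params;
}

// Glue the per-document Atom fragments inside one feed frame.
// `entries` are assumed to be well-formed XML strings already.
function wrapInAtomFrame(title, entries) {
  return [
    '<?xml version="1.0" encoding="utf-8"?>',
    '<feed xmlns="http://www.w3.org/2005/Atom">',
    `  <title>${title}</title>`,
    ...entries.map((e) => '  ' + e),
    '</feed>',
  ].join('\n');
}

const incoming = new URL(
  'http://example.com/data/atom/organizations?fq=type:HE&q=law'
).searchParams;
const solrQuery = buildSolrQuery('organizations', incoming);
console.log(solrQuery.toString());
console.log(
  wrapInAtomFrame('organizations', ['<entry><title>ACME University</title></entry>'])
);
```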
http://example.com/data/atom/organizations
http://example.com/data/atom/organizations/10/2
http://example.com/data/atom/organizations?fq=type:HE
http://example.com/data/atom/organizations?fq=type:HE&q=law
Consume with feeds reader
Logging
How?
Logging back-end written in PHP that writes to a MySQL database
- called asynchronously from JS library
- called inline in Feeds engine
Google Analytics (ga.js)
- called from JS library (searchwords and categories)
What?
- Search terms
- Facets
- User interaction
- List of search results
- Stack latency (JS, PHP, Solr)
- Search domain
- Session
Why?
Most popular queries with no results?
Most popular queries?
How does QPS affect latency?
Follow a user through search (interaction design & user testing)
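Questions such as "most popular queries with no results" reduce to a grouped count over the logged searches. The sketch below works on an in-memory array rather than the MySQL log; the record shape (`query`, `hits`) and the function name `topQueries` are assumptions made for illustration.

```javascript
// Hypothetical log records; field names are illustrative.
const searchLog = [
  { query: 'law', hits: 12 },
  { query: 'rocket surgery', hits: 0 },
  { query: 'Law', hits: 12 },
  { query: 'rocket surgery', hits: 0 },
  { query: 'medicin', hits: 0 },
  { query: 'law ', hits: 12 },
];

// Count queries (case- and whitespace-insensitively), most frequent first.
// With onlyEmpty set, only searches that returned no results are counted.
function topQueries(records, { onlyEmpty = false, limit = 10 } = {}) {
  const counts = new Map();
  for (const r of records) {
    if (onlyEmpty && r.hits > 0) continue;
    const key = r.query.trim().toLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

console.log(topQueries(searchLog));
console.log(topQueries(searchLog, { onlyEmpty: true }));
```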
Displaying logs
Charts are generated with Google Chart Tools in Drupal
Other statistics can easily be explored with Drupal Views
Demo (includes responsiveness)
http://utdanning.no/sok
http://utdanning.no/search
http://utdanning.no/solrservice/utdanning.no
Drupal 7
Apache Solr Search Integration
+ custom indexing
Omega theme (responsiveness with Drupal)
+ custom js
Ajax Solr
+ custom widgets
Solr Php Client r60
+ custom proxy
Bootstrap (responsiveness without Drupal)
jQuery
Google Chart Tools
CONTACT
Remi Mikalsen
remi.mikalsen@iktsenteret.no

Query Latency Optimization with Lucene
 

Último

Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfSanaAli374401
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...KokoStevan
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 

Último (20)

Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 

Multi faceted responsive search, autocomplete, feeds engine & logging

  • 1. Multi-faceted responsive search, autocomplete, feeds engine and logging Remi Mikalsen Search Engineer, utdanning.no
  • 3. Introduction Remi Mikalsen Search engineer, utdanning.no «Utdanning.no is the official Norwegian national education and career portal, and includes an overview of education in Norway and more than 500 career descriptions» - utdanning.no « [...] Our main goals are to improve the quality of education and to improve learning outcomes and learning for children, pupils and students through use of ICT in education» - iktsenteret.no
  • 4. utdanning.no Drupal 7 & Solr 3.6 ~3 million visitors / year ~12,000 documents ~18,000,000 terms ~260 fields ~1 QPS (~9M searches / year) ~8 ms latency
  • 6. Data sources
      - Universities, colleges and community colleges: ~30 different endpoints, ~3500 documents
      - Folk high schools (non-academic): 1 national endpoint, ~650 documents
      - Secondary schools: 1 national endpoint, ~1100 documents
      - Higher education admissions (Samordna opptak): 1 national endpoint, ~1500 documents
      - Secondary schools metadata (Grep): 1 national endpoint, ~650 documents
      - Higher education metadata (NUS): 1 national endpoint, ~3500 documents
      - Professions metadata (STYRK): 2 national endpoints, ~1000 documents
      - Editorial staff (professions, interviews, education summaries, etc.): ~1500 documents
      Diagram labels: Fetch data; Transform & normalize; Drupal 7 (ER-model, added value); Solr 3.6 (de-normalized, searchable)
  • 8. Drupal 7, Apache Solr Search Integration 7.x-1.1, customized business logic, Solr 3.6
      Pros: basic Drupal integration; track document changes; some facet support; easily extendable
      Cons: lacks deep introspecting; little de-normalization; hacky hierarchies (Drupal)
      Note: custom config files! schema.xml (mainly dynamic fields); solrconfig.xml (mainly a drupal request handler)
      We added: deep introspecting; data de-normalization; solid hierarchy support; pivot facet support; atomization; manual partial re-index
      schema.xml: field types (auto); various copy fields; better spell; bucket fields; autocomplete
  • 9. Drupal DB: Organization (school); Study program (several per organization). Solr documents: Organization (school) + all its Study programs; Study program + Organization
  • 10.
      <doc>
        <str name="id">394353</str>
        <bool name="bs_mainsearch">true</bool>
        <str name="bundle">org</str>
        <str name="bundle_name">Organization</str>
        <str name="label">ACME University</str>
        <str name="atom">[XML]</str>
        <arr name="related_nodes">
          <str>ACME Rocket Science</str>
          <str>Study program 2</str>
          <str>Study program N</str>
        </arr>
        <arr name="sm_geography_hierarchy">
          <str>1>California</str>
          <str>2>California>San Diego</str>
          <str>3>California>San Diego>Gaslamp Quarter</str>
        </arr>
        <str name="ss_menu_1">orgmenu</str>
        <str name="ss_menu_2">org</str>
      </doc>
  • 11.
      <doc>
        <str name="id">394354</str>
        <bool name="bs_mainsearch">true</bool>
        <str name="bundle">he</str>
        <str name="bundle_name">Higher Education</str>
        <str name="label">ACME Rocket Science</str>
        <str name="atom">[XML]</str>
        <arr name="sm_offered_by">
          <str>ACME University</str>
        </arr>
        <arr name="sm_study_area">
          <str>Engineering</str>
          <str>Science</str>
        </arr>
        <long name="its_field_semesters">8</long>
        <str name="ss_menu_1">edumenu</str>
        <str name="ss_menu_2">he</str>
      </doc>
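The de-normalization shown on slides 9 to 11 can be sketched roughly as follows. This is only an illustration: the function name `denormalize` and the shape of its input objects are assumptions, while the output field names (`related_nodes`, `sm_offered_by`, `bundle`, `label`) are taken from the two example documents.

```javascript
// Hypothetical sketch: turn one organization and its study programs
// (as they might come out of the Drupal ER-model) into flat Solr documents.
function denormalize(org, programs) {
  const orgDoc = {
    id: org.id,
    bundle: 'org',
    label: org.label,
    // The organization document carries the labels of all its study programs.
    related_nodes: programs.map((p) => p.label),
  };
  const programDocs = programs.map((p) => ({
    id: p.id,
    bundle: p.bundle,
    label: p.label,
    // Each study program document carries the label of its organization.
    sm_offered_by: [org.label],
  }));
  return [orgDoc, ...programDocs];
}

const docs = denormalize(
  { id: '394353', label: 'ACME University' },
  [{ id: '394354', bundle: 'he', label: 'ACME Rocket Science' }]
);
console.log(JSON.stringify(docs, null, 2));
```

Because each document repeats the labels of its relations, a search for a study program name can also surface the organization that offers it, without any join at query time.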
  • 12. Searching - Site search - Embedded search - Feeds engine
  • 14. Our goal Students, counselors and teachers must find what they look for How? - Interaction design (IxD) vs graphical design - User testing, user testing and user testing (and experience) - Resulting in a GUI specification we must implement
  • 15. Ajax-Solr is our JS framework: https://github.com/evolvingweb/ajax-solr/wiki/reuters-tutorial
      - manages all querying
      - widgets for interaction with and displaying results
      - events fire search requests, which update widgets
      We extended it heavily
      - Developed all our widgets (10+)
      - Added logging (async, via ajax, local and GA)
      - Distributed configuration (server + client)
      - Simplified initialization script
      But it also works out of the box!
  • 16. Logger ~200 lines JS library ~1700 lines Solr 3.6 Our Website Solr proxy ~85 lines ajax-solr evolvingweb SolrPhpClient r60 Default config Initialize (config) JS library (copy) Search ACME Engineering Lorum sollicitudin nunc id nibh blandit pellentesque ipsum. ACME Law Cras nunc id nibh blandit pellentesque sollicitudin. ACME Med Ipsum ollicitudin nunc id blandit nibh pellentesque nibh. - Include JS library - Initialize - Set up HTML - Search! (and log)
  • 17. Site search – widgets & faceting Ajax Solr allows defining N widgets «Everything» is a widget A facet is an instance of a FacetWidget Interaction with widgets may fire a query All faceting is piped into one query All widgets are updated after the Solr response
  • 18. Some facet widgets we have developed
      - Plain: facet values and facet counts in a list; multiple (AND) or single choice
      - Hierarchical: facet values and facet counts in a list; clicking on a facet value drills down into the hierarchy (facet.prefix + fq)
      - Dropdown: displays facet values in a dropdown list; useful for mobile devices in our responsive theme
      - Tagcloud: facet values in a tagcloud
      - Pivot facet: our menu system
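The hierarchical widget's "facet.prefix + fq" drill-down could work along these lines. This is a sketch under assumptions: the depth-prefixed token format is taken from `sm_geography_hierarchy` on slide 10, but the function names and the exact way the widget combines the two parameters are hypothetical.

```javascript
// Encode a path as depth-prefixed tokens, matching the format on slide 10,
// e.g. "2>California>San Diego".
function hierarchyTokens(path) {
  return path.map((_, i) => `${i + 1}>${path.slice(0, i + 1).join('>')}`);
}

// Assumed scheme: filter on the selected node with fq, and list only its
// direct children by restricting facet values with facet.prefix.
// Note: values containing a double quote would need escaping; omitted here.
function drillDownParams(field, selectedPath) {
  const depth = selectedPath.length;
  const params = { 'facet.field': field };
  if (depth === 0) {
    params['facet.prefix'] = '1>'; // top level only
    return params;
  }
  const joined = selectedPath.join('>');
  params.fq = `${field}:"${depth}>${joined}"`;
  params['facet.prefix'] = `${depth + 1}>${joined}>`;
  return params;
}

const tokens = hierarchyTokens(['California', 'San Diego', 'Gaslamp Quarter']);
const drill = drillDownParams('sm_geography_hierarchy', ['California']);
console.log(tokens, drill);
```

The depth prefix is what lets a single multi-valued field serve every level: a prefix such as `2>California>` can only match tokens exactly one level below the selected node.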
  • 19. Adding facets
      Config
        facets['interests'] = new facetobject('tagcloud', 'field_interests', '#interests');
        facets['ispublic'] = new facetobject('plain', 'field_ispublic', '#ispublic');
        config['facets'] = facets;
      HTML
        <ul id="interests"></ul>
        <ul id="ispublic"></ul>
      Initialize
        Manager.addFacets(config);
  • 20. Example widget code
      AjaxSolr.PlainFacetWidget = AjaxSolr.AbstractFacetWidget.extend({
        multivalue: true,
        target: null,                   // HTML target id
        field: null,                    // Solr field
        facet_display_limit: 5,         // Max facets to display before «See more»
        facet_field_sort: null,         // Optional facet sort
        dependencies: null,             // Conditional display of facet
        facet_display_more: 'See more',
        facet_display_less: 'See less',
        ...
        init: function() { ... },
        beforeRequest: function() { ... },
        afterRequest: function() { ... }
      });
  • 22. Site search – pivot facet
  • 23. Pivot faceting allows you to facet within the results of the parent facet - http://wiki.apache.org/solr/SimpleFacetParameters Slight problem; we don't run Solr 4.x!
  • 24. Problem Menu facets shouldn't affect each other, but affect search result and other facets
  • 25. Our solution
      Solr document 1
        <str name="ss_menu_1">orgmenu</str>
        <str name="ss_menu_2">org</str>
      Solr document 2
        <str name="ss_menu_1">edumenu</str>
        <str name="ss_menu_2">higher_ed</str>
      Solr document 3
        <str name="ss_menu_1">edumenu</str>
        <str name="ss_menu_2">secondary</str>
      Solr query when a top level menu tab is selected
        fq={!tag=ss_menu_1}ss_menu_1:edumenu&
        facet.field={!ex=ss_menu_1}ss_menu_1
      Solr query when a sub-level menu tab is selected
        fq={!tag=ss_menu_1}ss_menu_1:edumenu&
        fq={!tag=ss_menu_1,ss_menu_2}ss_menu_2:higher_ed&
        facet.field={!ex=ss_menu_1}ss_menu_1&
        facet.field={!ex=ss_menu_2}ss_menu_2
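A small helper that generates the tagged filters and excluded facets from slide 25 might look like this. The function name `menuParams` and its input shape are hypothetical; the parameter strings it produces are the ones shown on the slide.

```javascript
// selection: ordered list of { field, value }, top-level menu first.
// Each filter is tagged with its own field plus every menu level above it,
// and each menu facet excludes the filters carrying its own tag.
function menuParams(selection) {
  const fq = [];
  const facetFields = [];
  const tags = [];
  for (const { field, value } of selection) {
    tags.push(field);
    fq.push(`{!tag=${tags.join(',')}}${field}:${value}`);
    facetFields.push(`{!ex=${field}}${field}`);
  }
  return { fq, 'facet.field': facetFields };
}

const subLevel = menuParams([
  { field: 'ss_menu_1', value: 'edumenu' },
  { field: 'ss_menu_2', value: 'higher_ed' },
]);
console.log(subLevel);
```

Since the sub-level filter is tagged with both `ss_menu_1` and `ss_menu_2`, the top-level facet ignores both filters and keeps showing counts for every tab, while the sub-level facet ignores only its own filter and so stays limited to entries under `edumenu`.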
  • 26. Drawbacks - Can be VERY slow on large indexes with many unique terms in the facet Why do we do it? - Small index; 18M terms, 12K documents - Pivot facet fields have very few distinct values (5-8)!
  • 28. Site search - autocomplete
  • 29. Our goal Give our users the feeling that we've implemented a mind-reader How? With relevant, grouped suggestions* as they type in a search query Do we succeed? 50% of our «clicks to content» from searches come from autocomplete
  • 30. Implementing autocomplete is «easy»
      1) Ajax
      2) Detect keystrokes
      3) Send one request per keystroke
      4) Receive results, populate result list
      Techniques we employ
      - Minimal payload (reduced fl)
      - But same boosts and qf as «normal» queries
      - group=true, group.field=, group.limit=
      - start_label^1.5 wild_label^1 wild_other^0.25
      - Caching (jsonp, cache=true)
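The request sent per keystroke could be assembled roughly as below. This is a hedged sketch: the `qf` boosts and the grouping parameters come from the slide, but the slide leaves the values of `group.field` and `group.limit` blank, so they are caller-supplied here and the values in the usage line (`bundle_name`, `3`) are purely illustrative. The `fl` list is likewise an assumption, and the sketch presumes the request handler is a dismax-style one that honours `qf`.

```javascript
// Build the query parameters for one autocomplete request.
function autocompleteParams(typed, groupField, groupLimit) {
  return {
    q: typed,
    // Boosts from the slide: prefix match on label first, then substring
    // match on label, then substring match on other text.
    qf: 'start_label^1.5 wild_label^1 wild_other^0.25',
    fl: 'id,label', // assumption: a reduced field list keeps the payload small
    group: 'true',
    'group.field': groupField,
    'group.limit': String(groupLimit),
    wt: 'json',
  };
}

// Illustrative values only; the slide does not state them.
const autocompleteQuery = new URLSearchParams(
  autocompleteParams('rock', 'bundle_name', 3)
).toString();
console.log(autocompleteQuery);
```

Grouping lets the suggestion list show a few hits per group rather than letting one dominant group fill the whole dropdown.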
  • 31. Define field type
      <fieldType name="startsWith" class="solr.TextField">
        <analyzer type="index">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
          <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
        </analyzer>
      </fieldType>
      Define fields
      <field name="start_label" type="startsWith" indexed="true" stored="false" multiValued="false"/>
      Copy fields
      <copyField source="label" dest="start_label"/>
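To see why the `startsWith` field type yields prefix matches, its analyzer chain can be imitated in plain JavaScript. This is only an approximation for illustration (the function names are made up and Solr's own filters are not involved): the whole label is kept as one token, lowercased, stripped of everything outside a-z, and then expanded into leading n-grams of 2 to 25 characters at index time.

```javascript
// Lowercase, then drop every character outside a-z, as the
// LowerCaseFilter + PatternReplaceFilter pair does on the single keyword token.
function normalizeLabel(text) {
  return text.toLowerCase().replace(/[^a-z]/g, '');
}

// Leading n-grams of the token, from min to max characters.
function edgeNGrams(token, min, max) {
  const grams = [];
  for (let n = min; n <= Math.min(max, token.length); n++) {
    grams.push(token.slice(0, n));
  }
  return grams;
}

// Index side: normalized label expanded into prefixes.
function startsWithIndexTerms(label) {
  return edgeNGrams(normalizeLabel(label), 2, 25);
}

// Query side: normalized only, no n-grams.
function startsWithMatches(label, typed) {
  return startsWithIndexTerms(label).includes(normalizeLabel(typed));
}

console.log(startsWithIndexTerms('ACME Law'));
console.log(startsWithMatches('ACME Law', 'acme l'));
```

Two consequences follow directly from the pattern: spaces and punctuation are ignored, so "acme l" matches "ACME Law", and letters outside a-z (including æ, ø and å) are dropped on both the index and the query side.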
  • 32. Define field type
      <fieldType name="wildCardType" class="solr.TextField" omitNorms="true">
        <analyzer type="index">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="front"/>
          <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="back"/>
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
          <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="false"/>
          <filter class="solr.NorwegianLightStemFilterFactory"/>
        </analyzer>
      </fieldType>
      Define fields
      <field name="wild_label" type="wildCardType" indexed="true" stored="false" multiValued="false"/>
      <field name="wild_other" type="wildCardType" indexed="true" stored="false" multiValued="true"/>
      Copy fields
      <copyField source="label" dest="wild_label"/>
      <copyField source="teaser" dest="wild_other"/>
      <copyField source="body" dest="wild_other"/>
      <copyField source="searchwords" dest="wild_other"/>
      <copyField source="related_nodes" dest="wild_other"/>
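Chaining a front edge n-gram filter into a back edge n-gram filter, as the `wildCardType` index analyzer does, effectively indexes every substring of at least two characters. The following sketch imitates that in plain JavaScript under simplifying assumptions: the function names are made up, and the query-side synonym and stemming filters are ignored.

```javascript
// Leading n-grams (side="front").
function frontGrams(token, min, max) {
  const grams = [];
  for (let n = min; n <= Math.min(max, token.length); n++) {
    grams.push(token.slice(0, n));
  }
  return grams;
}

// Trailing n-grams (side="back").
function backGrams(token, min, max) {
  const grams = [];
  for (let n = min; n <= Math.min(max, token.length); n++) {
    grams.push(token.slice(token.length - n));
  }
  return grams;
}

// The back filter runs on each token emitted by the front filter,
// so the result is the set of all substrings with length >= min.
function wildIndexTerms(text, min = 2, max = 70) {
  const token = text.toLowerCase();
  const terms = new Set();
  for (const front of frontGrams(token, min, max)) {
    for (const gram of backGrams(front, min, max)) {
      terms.add(gram);
    }
  }
  return terms;
}

console.log([...wildIndexTerms('abcd')]);
```

This explains the trade-off behind the lower boosts on `wild_label` and `wild_other`: substring matching finds hits in the middle of a text, at the cost of many more indexed terms than the prefix-only `startsWith` type.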
  • 35. Our goal Let other sites search our data How? The exact same way we do ourselves Do we succeed? Two external sites are up and running and a third is on its way
  • 36. Logger ~200 lines JS library ~1700 lines Solr 3.6 ACME Website Solr proxy ~85 lines ajax-solr evolvingweb ACME config SolrPhpClient r60 Default config Config (override) JS library (copy) Search ACME Engineering Lorum sollicitudin nunc id nibh blandit pellentesque ipsum. ACME Law Cras nunc id nibh blandit pellentesque sollicitudin. ACME Med Ipsum ollicitudin nunc id blandit nibh pellentesque nibh. - Register with us - Include our JS library - Set up config - Set up HTML - Search! (and log)
  • 37.
      <html>
        <head>
          <title>ACME Website</title>
          <!-- utdanning.no search framework -->
          <script src="/js/jquery.js"></script>
          <script src="http://example.com/solrservice/js-min/solr-search-full-min.js"></script>
          <script src="/js/search-init.js"></script>
        </head>
        <body>
          <!-- Search form -->
          <form>
            <input id="query" name="query" type="search" />
            <input type="submit" value="Search" />
          </form>
          <!-- Search results -->
          <div><ul class="hits" id="hits"></ul></div>
        </body>
      </html>
  • 38.
      <script type="text/javascript">
        // ACME mockup init-script
        var Manager; // Search manager object
        var uno_config = loadConfig('http://example.com/solrservice/.../acme.config');
        // Fully customizable search configuration, e.g.:
        uno_config['server']['qf'] = 'label^1.8 content^1.2';
        // Search box widget
        Manager.addPlainSearch(uno_config);
        // Result list widget
        Manager.addResults(uno_config);
        Manager.finalizeConfig(uno_config);
        Manager.doRequest(); // Optional
      </script>
  • 39. Site owners have full control Add, edit and configure widgets Query fields, boosts, etc. Faceting Styling Pre-limit search to parts of our index Because we eat our own dog food!
  • 41. Our goal Deliver data in bulk to partner organizations How? RESTful searchable data endpoint that returns XML (Atom++) Do we succeed? Beta-partner up and running with stunning performance
  • 42. Consumer Query Default config Feeds engine ~300 lines Solr proxy ~85 lines Solr 3.6 Logger ~200 lines SolrPhpClient r60
  • 43. Feeds engine
      - Parses incoming query
      - Loads config (filters, weights, ...)
      - Transforms incoming + config to Solr URL
      - Sends to Solr proxy
      Solr proxy
      - Loads Solr PHP Client library
      - Sends search request and parses response
      - Returns results to Feeds engine
      Feeds engine
      - Loads logger and logs results
      - Picks out ATOM from response
      - Glues result inside an ATOM frame
      - Displays feed
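Two of the feeds engine steps, "transforms incoming + config to Solr URL" and "glues result inside an ATOM frame", can be sketched as below. The real engine is written in PHP and its code is not shown in the slides, so everything here is an assumption made for illustration: the function names, the config shape, the row cap, and the idea of requesting only the stored `atom` field seen on slides 10 and 11.

```javascript
// Merge the consumer's query with server-side config into a Solr select URL.
function buildSolrUrl(baseUrl, incoming, config) {
  const params = new URLSearchParams();
  params.set('q', incoming.q || '*:*');
  // Cap the requested page size so a consumer cannot ask for unbounded rows.
  const rows = Math.min(incoming.rows || config.defaultRows, config.maxRows);
  params.set('rows', String(rows));
  params.set('qf', config.qf);
  params.set('fl', 'atom'); // assumption: only the pre-built Atom fragment is needed
  for (const filter of config.filters) {
    params.append('fq', filter);
  }
  return `${baseUrl}/select?${params.toString()}`;
}

function escapeXml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Wrap pre-built <entry> fragments in a minimal feed element.
// A complete Atom feed also requires <id> and <updated>; omitted in this sketch.
function wrapInAtomFrame(title, entries) {
  return [
    '<?xml version="1.0" encoding="utf-8"?>',
    '<feed xmlns="http://www.w3.org/2005/Atom">',
    `<title>${escapeXml(title)}</title>`,
    ...entries,
    '</feed>',
  ].join('\n');
}

// Hypothetical partner configuration.
const feedConfig = {
  qf: 'label^1.8 content^1.2',
  filters: ['bs_mainsearch:true', 'bundle:he'],
  defaultRows: 20,
  maxRows: 100,
};
const feedUrl = buildSolrUrl('http://localhost:8983/solr', { q: 'rocket science', rows: 500 }, feedConfig);
const feedXml = wrapInAtomFrame('ACME & partners', ['<entry><title>ACME Rocket Science</title></entry>']);
console.log(feedUrl);
console.log(feedXml);
```

Storing a ready-made Atom fragment per document at index time means the feed can be assembled by concatenation, with no per-request rendering.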
  • 46. How?
      Logging back-end written in PHP that writes to a MySQL database
      - called asynchronously from JS library
      - called inline in Feeds engine
      Google Analytics (ga.js)
      - called from JS library (searchwords and categories)
      What?
      - Search terms
      - Facets
      - User interaction
      - List of search results
      - Stack latency (JS, PHP, Solr)
      - Search domain
      - Session
  • 47. Why?
      - Most popular queries with no results?
      - Most popular queries?
      - How does QPS affect latency?
      - Follow a user through search (interaction design & user testing)
      Displaying logs
      - Charts are generated with Google Chart Tools in Drupal
      - Other statistics can easily be explored with Drupal Views
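As an illustration of the first question, "most popular queries with no results", here is a small in-memory aggregation over log records. The record shape (`query`, `numFound`) and the function name are hypothetical; the talk stores its logs in MySQL and explores them through Drupal, so this is only a sketch of the idea, not the actual reporting code.

```javascript
// Count zero-result searches per normalized query and return the top ones.
function topZeroResultQueries(logEntries, limit) {
  const counts = new Map();
  for (const entry of logEntries) {
    if (entry.numFound !== 0) continue;
    const term = entry.query.trim().toLowerCase();
    counts.set(term, (counts.get(term) || 0) + 1);
  }
  return [...counts.entries()]
    // Most frequent first; ties broken alphabetically for a stable order.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([query, count]) => ({ query, count }));
}

// Made-up sample records.
const sampleLog = [
  { query: 'Rocket surgery', numFound: 0 },
  { query: 'rocket surgery ', numFound: 0 },
  { query: 'law', numFound: 42 },
  { query: 'astrology', numFound: 0 },
];
const zeroHits = topZeroResultQueries(sampleLog, 5);
console.log(zeroHits);
```

A list like this is a direct input for editorial work: frequent zero-result queries point at missing synonyms or missing content.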
  • 53. Drupal 7 Apache Solr Search Integration + custom indexing Omega theme (responsiveness with Drupal) + custom js Ajax Solr + custom widgets Solr Php Client r60 + custom proxy Bootstrap (responsiveness without Drupal) jQuery Google Chart Tools
  • 54. Remi Mikalsen remi.mikalsen@iktsenteret.no iktsenteret.no Multi-faceted responsive search, autocomplete, feeds engine and logging