SlideShare una empresa de Scribd logo
1 de 9
Descargar para leer sin conexión
Search Engine Back End API
Solution for Fast Prototyping
https://github.com/jimmylai/
search_engine_backend_template

Solr
Jimmy Lai
r97922028 [at] ntu.edu.tw
http://tw.linkedin.com/pub/jimmy-lai/27/4a/536
2013/01/28
Outline - Solr
• 
• 
• 
• 
• 

Add a new core
Define the schema
Feed data and update partial data
Query and Ranking
Spatial Query

•  Prerequisite: Introduction and Tutorial
http://www.slideshare.net/jimmy_lai/search-engine-back-end-api-solution-for-fast-prototyping

LDSP

2
Add a New Core
•  Core in Solr: a set of configuration of schema
and indexing.
•  Cmd: fab create_core:$core_name
–  What happended:
1.  Copy the dir data/solr_core_template to dir
solr_conf with new name $core_name
2.  Set config in core.properties with name=
$core_name
3.  Put it under solr_conf and then run solr with
solr.solr.home=solr_conf, and then solr can find
the new core.
• 

Can be done with: fab start_solr

Ref:	
  h'p://wiki.apache.org/solr/CoreAdmin	
  
	
  
Define the Schema
•  Edit file solr_conf/$core_name/conf/
shcema.xml
•  Add more fields in <fields> section:
–  <field name="sourcesite" type="string" indexed="true" stored="true"
required="false" />

•  Define your type in <types> section:
<fieldType name=”chinese" class="solr.TextField"
positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory”/>
</analyzer>
</fieldType>

Ref:	
  h'p://wiki.apache.org/solr/SchemaXml	
  
	
  
Feed Data and Update Partial Data
•  Run solr after schema updated:
–  fab run_solr

•  Prepare new feed data in JSON format:
–  [{‘id’: id1, ‘field1’: value11, ‘field2’: value21}, {‘id’: id2,
‘field_1’:value21}]

•  Send by Http Post to Solr:
–  curl http://localhost:8983/solr/$core_name/update/json?
commit=true --data-binary @$json_file_name -i -H 'Contenttype:application/json’

•  Partial Update JSON format:
–  [{‘id’: id1, ‘field1’: {‘set’: new_value}]
Ref:	
  h'p://wiki.apache.org/solr/UpdateJSON	
  
	
  
Query and Ranking
•  Let’s use pysolr:
–  solr = pysolr.Solr('http://localhost:8983/solr/movie’)
–  results = [i for i in solr.search('*:*', **{‘row’: 100})]

•  Query (1st parameter of search())
–  Specify value of a field ‘field:value’
–  Range query
‘field:[1 TO 100]’
–  Boolean operation ‘field1:v1 AND field2:v2’

•  Sort (put in 2nd parameter of search())
–  {‘sort’: ‘field desc’}
Ref:	
  h'p://wiki.apache.org/solr/SolrQuerySyntax	
  
	
  
Query and Ranking
Exploit the Solr UI
Spatial Query
•  In shcema.xml, set the the type of field as
location:
–  <field name="location" type="location" indexed="true" stored="true"
required="false" />

•  Query:
–  {'pt': ’23.853,120.498’, 'sfield': 'location', 'sort': 'geodist() asc'}

•  Get the distance (KM) by:
–  {‘fl’: ‘*,distance:geodist()’}

Ref:	
  h'p://wiki.apache.org/solr/SpaEalSearch	
  
	
  
The next steps
•  Understand the mechanism of:
–  Solr:
•  Update schema
•  Update partial data
•  Query language of Solr

–  Djangorestframework:
•  Add more parameters
•  Input validation
•  Output rendering

•  Deploy to production environment
LDSP

9

Más contenido relacionado

La actualidad más candente

BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...
BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...
BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...Ron Reiter
 
Managing data workflows with Luigi
Managing data workflows with LuigiManaging data workflows with Luigi
Managing data workflows with LuigiTeemu Kurppa
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Sematext Group, Inc.
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
How to automate all your SEO projects
How to automate all your SEO projectsHow to automate all your SEO projects
How to automate all your SEO projectsVincent Terrasi
 
Natural Language Query and Conversational Interface to Apache Spark
Natural Language Query and Conversational Interface to Apache SparkNatural Language Query and Conversational Interface to Apache Spark
Natural Language Query and Conversational Interface to Apache SparkDatabricks
 
Project Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare MetalProject Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare MetalDatabricks
 
マーケティングのためのHadoop利用
マーケティングのためのHadoop利用マーケティングのためのHadoop利用
マーケティングのためのHadoop利用Tatsuya Sasaki
 
H2O World - Intro to R, Python, and Flow - Amy Wang
H2O World - Intro to R, Python, and Flow - Amy WangH2O World - Intro to R, Python, and Flow - Amy Wang
H2O World - Intro to R, Python, and Flow - Amy WangSri Ambati
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container EraSadayuki Furuhashi
 
Dapper - Rise of the MicroORM
Dapper - Rise of the MicroORMDapper - Rise of the MicroORM
Dapper - Rise of the MicroORMSquareHire
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit
 
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Lucidworks
 
Elastic search integration with hadoop leveragebigdata
Elastic search integration with hadoop   leveragebigdataElastic search integration with hadoop   leveragebigdata
Elastic search integration with hadoop leveragebigdataPooja Gupta
 
Relational Database Access with Python
Relational Database Access with PythonRelational Database Access with Python
Relational Database Access with PythonMark Rees
 
Relational Database Access with Python ‘sans’ ORM
Relational Database Access with Python ‘sans’ ORM  Relational Database Access with Python ‘sans’ ORM
Relational Database Access with Python ‘sans’ ORM Mark Rees
 
Monitoring Spark Applications
Monitoring Spark ApplicationsMonitoring Spark Applications
Monitoring Spark ApplicationsTzach Zohar
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesSadayuki Furuhashi
 
Introducing Koalas 1.0 (and 1.1)
Introducing Koalas 1.0 (and 1.1)Introducing Koalas 1.0 (and 1.1)
Introducing Koalas 1.0 (and 1.1)Takuya UESHIN
 

La actualidad más candente (20)

BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...
BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...
BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...
 
Managing data workflows with Luigi
Managing data workflows with LuigiManaging data workflows with Luigi
Managing data workflows with Luigi
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
How to automate all your SEO projects
How to automate all your SEO projectsHow to automate all your SEO projects
How to automate all your SEO projects
 
Natural Language Query and Conversational Interface to Apache Spark
Natural Language Query and Conversational Interface to Apache SparkNatural Language Query and Conversational Interface to Apache Spark
Natural Language Query and Conversational Interface to Apache Spark
 
Project Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare MetalProject Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare Metal
 
マーケティングのためのHadoop利用
マーケティングのためのHadoop利用マーケティングのためのHadoop利用
マーケティングのためのHadoop利用
 
H2O World - Intro to R, Python, and Flow - Amy Wang
H2O World - Intro to R, Python, and Flow - Amy WangH2O World - Intro to R, Python, and Flow - Amy Wang
H2O World - Intro to R, Python, and Flow - Amy Wang
 
Python database interfaces
Python database  interfacesPython database  interfaces
Python database interfaces
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
 
Dapper - Rise of the MicroORM
Dapper - Rise of the MicroORMDapper - Rise of the MicroORM
Dapper - Rise of the MicroORM
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
 
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
 
Elastic search integration with hadoop leveragebigdata
Elastic search integration with hadoop   leveragebigdataElastic search integration with hadoop   leveragebigdata
Elastic search integration with hadoop leveragebigdata
 
Relational Database Access with Python
Relational Database Access with PythonRelational Database Access with Python
Relational Database Access with Python
 
Relational Database Access with Python ‘sans’ ORM
Relational Database Access with Python ‘sans’ ORM  Relational Database Access with Python ‘sans’ ORM
Relational Database Access with Python ‘sans’ ORM
 
Monitoring Spark Applications
Monitoring Spark ApplicationsMonitoring Spark Applications
Monitoring Spark Applications
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
 
Introducing Koalas 1.0 (and 1.1)
Introducing Koalas 1.0 (and 1.1)Introducing Koalas 1.0 (and 1.1)
Introducing Koalas 1.0 (and 1.1)
 

Destacado

Fast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookFast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookJimmy Lai
 
Data Analyst Nanodegree
Data Analyst NanodegreeData Analyst Nanodegree
Data Analyst NanodegreeJimmy Lai
 
[LDSP] Search Engine Back End API Solution for Fast Prototyping
[LDSP] Search Engine Back End API Solution for Fast Prototyping[LDSP] Search Engine Back End API Solution for Fast Prototyping
[LDSP] Search Engine Back End API Solution for Fast PrototypingJimmy Lai
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012Jimmy Lai
 
Documentation with sphinx @ PyHug
Documentation with sphinx @ PyHugDocumentation with sphinx @ PyHug
Documentation with sphinx @ PyHugJimmy Lai
 
Software development practices in python
Software development practices in pythonSoftware development practices in python
Software development practices in pythonJimmy Lai
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012Jimmy Lai
 
Apache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesApache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesJimmy Lai
 
Build a Searchable Knowledge Base
Build a Searchable Knowledge BaseBuild a Searchable Knowledge Base
Build a Searchable Knowledge BaseJimmy Lai
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Jimmy Lai
 
Nltk natural language toolkit overview and application @ PyHug
Nltk  natural language toolkit overview and application @ PyHugNltk  natural language toolkit overview and application @ PyHug
Nltk natural language toolkit overview and application @ PyHugJimmy Lai
 
NetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugNetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugJimmy Lai
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learnJimmy Lai
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Jimmy Lai
 

Destacado (14)

Fast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookFast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython Notebook
 
Data Analyst Nanodegree
Data Analyst NanodegreeData Analyst Nanodegree
Data Analyst Nanodegree
 
[LDSP] Search Engine Back End API Solution for Fast Prototyping
[LDSP] Search Engine Back End API Solution for Fast Prototyping[LDSP] Search Engine Back End API Solution for Fast Prototyping
[LDSP] Search Engine Back End API Solution for Fast Prototyping
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012
 
Documentation with sphinx @ PyHug
Documentation with sphinx @ PyHugDocumentation with sphinx @ PyHug
Documentation with sphinx @ PyHug
 
Software development practices in python
Software development practices in pythonSoftware development practices in python
Software development practices in python
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012
 
Apache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesApache thrift-RPC service cross languages
Apache thrift-RPC service cross languages
 
Build a Searchable Knowledge Base
Build a Searchable Knowledge BaseBuild a Searchable Knowledge Base
Build a Searchable Knowledge Base
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
 
Nltk natural language toolkit overview and application @ PyHug
Nltk  natural language toolkit overview and application @ PyHugNltk  natural language toolkit overview and application @ PyHug
Nltk natural language toolkit overview and application @ PyHug
 
NetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugNetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHug
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learn
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
 

Similar a [LDSP] Solr Usage

JSON in Solr: From Top to Bottom - Alexander Rafalovitch, United Nations
JSON in Solr: From Top to Bottom - Alexander Rafalovitch, United NationsJSON in Solr: From Top to Bottom - Alexander Rafalovitch, United Nations
JSON in Solr: From Top to Bottom - Alexander Rafalovitch, United NationsLucidworks
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaCominvent AS
 
Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Michael Peacock
 
SolrJ: Power and Pitfalls - Jason Gerlowski, Lucidworks
SolrJ: Power and Pitfalls - Jason Gerlowski, LucidworksSolrJ: Power and Pitfalls - Jason Gerlowski, Lucidworks
SolrJ: Power and Pitfalls - Jason Gerlowski, LucidworksLucidworks
 
Rails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search EngineRails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search EngineDavid Keener
 
MySQL User Conference 2009: Python and MySQL
MySQL User Conference 2009: Python and MySQLMySQL User Conference 2009: Python and MySQL
MySQL User Conference 2009: Python and MySQLTed Leung
 
Spring Cloud: API gateway upgrade & configuration in the cloud
Spring Cloud: API gateway upgrade & configuration in the cloudSpring Cloud: API gateway upgrade & configuration in the cloud
Spring Cloud: API gateway upgrade & configuration in the cloudOrkhan Gasimov
 
Solr JDBC - Lucene/Solr Revolution 2016
Solr JDBC - Lucene/Solr Revolution 2016Solr JDBC - Lucene/Solr Revolution 2016
Solr JDBC - Lucene/Solr Revolution 2016Kevin Risden
 
Scaling search-clusters-solr-k8s-2020-amrit-sarkar
Scaling search-clusters-solr-k8s-2020-amrit-sarkarScaling search-clusters-solr-k8s-2020-amrit-sarkar
Scaling search-clusters-solr-k8s-2020-amrit-sarkarAmrit Sarkar
 
Shaping Clouds with Terraform
Shaping Clouds with TerraformShaping Clouds with Terraform
Shaping Clouds with TerraformMike Fowler
 
Solr JDBC: Presented by Kevin Risden, Avalon Consulting
Solr JDBC: Presented by Kevin Risden, Avalon ConsultingSolr JDBC: Presented by Kevin Risden, Avalon Consulting
Solr JDBC: Presented by Kevin Risden, Avalon ConsultingLucidworks
 
RESTful API を Chalice で紐解く 〜 Python Serverless Microframework for AWS 〜
RESTful API を Chalice で紐解く 〜 Python Serverless Microframework for AWS 〜RESTful API を Chalice で紐解く 〜 Python Serverless Microframework for AWS 〜
RESTful API を Chalice で紐解く 〜 Python Serverless Microframework for AWS 〜崇之 清水
 
Solr introduction
Solr introductionSolr introduction
Solr introductionLap Tran
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" DataArt
 

Similar a [LDSP] Solr Usage (20)

JSON in Solr: from top to bottom
JSON in Solr: from top to bottomJSON in Solr: from top to bottom
JSON in Solr: from top to bottom
 
JSON in Solr: From Top to Bottom - Alexander Rafalovitch, United Nations
JSON in Solr: From Top to Bottom - Alexander Rafalovitch, United NationsJSON in Solr: From Top to Bottom - Alexander Rafalovitch, United Nations
JSON in Solr: From Top to Bottom - Alexander Rafalovitch, United Nations
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alpha
 
Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012
 
SolrJ: Power and Pitfalls - Jason Gerlowski, Lucidworks
SolrJ: Power and Pitfalls - Jason Gerlowski, LucidworksSolrJ: Power and Pitfalls - Jason Gerlowski, Lucidworks
SolrJ: Power and Pitfalls - Jason Gerlowski, Lucidworks
 
Rails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search EngineRails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search Engine
 
MySQL User Conference 2009: Python and MySQL
MySQL User Conference 2009: Python and MySQLMySQL User Conference 2009: Python and MySQL
MySQL User Conference 2009: Python and MySQL
 
Spring Cloud: API gateway upgrade & configuration in the cloud
Spring Cloud: API gateway upgrade & configuration in the cloudSpring Cloud: API gateway upgrade & configuration in the cloud
Spring Cloud: API gateway upgrade & configuration in the cloud
 
Solr JDBC - Lucene/Solr Revolution 2016
Solr JDBC - Lucene/Solr Revolution 2016Solr JDBC - Lucene/Solr Revolution 2016
Solr JDBC - Lucene/Solr Revolution 2016
 
ReactJS for Beginners
ReactJS for BeginnersReactJS for Beginners
ReactJS for Beginners
 
Scaling search-clusters-solr-k8s-2020-amrit-sarkar
Scaling search-clusters-solr-k8s-2020-amrit-sarkarScaling search-clusters-solr-k8s-2020-amrit-sarkar
Scaling search-clusters-solr-k8s-2020-amrit-sarkar
 
Shaping Clouds with Terraform
Shaping Clouds with TerraformShaping Clouds with Terraform
Shaping Clouds with Terraform
 
Solr JDBC: Presented by Kevin Risden, Avalon Consulting
Solr JDBC: Presented by Kevin Risden, Avalon ConsultingSolr JDBC: Presented by Kevin Risden, Avalon Consulting
Solr JDBC: Presented by Kevin Risden, Avalon Consulting
 
RESTful API を Chalice で紐解く 〜 Python Serverless Microframework for AWS 〜
RESTful API を Chalice で紐解く 〜 Python Serverless Microframework for AWS 〜RESTful API を Chalice で紐解く 〜 Python Serverless Microframework for AWS 〜
RESTful API を Chalice で紐解く 〜 Python Serverless Microframework for AWS 〜
 
Solr introduction
Solr introductionSolr introduction
Solr introduction
 
React inter3
React inter3React inter3
React inter3
 
ApacheCon 2005
ApacheCon 2005ApacheCon 2005
ApacheCon 2005
 
Solr
SolrSolr
Solr
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
 

Más de Jimmy Lai

Python Linters at Scale.pdf
Python Linters at Scale.pdfPython Linters at Scale.pdf
Python Linters at Scale.pdfJimmy Lai
 
EuroPython 2022 - Automated Refactoring Large Python Codebases
EuroPython 2022 - Automated Refactoring Large Python CodebasesEuroPython 2022 - Automated Refactoring Large Python Codebases
EuroPython 2022 - Automated Refactoring Large Python CodebasesJimmy Lai
 
Annotate types in large codebase with automated refactoring
Annotate types in large codebase with automated refactoringAnnotate types in large codebase with automated refactoring
Annotate types in large codebase with automated refactoringJimmy Lai
 
The journey of asyncio adoption in instagram
The journey of asyncio adoption in instagramThe journey of asyncio adoption in instagram
The journey of asyncio adoption in instagramJimmy Lai
 
Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Jimmy Lai
 
Continuous Delivery: automated testing, continuous integration and continuous...
Continuous Delivery: automated testing, continuous integration and continuous...Continuous Delivery: automated testing, continuous integration and continuous...
Continuous Delivery: automated testing, continuous integration and continuous...Jimmy Lai
 

Más de Jimmy Lai (6)

Python Linters at Scale.pdf
Python Linters at Scale.pdfPython Linters at Scale.pdf
Python Linters at Scale.pdf
 
EuroPython 2022 - Automated Refactoring Large Python Codebases
EuroPython 2022 - Automated Refactoring Large Python CodebasesEuroPython 2022 - Automated Refactoring Large Python Codebases
EuroPython 2022 - Automated Refactoring Large Python Codebases
 
Annotate types in large codebase with automated refactoring
Annotate types in large codebase with automated refactoringAnnotate types in large codebase with automated refactoring
Annotate types in large codebase with automated refactoring
 
The journey of asyncio adoption in instagram
The journey of asyncio adoption in instagramThe journey of asyncio adoption in instagram
The journey of asyncio adoption in instagram
 
Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...
 
Continuous Delivery: automated testing, continuous integration and continuous...
Continuous Delivery: automated testing, continuous integration and continuous...Continuous Delivery: automated testing, continuous integration and continuous...
Continuous Delivery: automated testing, continuous integration and continuous...
 

Último

Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 

Último (20)

Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 

[LDSP] Solr Usage

  • 1. Search Engine Back End API Solution for Fast Prototyping https://github.com/jimmylai/ search_engine_backend_template Solr Jimmy Lai r97922028 [at] ntu.edu.tw http://tw.linkedin.com/pub/jimmy-lai/27/4a/536 2013/01/28
  • 2. Outline - Solr •  •  •  •  •  Add a new core Define the schema Feed data and update partial data Query and Ranking Spatial Query •  Prerequisite: Introduction and Tutorial http://www.slideshare.net/jimmy_lai/search-engine-back-end-api-solution-for-fast-prototyping LDSP 2
  • 3. Add a New Core •  Core in Solr: a set of configuration of schema and indexing. •  Cmd: fab create_core:$core_name –  What happended: 1.  Copy the dir data/solr_core_template to dir solr_conf with new name $core_name 2.  Set config in core.properties with name= $core_name 3.  Put it under solr_conf and then run solr with solr.solr.home=solr_conf, and then solr can find the new core. •  Can be done with: fab start_solr Ref:  h'p://wiki.apache.org/solr/CoreAdmin    
  • 4. Define the Schema •  Edit file solr_conf/$core_name/conf/ shcema.xml •  Add more fields in <fields> section: –  <field name="sourcesite" type="string" indexed="true" stored="true" required="false" /> •  Define your type in <types> section: <fieldType name=”chinese" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory”/> </analyzer> </fieldType> Ref:  h'p://wiki.apache.org/solr/SchemaXml    
  • 5. Feed Data and Update Partial Data •  Run solr after schema updated: –  fab run_solr •  Prepare new feed data in JSON format: –  [{‘id’: id1, ‘field1’: value11, ‘field2’: value21}, {‘id’: id2, ‘field_1’:value21}] •  Send by Http Post to Solr: –  curl http://localhost:8983/solr/$core_name/update/json? commit=true --data-binary @$json_file_name -i -H 'Contenttype:application/json’ •  Partial Update JSON format: –  [{‘id’: id1, ‘field1’: {‘set’: new_value}] Ref:  h'p://wiki.apache.org/solr/UpdateJSON    
  • 6. Query and Ranking •  Let’s use pysolr: –  solr = pysolr.Solr('http://localhost:8983/solr/movie’) –  results = [i for i in solr.search('*:*', **{‘row’: 100})] •  Query (1st parameter of search()) –  Specify value of a field ‘field:value’ –  Range query ‘field:[1 TO 100]’ –  Boolean operation ‘field1:v1 AND field2:v2’ •  Sort (put in 2nd parameter of search()) –  {‘sort’: ‘field desc’} Ref:  h'p://wiki.apache.org/solr/SolrQuerySyntax    
  • 8. Spatial Query •  In shcema.xml, set the the type of field as location: –  <field name="location" type="location" indexed="true" stored="true" required="false" /> •  Query: –  {'pt': ’23.853,120.498’, 'sfield': 'location', 'sort': 'geodist() asc'} •  Get the distance (KM) by: –  {‘fl’: ‘*,distance:geodist()’} Ref:  h'p://wiki.apache.org/solr/SpaEalSearch    
  • 9. The next steps •  Understand the mechanism of: –  Solr: •  Update schema •  Update partial data •  Query language of Solr –  Djangorestframework: •  Add more parameters •  Input validation •  Output rendering •  Deploy to production environment LDSP 9