SlideShare una empresa de Scribd logo
1 de 13
Descargar para leer sin conexión
Search Engine Back End API
Solution for Fast Prototyping
https://github.com/jimmylai/
search_engine_backend_template

Introduction and Tutorial
Jimmy Lai
r97922028 [at] ntu.edu.tw
http://tw.linkedin.com/pub/jimmy-lai/27/4a/536
2013/01/23
Requirements:
•  A Backend API provides:
1. 
2. 
3. 
4. 
5. 

Full-text search
Spatial search
RESTful API
Input validation
Output rendering

•  Simple and easy for fast prototyping

LDSP

2
LDSP:
Linux + Django + Solr + Python
•  Django: API serving
–  Input validation
–  Data mash-up
–  Output rendering

•  Solr: Indexing and searching
–  Schema
–  Indexing
–  Query
LDSP

3
Automation:
Shell command + Fabric
•  Installation:
–  Solr
–  Python packages: django, djangorestframework,
pysolr

•  Start services:
–  Solr
–  Django standalone server

•  Data:
–  Create Solr core
–  Feed data
LDSP

4
[Tutorial] Requirement
•  Example Dataset: Taiwan movie dataset
http://data.gov.tw/node/7731

•  Data source
http://nrchbms.culture.tw/OpenData/API/iCultureAPI.aspx?
type=26&radius=100&format=json

•  We’d like to build a geo search API to
find nearby movie items given a
location

LDSP

5
[Tutorial] Data preprocessing (1/2)
•  Original data
•  {"sno":"29277","mainTitle":"「新南光」電影
《孫悟空遊台灣》,描寫孫悟空化為⼤大學⽣生的
外型遊台灣","year":"未
知","format":"508x640pixels","subject":"戲劇
電影歌仔戲","date":"創作⽇日期:⺠民國時期⺠民國
時期","sourcesite":"http://nrch.cca.gov.tw/
ccahome/photo/photo_meta.jsp?
xml_id=0005780574&dofile=cca100009-hptmtp0281_00-0001w.jpg","px":"120.49","py":"23.853"}
LDSP

6
[Tutorial] Data preprocessing (2/2)
• 
• 
• 
• 
• 
• 
• 
• 
• 
• 
• 
• 
• 

Preprocessed data: normalized the location format
The preprocessed data is stored as data/movie.json
{
"format": "508x640pixels",
"sno": "29277",
"sourcesite": "http://nrch.cca.gov.tw/ccahome/photo/photo_meta.jsp?
xml_id=0005780574&dofile=cca100009-hp-tmtp0281_00-0001-w.jpg",
"mainTitle": "「新南光」電影《孫悟空遊台灣》,描寫孫悟空化為⼤大學⽣生的
外型遊台灣",
"location": "23.853,120.49",
"year": "未知",
"date": "創作⽇日期:⺠民國時期⺠民國時期",
"id": "29277",
"subject": "戲劇電影歌仔戲"
},

LDSP

7
[Tutorial] Data indexing
• 
• 
• 
• 
• 

pip install fabric
fab setup_env
fab create_core:movie
fab run_solr
fab feed_data:movie,data/movie.json
–  Core name as the 1st parameter, data file name
as the 2nd parameter

•  fab run_django
•  We are done!
LDSP

8
[Tutorial] Solr Admin

LDSP

9
[Tutorial] Djangorestframework UI

LDSP

10
[Tutorial] BE API Query, json output

LDSP

11
[Tutorial] BE API Query, xml output

LDSP

12
The next steps
•  Understand the mechanism of:
–  Solr:
•  Update schema
•  Update partial data
•  Query language of Solr

–  Djangorestframework:
•  Add more parameters
•  Input validation
•  Output rendering

•  Deploy to production environment
LDSP

13

Más contenido relacionado

La actualidad más candente

Python WSGI introduction
Python WSGI introductionPython WSGI introduction
Python WSGI introductionAgeeleshwar K
 
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...Big Data Spain
 
Google App Engine With Java And Groovy
Google App Engine With Java And GroovyGoogle App Engine With Java And Groovy
Google App Engine With Java And GroovyKen Kousen
 
Kube-AWS
Kube-AWSKube-AWS
Kube-AWSCoreOS
 
Infrastructure as Code - Terraform - Devfest 2018
Infrastructure as Code - Terraform - Devfest 2018Infrastructure as Code - Terraform - Devfest 2018
Infrastructure as Code - Terraform - Devfest 2018Mathieu Herbert
 
CoreOS in a Nutshell
CoreOS in a NutshellCoreOS in a Nutshell
CoreOS in a NutshellCoreOS
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderSadayuki Furuhashi
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSematext Group, Inc.
 
Python Deployment with Fabric
Python Deployment with FabricPython Deployment with Fabric
Python Deployment with Fabricandymccurdy
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersSematext Group, Inc.
 
Transforming Infrastructure into Code - Importing existing cloud resources u...
Transforming Infrastructure into Code  - Importing existing cloud resources u...Transforming Infrastructure into Code  - Importing existing cloud resources u...
Transforming Infrastructure into Code - Importing existing cloud resources u...Shih Oon Liong
 
Data integration with embulk
Data integration with embulkData integration with embulk
Data integration with embulkTeguh Nugraha
 
Node collaboration - sharing information between your systems
Node collaboration - sharing information between your systemsNode collaboration - sharing information between your systems
Node collaboration - sharing information between your systemsm_richardson
 
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...Yevgeniy Brikman
 
MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKYoungHeon (Roy) Kim
 
Altitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshopAltitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshopFastly
 
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverAltitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverFastly
 
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종NAVER D2
 

La actualidad más candente (20)

Python WSGI introduction
Python WSGI introductionPython WSGI introduction
Python WSGI introduction
 
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky...
 
Google App Engine With Java And Groovy
Google App Engine With Java And GroovyGoogle App Engine With Java And Groovy
Google App Engine With Java And Groovy
 
Kube-AWS
Kube-AWSKube-AWS
Kube-AWS
 
Infrastructure as Code - Terraform - Devfest 2018
Infrastructure as Code - Terraform - Devfest 2018Infrastructure as Code - Terraform - Devfest 2018
Infrastructure as Code - Terraform - Devfest 2018
 
CoreOS in a Nutshell
CoreOS in a NutshellCoreOS in a Nutshell
CoreOS in a Nutshell
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loader
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for You
 
Python Deployment with Fabric
Python Deployment with FabricPython Deployment with Fabric
Python Deployment with Fabric
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Transforming Infrastructure into Code - Importing existing cloud resources u...
Transforming Infrastructure into Code  - Importing existing cloud resources u...Transforming Infrastructure into Code  - Importing existing cloud resources u...
Transforming Infrastructure into Code - Importing existing cloud resources u...
 
Data integration with embulk
Data integration with embulkData integration with embulk
Data integration with embulk
 
Node collaboration - sharing information between your systems
Node collaboration - sharing information between your systemsNode collaboration - sharing information between your systems
Node collaboration - sharing information between your systems
 
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
 
MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELK
 
Altitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshopAltitude NY 2018: Programming the edge workshop
Altitude NY 2018: Programming the edge workshop
 
Embuk internals
Embuk internalsEmbuk internals
Embuk internals
 
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverAltitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
 
Scripting Embulk Plugins
Scripting Embulk PluginsScripting Embulk Plugins
Scripting Embulk Plugins
 
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
 

Destacado

Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Jimmy Lai
 
Fast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookFast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookJimmy Lai
 
[LDSP] Solr Usage
[LDSP] Solr Usage[LDSP] Solr Usage
[LDSP] Solr UsageJimmy Lai
 
Data Analyst Nanodegree
Data Analyst NanodegreeData Analyst Nanodegree
Data Analyst NanodegreeJimmy Lai
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012Jimmy Lai
 
Documentation with sphinx @ PyHug
Documentation with sphinx @ PyHugDocumentation with sphinx @ PyHug
Documentation with sphinx @ PyHugJimmy Lai
 
Software development practices in python
Software development practices in pythonSoftware development practices in python
Software development practices in pythonJimmy Lai
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012Jimmy Lai
 
Apache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesApache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesJimmy Lai
 
Build a Searchable Knowledge Base
Build a Searchable Knowledge BaseBuild a Searchable Knowledge Base
Build a Searchable Knowledge BaseJimmy Lai
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Jimmy Lai
 
Nltk natural language toolkit overview and application @ PyHug
Nltk  natural language toolkit overview and application @ PyHugNltk  natural language toolkit overview and application @ PyHug
Nltk natural language toolkit overview and application @ PyHugJimmy Lai
 
NetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugNetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugJimmy Lai
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learnJimmy Lai
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Jimmy Lai
 

Destacado (15)

Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...
 
Fast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookFast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython Notebook
 
[LDSP] Solr Usage
[LDSP] Solr Usage[LDSP] Solr Usage
[LDSP] Solr Usage
 
Data Analyst Nanodegree
Data Analyst NanodegreeData Analyst Nanodegree
Data Analyst Nanodegree
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012
 
Documentation with sphinx @ PyHug
Documentation with sphinx @ PyHugDocumentation with sphinx @ PyHug
Documentation with sphinx @ PyHug
 
Software development practices in python
Software development practices in pythonSoftware development practices in python
Software development practices in python
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012
 
Apache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesApache thrift-RPC service cross languages
Apache thrift-RPC service cross languages
 
Build a Searchable Knowledge Base
Build a Searchable Knowledge BaseBuild a Searchable Knowledge Base
Build a Searchable Knowledge Base
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
 
Nltk natural language toolkit overview and application @ PyHug
Nltk  natural language toolkit overview and application @ PyHugNltk  natural language toolkit overview and application @ PyHug
Nltk natural language toolkit overview and application @ PyHug
 
NetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugNetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHug
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learn
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
 

Similar a [LDSP] Search Engine Back End API Solution for Fast Prototyping

rest3d Web3D 2014
rest3d Web3D 2014rest3d Web3D 2014
rest3d Web3D 2014Remi Arnaud
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1Jungsu Heo
 
Developing Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & PythonDeveloping Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & PythonSmartBear
 
Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comJungsu Heo
 
Barcamp Bangkhen :: Robot Framework
Barcamp Bangkhen :: Robot FrameworkBarcamp Bangkhen :: Robot Framework
Barcamp Bangkhen :: Robot FrameworkSomkiat Puisungnoen
 
Rapid application development with spring roo j-fall 2010 - baris dere
Rapid application development with spring roo   j-fall 2010 - baris dereRapid application development with spring roo   j-fall 2010 - baris dere
Rapid application development with spring roo j-fall 2010 - baris dereBaris Dere
 
マイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPCマイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPCdisc99_
 
release_python_day3_slides_201606.pdf
release_python_day3_slides_201606.pdfrelease_python_day3_slides_201606.pdf
release_python_day3_slides_201606.pdfPaul Yang
 
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk""Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"Rinaldi Rampen
 
Apache Solr - search for everyone!
Apache Solr - search for everyone!Apache Solr - search for everyone!
Apache Solr - search for everyone!Jaran Flaath
 
Apex on Local - Better Alternative to Salesforce DX
Apex on Local - Better Alternative to Salesforce DXApex on Local - Better Alternative to Salesforce DX
Apex on Local - Better Alternative to Salesforce DXtzm_freedom
 
Inside Of Mbga Open Platform
Inside Of Mbga Open PlatformInside Of Mbga Open Platform
Inside Of Mbga Open PlatformHideo Kimura
 
Boost on!!next generation big data platform
Boost on!!next generation big data platformBoost on!!next generation big data platform
Boost on!!next generation big data platformLINE Corporation
 
Icinga 2009 at OSMC
Icinga 2009 at OSMCIcinga 2009 at OSMC
Icinga 2009 at OSMCIcinga
 
Contributing to rails
Contributing to railsContributing to rails
Contributing to railsLukas Eppler
 
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...SPS Paris
 

Similar a [LDSP] Search Engine Back End API Solution for Fast Prototyping (20)

rest3d Web3D 2014
rest3d Web3D 2014rest3d Web3D 2014
rest3d Web3D 2014
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1
 
Developing Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & PythonDeveloping Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & Python
 
Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.com
 
Barcamp Bangkhen :: Robot Framework
Barcamp Bangkhen :: Robot FrameworkBarcamp Bangkhen :: Robot Framework
Barcamp Bangkhen :: Robot Framework
 
Angular2 inter3
Angular2 inter3Angular2 inter3
Angular2 inter3
 
Norikra Recent Updates
Norikra Recent UpdatesNorikra Recent Updates
Norikra Recent Updates
 
Rapid application development with spring roo j-fall 2010 - baris dere
Rapid application development with spring roo   j-fall 2010 - baris dereRapid application development with spring roo   j-fall 2010 - baris dere
Rapid application development with spring roo j-fall 2010 - baris dere
 
マイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPCマイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPC
 
release_python_day3_slides_201606.pdf
release_python_day3_slides_201606.pdfrelease_python_day3_slides_201606.pdf
release_python_day3_slides_201606.pdf
 
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk""Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
"Hunting the Bad Guys: Using OSINT, Social Media & other tools within Splunk"
 
Apache Solr - search for everyone!
Apache Solr - search for everyone!Apache Solr - search for everyone!
Apache Solr - search for everyone!
 
Apex on Local - Better Alternative to Salesforce DX
Apex on Local - Better Alternative to Salesforce DXApex on Local - Better Alternative to Salesforce DX
Apex on Local - Better Alternative to Salesforce DX
 
JRubyConf 2009
JRubyConf 2009JRubyConf 2009
JRubyConf 2009
 
Inside Of Mbga Open Platform
Inside Of Mbga Open PlatformInside Of Mbga Open Platform
Inside Of Mbga Open Platform
 
Boost on!!next generation big data platform
Boost on!!next generation big data platformBoost on!!next generation big data platform
Boost on!!next generation big data platform
 
Icinga 2009 at OSMC
Icinga 2009 at OSMCIcinga 2009 at OSMC
Icinga 2009 at OSMC
 
Planidoo & Zotonic
Planidoo & ZotonicPlanidoo & Zotonic
Planidoo & Zotonic
 
Contributing to rails
Contributing to railsContributing to rails
Contributing to rails
 
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
D2 - Automate Custom Solutions Deployment on Office 365 and Azure - Paolo Pia...
 

Último

Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Último (20)

Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

[LDSP] Search Engine Back End API Solution for Fast Prototyping

  • 1. Search Engine Back End API Solution for Fast Prototyping https://github.com/jimmylai/ search_engine_backend_template Introduction and Tutorial Jimmy Lai r97922028 [at] ntu.edu.tw http://tw.linkedin.com/pub/jimmy-lai/27/4a/536 2013/01/23
  • 2. Requirements: •  A Backend API provides: 1.  2.  3.  4.  5.  Full-text search Spatial search RESTful API Input validation Output rendering •  Simple and easy for fast prototyping LDSP 2
  • 3. LDSP: Linux + Django + Solr + Python •  Django: API serving –  Input validation –  Data mash-up –  Output rendering •  Solr: Indexing and searching –  Schema –  Indexing –  Query LDSP 3
  • 4. Automation: Shell command + Fabric •  Installation: –  Solr –  Python packages: django, djangorestframework, pysolr •  Start services: –  Solr –  Django standalone server •  Data: –  Create Solr core –  Feed data LDSP 4
  • 5. [Tutorial] Requirement •  Example Dataset: Taiwan movie dataset http://data.gov.tw/node/7731 •  Data source http://nrchbms.culture.tw/OpenData/API/iCultureAPI.aspx? type=26&radius=100&format=json •  We’d like to build a geo search API to find nearby movie items given a location LDSP 5
  • 6. [Tutorial] Data preprocessing (1/2) •  Original data •  {"sno":"29277","mainTitle":"「新南光」電影 《孫悟空遊台灣》,描寫孫悟空化為⼤大學⽣生的 外型遊台灣","year":"未 知","format":"508x640pixels","subject":"戲劇 電影歌仔戲","date":"創作⽇日期:⺠民國時期⺠民國 時期","sourcesite":"http://nrch.cca.gov.tw/ ccahome/photo/photo_meta.jsp? xml_id=0005780574&dofile=cca100009-hptmtp0281_00-0001w.jpg","px":"120.49","py":"23.853"} LDSP 6
  • 7. [Tutorial] Data preprocessing (2/2) •  •  •  •  •  •  •  •  •  •  •  •  •  Preprocessed data: normalized the location format The preprocessed data is stored as data/movie.json { "format": "508x640pixels", "sno": "29277", "sourcesite": "http://nrch.cca.gov.tw/ccahome/photo/photo_meta.jsp? xml_id=0005780574&dofile=cca100009-hp-tmtp0281_00-0001-w.jpg", "mainTitle": "「新南光」電影《孫悟空遊台灣》,描寫孫悟空化為⼤大學⽣生的 外型遊台灣", "location": "23.853,120.49", "year": "未知", "date": "創作⽇日期:⺠民國時期⺠民國時期", "id": "29277", "subject": "戲劇電影歌仔戲" }, LDSP 7
  • 8. [Tutorial] Data indexing •  •  •  •  •  pip install fabric fab setup_env fab create_core:movie fab run_solr fab feed_data:movie,data/movie.json –  Core name as the 1st parameter, data file name as the 2nd parameter •  fab run_django •  We are done! LDSP 8
  • 11. [Tutorial] BE API Query, json output LDSP 11
  • 12. [Tutorial] BE API Query, xml output LDSP 12
  • 13. The next steps •  Understand the mechanism of: –  Solr: •  Update schema •  Update partial data •  Query language of Solr –  Djangorestframework: •  Add more parameters •  Input validation •  Output rendering •  Deploy to production environment LDSP 13