Let's Make Niconico Douga Searchable (ニコニコ動画を検索可能にしてみよう)
genta kaneyama
Indexing 2.5 billion comments with Elasticsearch
1.
Let's make the Niconico Douga dataset searchable
@PENGUINANA_
2.
whoami
• @PENGUINANA_ / Genta Kaneyama (兼山元太)
• Engineer at *.cookpad.com/*
• Search infrastructure and service development
3.
JSON all around us
• tweet
  • 140-character message
4.
JSON all around us
• tweet
  • 140-character message
  • user_name
  • datetime
  • location
  • reply or not / contains link or not / retweeted count / reply count ...
5.
JSON all around us
• access log
  • ip address
  • requested content
  • status code
  • response time
  • referrer
6.
JSON all around us
• event log
  • user_id
  • event name
  • params (hash)
  • datetime
  • user agent
7.
JSON all around us
• dictionary edit request
  • keyword
  • operation type
  • requester
  • status (applied or not)
8.
kibana
• http://demo.kibana.org/
• http://www.elasticsearch.org/blog/kibana-whats-cooking/
9.
kibana@cookpad
• log dashboard for internal API
• explore log
• capacity planning
• performance check
• slow query
10.
dashboard for each application
11.
Theme
• If JSON data could be searched and analyzed flexibly no matter how large it is, everyday work would get easier
• How can that be done? Is it hard?
12.
Just try it
• Niconico Douga dataset
• Make it searchable and analyzable
13.
Dataset
• Official Niconico Douga dataset
• Metadata for 8 million videos
• 2.5 billion comments
• JSON format (compressed: 60 GB, uncompressed: 300 GB)
http://goo.gl/FYtO5T
15.
http://goo.gl/FYtO5T
16.
http://goo.gl/FYtO5T
17.
Result
• Done in 4 hours with Elasticsearch on AWS
• s3 -> unzip -> Elasticsearch (173k docs/s)
• 550 yen
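The 173k docs/s figure can be cross-checked against the other numbers in the deck. The sketch below assumes that all 2.5 billion comments were indexed in exactly 4 hours and that the work was spread evenly over the 20 nodes mentioned on the "launch ES Instances" slide; the deck only gives rounded figures, so treat the result as approximate.

```python
# Rough sanity check of the indexing throughput quoted on the slide.
# Assumption: all 2.5 billion comments were indexed in exactly 4 hours.
TOTAL_DOCS = 2_500_000_000
HOURS = 4
NODES = 20  # c1.xlarge instances, per the "launch ES Instances" slide


def docs_per_second(total_docs: int, hours: float) -> float:
    """Average cluster-wide indexing rate in documents per second."""
    return total_docs / (hours * 3600)


cluster_rate = docs_per_second(TOTAL_DOCS, HOURS)
per_node_rate = cluster_rate / NODES  # assumes an even spread across nodes

print(f"cluster:  {cluster_rate:,.0f} docs/s")
print(f"per node: {per_node_rate:,.0f} docs/s")
```

Under these assumptions the cluster-wide rate lands between 173k and 174k docs/s, which matches the slide, and each node handles a little under 9k docs/s.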
18.
Demo
• date facet over 2.5 billion comments
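The deck does not show the query behind the demo. As a minimal sketch, a date-histogram facet request for the facets API of the Elasticsearch 0.90 era could be built as follows. The `date` field and the `nico2/comment` index and type come from the mapping and bulk-import slides; the facet name `comments_over_time` and the `month` interval are assumptions made for illustration.

```python
import json


def date_facet_request(field: str = "date", interval: str = "month") -> dict:
    """Build a search body that asks only for a date_histogram facet."""
    return {
        "size": 0,  # no hits needed, only the facet counts
        "query": {"match_all": {}},
        "facets": {
            "comments_over_time": {  # hypothetical facet name
                "date_histogram": {"field": field, "interval": interval}
            }
        },
    }


body = json.dumps(date_facet_request())
# Would be sent as: POST http://localhost:9200/nico2/comment/_search
print(body)
```

Facets were later replaced by aggregations, so this body targets the version used in the talk rather than current Elasticsearch releases.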
25.
install
• wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.3.noarch.rpm
• sudo rpm -i elasticsearch-0.90.3.noarch.rpm
26.
install plugins
• sudo bin/plugin -install elasticsearch/elasticsearch-cloud-aws
• sudo bin/plugin -install mobz/elasticsearch-head
• sudo bin/plugin -install lukas-vlcek/bigdesk
• sudo bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji
27.
elasticsearch-cloud-aws
• cluster node discovery in AWS
• add config to elasticsearch.yml

    cloud:
      aws:
        access_key: AKI...........
        secret_key: mR.............
    discovery:
      type: ec2
    discovery.ec2.groups: es_test  # (security_group)
28.
elasticsearch-head
29.
bigdesk
30.
elasticsearch-analysis-kuromoji
• Japanese analyzer
31.
config
• # Set a custom allowed content length:
• http.max_content_length: 1000m
• # Heap Size (defaults to 256m min, 1g max)
• ES_HEAP_SIZE=3g
• # ElasticSearch data directory
• DATA_DIR=/media/ephemeral1/es,/media/ephemeral2/es,/media/ephemeral3/es
32.
make AMI
• elasticsearch machine image
33.
launch ES Instances
• c1.xlarge x 20
• CPU Xeon 8 cores (2,300 MHz)
• Memory 7 GB
• Disk 420 GB x 4
• $0.07/hour (spot instance)
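The instance count and spot price allow a back-of-the-envelope cost estimate. The sketch below assumes billed time equals the 4-hour import quoted elsewhere in the deck, with no setup or teardown time, and a constant spot price; it is not necessarily how the 550 yen on the "Result" slide was calculated.

```python
# Back-of-the-envelope cost of the spot-instance cluster.
INSTANCES = 20
USD_PER_INSTANCE_HOUR = 0.07
HOURS = 4  # assumption: billed time equals the 4-hour import


def cluster_cost_usd(instances: int, hourly_rate: float, hours: float) -> float:
    """Total cost of running `instances` machines for `hours` hours."""
    return instances * hourly_rate * hours


cost_usd = cluster_cost_usd(INSTANCES, USD_PER_INSTANCE_HOUR, HOURS)
# Exchange rate implied by the 550 yen quoted on the "Result" slide.
implied_jpy_per_usd = 550 / cost_usd

print(f"${cost_usd:.2f}, implied rate {implied_jpy_per_usd:.1f} JPY/USD")
```

The estimate comes to about $5.60, so the 550 yen figure implies an exchange rate of roughly 98 yen to the dollar.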
35.
deploy data
• download from s3 to nodes
• use s3cmd (a few minutes with GNU Parallel)
• unzip (60 GB -> 300 GB)
36.
bulk import

    { "index" : { "_id" : "sm14784868 1", "parent": "sm14784868" } }
    {"date":"2011-06-18T20:15:30+09:00","no":1,"vpos":63,"comment":"1","command":"184"}
    ...
    { "index" : { "_id" : "sm14784868 2", "parent": "sm14784868" } }
    {"date":"2011-07-24T02:22:58+09:00","no":2,"vpos":4651,"comment":"2 get","command":"184"}
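The bulk format pairs an action line with a source line for every document. The deck does not show how the request files were generated, so the following is only a sketch: it assumes each comment is already a Python dict and that the `_id` is the video ID, a space, and the comment number, as the sample above suggests. The function name `bulk_lines` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def bulk_lines(video_id: str, comments: Iterable[dict]) -> Iterator[str]:
    """Yield action/source line pairs for the _bulk API."""
    for c in comments:
        # _id pattern "<video_id> <no>" is inferred from the slide's sample.
        action = {"index": {"_id": f"{video_id} {c['no']}", "parent": video_id}}
        yield json.dumps(action)
        yield json.dumps(c, ensure_ascii=False)  # keep Japanese text readable


comments = [
    {"date": "2011-06-18T20:15:30+09:00", "no": 1, "vpos": 63,
     "comment": "1", "command": "184"},
    {"date": "2011-07-24T02:22:58+09:00", "no": 2, "vpos": 4651,
     "comment": "2 get", "command": "184"},
]

# The bulk body must end with a newline.
payload = "\n".join(bulk_lines("sm14784868", comments)) + "\n"
print(payload, end="")
```

Using a generator keeps memory flat, which matters when the uncompressed input is around 300 GB.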
37.
bulk import
• ls request_file* | parallel -j N curl -X POST -s -D - 'http://localhost:9200/nico2/comment/_bulk' -o /dev/null --data-binary @{}
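The `request_file*` glob suggests that the bulk stream was split into many files, though the deck does not show how. Any split has to keep each action line together with its source line, so every chunk needs an even number of lines. A minimal sketch of that constraint, with a hypothetical function name and chunk size:

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_bulk_lines(lines: Iterable[str], docs_per_chunk: int) -> Iterator[List[str]]:
    """Group bulk lines into chunks holding whole action/source pairs."""
    if docs_per_chunk < 1:
        raise ValueError("docs_per_chunk must be positive")
    it = iter(lines)
    while True:
        chunk = list(islice(it, docs_per_chunk * 2))  # 2 lines per document
        if not chunk:
            return
        if len(chunk) % 2:
            raise ValueError("bulk stream ended with an action line but no source")
        yield chunk


demo = [f"line{i}" for i in range(10)]  # stand-in for 5 documents
chunks = list(chunk_bulk_lines(demo, docs_per_chunk=2))
print([len(c) for c in chunks])
```

On the command line, `split -l` with an even line count has the same effect.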
39.
wc -l requests > 4.8 billion
40.
import... import... import...
• all nodes can handle indexing requests
• curl bulk import on each node (x20)
• I/O into 3 disks
• takes 4 hours
41.
efficiency
42.
efficiency

    "mappings": {
      "video": {
        "properties": {
          "video_id":      { "type": "string", "index": "no" },
          "title":         { "type": "string", "index": "analyzed" },
          "description":   { "type": "string", "index": "analyzed" },
          "thumbnail_url": { "type": "string", "index": "no", "store": "yes" },
          "upload_time":   { "type": "date", "format": "YYYY-MM-dd'T'HH:mm:ss'+09:00'" },
          "movie_type":    { "type": "string", "index": "not_analyzed" },
          "last_res_body": { "type": "string", "index": "analyzed" },
          "tags": {
            "properties": {
              "tag": { "type": "string", "index": "not_analyzed" }
            }
          }
        }
      }
    }
43.
efficiency

    "mappings": {
      "comment": {
        "_parent": { "type": "video" },
        "properties": {
          "date":     { "type": "date", "format": "YYYY-MM-dd'T'HH:mm:ss'+09:00'" },
          "no":       { "type": "integer" },
          "vpos":     { "type": "integer" },
          "comment":  { "type": "string" },
          "command":  { "type": "string" },
          "video_id": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
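The `_parent` setting links every comment to its video, which is what makes parent/child queries possible. The deck lists the query DSL as not covered, so the following is purely illustrative: a `has_child` query that returns videos with at least one comment matching a search term. The search term and the function name are hypothetical.

```python
import json


def videos_with_comment(term: str) -> dict:
    """Build a query for videos that have a comment matching `term`."""
    return {
        "query": {
            "has_child": {
                "type": "comment",  # child type from the mapping above
                "query": {"match": {"comment": term}},
            }
        }
    }


q = videos_with_comment("get")
# Would be sent as: POST http://localhost:9200/nico2/video/_search
print(json.dumps(q, indent=2))
```

This also explains the `parent` value in the bulk action lines: a child document has to be indexed with its parent's ID so that it is routed to the same shard as the parent.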
44.
efficiency
• curl -X POST 'http://localhost:9200/nico2' -d @mapping.json
45.
shrink

    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
      "commands" : [ {
        "move" : {
          "index" : "nico2", "shard" : 33,
          "from_node" : "nodeA", "to_node" : "nodeB"
        }
      } ]
    }'
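Shrinking the cluster by hand means issuing one `move` command per shard on the node being retired. The sketch below builds such a reroute body for several shards at once. The shard numbers, node names, and round-robin target selection are assumptions for illustration, not something the deck describes.

```python
import json
from itertools import cycle
from typing import Iterable, List


def drain_commands(index: str, shards: Iterable[int], from_node: str,
                   to_nodes: List[str]) -> dict:
    """Build a _cluster/reroute body moving shards off from_node."""
    if not to_nodes:
        raise ValueError("need at least one target node")
    targets = cycle(to_nodes)  # spread shards round-robin over the targets
    commands = [
        {"move": {"index": index, "shard": shard,
                  "from_node": from_node, "to_node": next(targets)}}
        for shard in shards
    ]
    return {"commands": commands}


body = drain_commands("nico2", [33, 34, 35], "nodeA", ["nodeB", "nodeC"])
# Would be sent as: POST localhost:9200/_cluster/reroute
print(json.dumps(body))
```

A real script would first have to check which shards the source node holds and whether a target already has a copy of the same shard, since Elasticsearch rejects such a move.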
46.
shrink

    curl -XPUT localhost:9200/_cluster/settings -d '{
      "persistent": { "indices.recovery.concurrent_streams": 3 }
    }'

    curl -XPUT localhost:9200/_cluster/settings -d '{
      "persistent": { "indices.recovery.max_bytes_per_sec": "1000mb" }
    }'
50.
Why Elasticsearch?
• proven scalable search engine
• super flexible config with nice defaults
• great API
• growing developer and user base
51.
not covered
• mapping
• query DSL
• search performance
• cluster operation
• healthcheck / cluster statistics
• etc...
52.
questions?