Monitoring tools for ElasticSearch
SF Meetup
2013.03.06

Sushant Shankar
Shyam Kuttikkad

• Why and how we use ElasticSearch
• Monitoring
  – Tools
  – Index Building
  – Query Performance
Who is 33Across
• Social Sharing and Content Discovery platform
   – We help >600,000 publishers with content distribution, user
     engagement, and advertising monetization
   – 450 Fortune 1000 brand marketers leverage our unique social signals
     to deliver impactful advertising
• We develop Machine Learning algorithms operating on Big
  Data to:
   – Provide content sharing insights to Publishers
   – Build customized audience segments for advertising campaigns
   – Extract actionable insights out of social and interest data




www.33Across.com
www.tynt.com
Data firehose of 30B monthly events, 1.25B cookies
- Interaction with web content
- Shares – images, copies
- Searches

Build, understand, analyze
Real-time view
ElasticSearch!

Social Audiences
Behavior
Context
Knowledge

Production ElasticSearch cluster

Hardware
• 6 nodes, 24GB RAM
• 16GB for ES service
• 4 cores
• 3x 1.5TB drive

Index
• >1TB/index (replicated)
• ~300M documents
• ~5KB / document

Build index
• Using MR job and Bulk API
• ~3 hours
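
As a rough sanity check on the figures above, the sketch below derives the raw data volume and the average indexing throughput. It assumes decimal units, treats the approximate slide numbers as exact, and reads "~3 hours" as the duration of one index build; the variable names are illustrative. On-disk index size will differ from the raw figure depending on mappings and replication.

```python
# Back-of-the-envelope check of the cluster figures on the slide.
# Assumption: decimal units (1 KB = 1,000 bytes, 1 TB = 10**12 bytes).
DOCS = 300_000_000   # ~300M documents
DOC_BYTES = 5_000    # ~5KB per document
BUILD_HOURS = 3      # ~3 hours, assumed to be one full index build
NODES = 6            # nodes in the cluster

raw_tb = DOCS * DOC_BYTES / 10**12
docs_per_sec = DOCS / (BUILD_HOURS * 3600)
docs_per_sec_per_node = docs_per_sec / NODES

print(f"raw data: {raw_tb:.1f} TB")
print(f"throughput: {docs_per_sec:,.0f} docs/s overall")
print(f"per node: {docs_per_sec_per_node:,.0f} docs/s")
```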
System monitoring using Zabbix
[Figure: Index Build]
ElasticSearch specific monitoring using SPM

Scalable Performance Monitoring (http://sematext.com/spm/index.html)
• Index stats – Total/Refreshed/Merged documents
• Shards – Total/Active/Relocating/Initializing
• Search – Request rate and latency
• Cache – {Filter, field} cache {count, evictions, size}
• Machine – CPU, Memory, JVM, GC, Network, Disk
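
SPM gathers these metrics itself; purely to illustrate the shard counters, the sketch below summarizes a response shaped like the Elasticsearch cluster health API (`GET /_cluster/health`). The sample values are made up, and `summarize_health` is a hypothetical helper, not part of SPM or Elasticsearch.

```python
# Shard counters reported by the cluster health API.
SHARD_KEYS = (
    "active_shards",
    "relocating_shards",
    "initializing_shards",
    "unassigned_shards",
)


def summarize_health(health):
    """Pick out the shard counters and flag whether any shard is still moving."""
    counts = {key: int(health.get(key, 0)) for key in SHARD_KEYS}
    in_flight = (
        counts["relocating_shards"]
        + counts["initializing_shards"]
        + counts["unassigned_shards"]
    )
    return {
        "status": health.get("status", "unknown"),
        "settled": in_flight == 0,
        **counts,
    }


# Hypothetical response captured while replicas are still being allocated.
sample = {
    "status": "yellow",
    "number_of_nodes": 6,
    "active_shards": 48,
    "relocating_shards": 0,
    "initializing_shards": 2,
    "unassigned_shards": 4,
}

summary = summarize_health(sample)
print(summary)
```

A check like this is most useful right after an index build, when waiting for `settled` to become true before running query benchmarks.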
Index Building Optimization using Zabbix and SPM
[Figure labels: Amount bulk indexed; # Shards; Time taken; CPU util.; Mem util.; Disk I/O; Network]
in practice…
Debugging and Validating using SPM
Index Building: Learnings
• 2 shards / CPU
• 10,000 documents (users) per indexing
  request

• Bulk API for our use case
• No replicas
• Refresh off (index.refresh_interval = -1)
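
The build-time settings and batching described above can be sketched as follows. This is a minimal illustration, not the authors' MR job: the index name `users`, the document shape, and the helper names are assumptions, and the HTTP calls that would send the settings and each body to the `_bulk` endpoint are omitted. The action line uses the current bulk format; older Elasticsearch releases also took a `_type` field there.

```python
import json
from itertools import islice

BATCH_SIZE = 10_000  # documents per bulk request, per the slide

# Index settings for the build phase: no replicas, refresh disabled.
BUILD_SETTINGS = {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(index, docs):
    """Build a newline-delimited bulk body from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Hypothetical data: 25,000 tiny user documents.
users = ((str(i), {"user_id": i}) for i in range(25_000))
bodies = [bulk_body("users", chunk) for chunk in chunks(users, BATCH_SIZE)]

print(len(bodies), "bulk requests")
print(json.dumps(BUILD_SETTINGS))
```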
Query Performance: Learnings
• 1-2 replicas (also for reliability)
• Turn refresh on again (5s default)
• Warm up effect (Index Warm up API 0.20+)
• Optimize API
• Simulate multiple users
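
One way to simulate multiple users is to replay queries from a thread pool and record per-request latency. The sketch below is an assumption-laden illustration: `simulate_users` and `fake_search` are hypothetical, and `fake_search` stands in for a real call to the cluster. `QUERY_SETTINGS` uses the 5s refresh value from the slide and one replica as an example; current Elasticsearch documentation lists 1s as the default refresh interval.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Settings to restore once the build is done (values are illustrative).
QUERY_SETTINGS = {"index": {"number_of_replicas": 1, "refresh_interval": "5s"}}


def timed_call(search, query):
    """Run one search and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    search(query)
    return time.perf_counter() - start


def simulate_users(search, queries, users):
    """Replay `queries` with `users` concurrent workers and summarize latency."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = sorted(pool.map(lambda q: timed_call(search, q), queries))
    return {
        "requests": len(latencies),
        "median_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }


def fake_search(query):
    """Stand-in for a real search request; sleeps briefly instead."""
    time.sleep(0.001)
    return {"hits": []}


stats = simulate_users(fake_search, [f"q{i}" for i in range(40)], users=8)
print(stats["requests"], "requests replayed")
```

Running the same query set twice, before and after warming the index, gives a simple way to observe the warm up effect mentioned above.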
QUERIES?
Sushant Shankar
sushant.shankar@33across.com

Shyam Kuttikkad
shyam.kuttikkad@33across.com
Why we really need a search engine
Batch! Good for complicated tasks (Machine Learning, Graph Algorithms, etc.)
Warm Up: load into memory and cache
Other cool features
• Custom Scoring functions
• Scripts – MVEL, Python
• Facets

• Exploring:
  – Real-time indexing
  – Indexing images, files, etc.
  – Parent-child relationships

Editor's notes

  1. http://www.zabbix.com/ - "Enterprise class monitoring solution for everyone"
  2. Collect information on over 1B users internationally: text copied from over 600K publisher sites, images, searches, pages visited. Different slices of data, now!