© 2018 PURE STORAGE INC.1 #PUREACCELERATE
#NEWMEETSNOW
RUNNING ELASTICSEARCH ON PURE
BIG DATA AS A SERVICE
Brian Gold | Pure Storage
@briantgold
WHEN, WHY, AND HOW SHOULD
I USE ELASTICSEARCH?
Searching a Large Document Corpus
Find the book containing “It was the best of times, it was the worst of times”
Search Engines: Indexes in the Digital Age
Tokenizing
Filtering
Stemming
Normalizing
Inverted Index
Term        Document: Locations
voided      {‘doc1’: [221]}
toil        {‘doc3’: [12]}
house       {‘doc2’: [248]}
topics      {‘doc1’: [23, 206, 342]}
mandate     {‘doc3’: [143]}
edition     {‘doc2’: [178]}
job         {‘doc1’: [282]}
week        {‘doc1’: [22], ‘doc2’: [84]}
buildings   {‘doc3’: [832]}
THE REST IS JUST ENGINEERING ☺
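The indexing steps and the term-to-locations table above can be sketched in a few lines of Python. This is only an illustration, not how Lucene or Elasticsearch implement analysis: the stop-word list, the suffix-stripping "stemmer", the sample documents, and the function names `analyze`, `build_index`, and `search` are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Illustrative stop-word list and suffix rules (assumptions); real analyzers are far richer.
STOP_WORDS = {"a", "an", "and", "it", "of", "the", "was"}
SUFFIXES = ("ings", "ing", "es", "ed", "s")


def analyze(text):
    """Return (position, term) pairs after tokenizing, normalizing, filtering, stemming."""
    terms = []
    for position, token in enumerate(re.findall(r"[A-Za-z0-9']+", text)):  # tokenizing
        term = token.lower()                       # normalizing
        if term in STOP_WORDS:                     # filtering
            continue
        for suffix in SUFFIXES:                    # crude stemming by suffix stripping
            if term.endswith(suffix) and len(term) > len(suffix) + 2:
                term = term[: -len(suffix)]
                break
        terms.append((position, term))
    return terms


def build_index(docs):
    """Map each term to {doc_id: [token positions]}, as in the table above."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in analyze(text):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = [term for _, term in analyze(query)]
    if not terms:
        return set()
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)


# Hypothetical three-document corpus.
docs = {
    "doc1": "It was the best of times, it was the worst of times",
    "doc2": "The house was sold this week",
    "doc3": "Buildings and toil",
}
index = build_index(docs)
matches = search(index, "It was the best of times, it was the worst of times")
```

The query runs through the same analysis as the documents, so both sides agree on terms. The stored positions are what a phrase query would use to check word order; this sketch only checks that all terms are present.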
Basic indexing workflow with Apache Lucene
Data shippers (e.g., crawler) -> Documents -> Library for preprocessing and indexing text
AS DATA VOLUMES GROW, WE MUST SCALE OUT
Scaling out with Elasticsearch
Data shippers (e.g., crawler) -> Documents -> cluster frontend (Elasticsearch)
POWERING SEARCH BACKEND FOR MAJOR WEBSITES (WIKIPEDIA)
What’s a document?
“Documents”: Emails, Web pages, Records in a DBMS, Syslog entries, Security logs, …
“Data shippers”: Logstash, Fluentd, Apache Spark, Filebeat, Web crawlers, …
cluster frontend: Elasticsearch
USE ELASTICSEARCH TO INDEX ANY KIND OF TEXT
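As a rough picture of what a data shipper hands to the cluster, the sketch below builds a request body in the newline-delimited JSON shape that Elasticsearch's bulk endpoint (`POST /_bulk`) accepts: one action line followed by one document line, ending with a newline. The helper name `bulk_index_body`, the index name, and the sample documents are hypothetical. Whether a `_type` entry is needed depends on the Elasticsearch version, so it is left optional here.

```python
import json


def bulk_index_body(index_name, documents, doc_type=None):
    """Build a newline-delimited JSON body for a bulk indexing request."""
    lines = []
    for doc in documents:
        action = {"index": {"_index": index_name}}
        if doc_type is not None:
            # Optional: only some Elasticsearch versions expect a type name.
            action["index"]["_type"] = doc_type
        lines.append(json.dumps(action))   # action line
        lines.append(json.dumps(doc))      # document source line
    return "\n".join(lines) + "\n"         # the body must end with a newline


# Hypothetical documents: any JSON object with text fields can be indexed.
documents = [
    {"kind": "syslog", "message": "disk usage above threshold on host-17"},
    {"kind": "email", "subject": "Quarterly report", "body": "See attached."},
]
body = bulk_index_body("text-corpus", documents)
```

The body would be sent with the `Content-Type: application/x-ndjson` header. The sketch stops short of the HTTP call so that it runs without a cluster.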
Top use cases
Application search
Business analytics
Log analytics
Security analytics
Common theme: rapid insights into data
INFRASTRUCTURE REQUIREMENTS AND BEST PRACTICES
ELASTICSEARCH IN DEPTH
How does Elasticsearch store my data?
Elasticsearch: ephemeral process
Persistent data structures
translog: write-ahead log
segments: inverted indexes & associated metadata
Elasticsearch makes scale-out simple
Elasticsearch cluster - data nodes: each node runs Elasticsearch with its own translog and segments
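One way to picture scale-out across data nodes: each document is routed to a primary shard by hashing a routing value (by default the document id) modulo the number of primary shards, and shards are spread over the nodes. The sketch below assumes a CRC32 stand-in hash and a round-robin placement purely for illustration. Neither is what Elasticsearch itself uses, and the names `pick_shard` and `assign_shards_to_nodes` are hypothetical.

```python
import zlib
from collections import Counter


def pick_shard(routing_key, num_primary_shards):
    """Route a document to a shard: hash(routing key) modulo the shard count."""
    # Stand-in hash (CRC32), chosen because it is in the standard library.
    # This is an assumption, not the hash function Elasticsearch uses.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def assign_shards_to_nodes(num_primary_shards, node_names):
    """Hypothetical round-robin placement; a real allocator weighs many more factors."""
    return {shard: node_names[shard % len(node_names)]
            for shard in range(num_primary_shards)}


nodes = ["node-a", "node-b", "node-c"]          # hypothetical data nodes
placement = assign_shards_to_nodes(6, nodes)    # six primary shards over three nodes
shard = pick_shard("doc-42", 6)
target_node = placement[shard]                  # node whose translog and segments receive the write
shards_per_node = Counter(placement.values())
```

Because the shard is a pure function of the routing key and the shard count, any node can work out where a document lives without a central lookup.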
Mapping concepts to hardware implementation
Storage (SSDs or disks)
CPU & DRAM
Network Interface
Elasticsearch
Scale out with commodity servers
Elasticsearch Elasticsearch Elasticsearch
Cluster Interconnect
Pros
Excellent scalability
Cons
Deployment requires specific server configuration
Hard to change compute-to-storage ratios
What if we disaggregate?
Elasticsearch Elasticsearch Elasticsearch
Cluster Interconnect
Disaggregation enables Elasticsearch-as-a-service
ES ES ES
Cluster Interconnect
ES ES ES ES
Pros
Run nodes anywhere, anyhow
Scale storage and compute independently
Cons
Network bottlenecks?
Storage bottlenecks?
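The trade-off between fixed compute-to-storage ratios and independent scaling can be made concrete with a small sizing sketch. All workload and hardware numbers below are hypothetical, and the function names are assumptions for the example. The point is only the shape of the calculation.

```python
import math


def servers_needed_coupled(cpu_cores, storage_tb, cores_per_server, tb_per_server):
    """With local drives, each server brings a fixed bundle of cores and capacity,
    so the larger of the two requirements decides the server count."""
    return max(math.ceil(cpu_cores / cores_per_server),
               math.ceil(storage_tb / tb_per_server))


def units_needed_disaggregated(cpu_cores, storage_tb, cores_per_server, tb_per_storage_unit):
    """With shared storage, compute and capacity are sized separately."""
    return (math.ceil(cpu_cores / cores_per_server),
            math.ceil(storage_tb / tb_per_storage_unit))


# Hypothetical workload: 80 cores of indexing/search work, 100 TB of index data.
coupled = servers_needed_coupled(80, 100, cores_per_server=40, tb_per_server=4)
compute, storage = units_needed_disaggregated(80, 100, cores_per_server=40,
                                              tb_per_storage_unit=17)
```

In this made-up case the coupled design needs 25 servers just to hold the data, although 2 would cover the compute. The disaggregated design needs 2 compute servers and 6 storage units.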
“Do not use remote-mounted storage... The latency introduced here is antithetical to performance.”
- Elastic.co, Indexing Performance Tips
CHALLENGE
ACCEPTED
FLASHBLADE: PURPOSE-BUILT FOR MODERN ANALYTICS
BLADE: Powerful, Elastic Data Processing & Storage Unit
PURITY: Massively Distributed Software for Limitless Scale
SCALE-OUT FABRIC: Software-defined fabric that scales linearly with more data & clients
Why should an all-flash array perform well with Elasticsearch?
Diagram (steps ① to ③): incoming documents are appended to the translog; refresh creates segments; merge combines segments
Indexing needs high write throughput at consistent, low-enough latency
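The translog, refresh, and merge steps named on this slide can be modeled with a toy class to show why indexing is write-heavy: every document is written once to the write-ahead log, again when a refresh turns buffered documents into a segment, and again each time a merge rewrites segments. This is a deliberately simplified sketch. The class name `ShardWriter` and its counters are assumptions, and details such as flushing, fsync policy, and translog truncation are left out.

```python
class ShardWriter:
    """Toy model of one shard's write path: translog, refresh, merge."""

    def __init__(self):
        self.translog = []      # write-ahead log entries
        self.buffer = []        # indexed but not yet searchable documents
        self.segments = []      # immutable, searchable segments
        self.docs_written = 0   # total document writes, to show write amplification

    def index(self, doc):
        # Every incoming document is first appended to the write-ahead log.
        self.translog.append(doc)
        self.docs_written += 1
        self.buffer.append(doc)

    def refresh(self):
        # A refresh turns the buffered documents into a new searchable segment.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.docs_written += len(self.buffer)
            self.buffer = []

    def merge(self):
        # A merge rewrites several small segments as one larger segment.
        if len(self.segments) > 1:
            merged = tuple(doc for segment in self.segments for doc in segment)
            self.docs_written += len(merged)
            self.segments = [merged]

    def searchable(self):
        return [doc for segment in self.segments for doc in segment]


writer = ShardWriter()
for doc in ("a", "b"):
    writer.index(doc)
writer.refresh()        # first segment: ("a", "b")
writer.index("c")
writer.refresh()        # second segment: ("c",)
writer.merge()          # one segment: ("a", "b", "c")
```

In this run three documents cause nine document writes, which is the kind of amplification that makes sustained storage write throughput matter for indexing.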
Indexing benchmark methodology
20 Elasticsearch nodes
• 40 vCPUs @ 2.6GHz
• 192 GB DRAM
• 256 GB local SSD
• 2x10GbE network
• Elasticsearch 6.2.1 (2 per node in containers)
5 Rally load generators
• 40 vCPUs @ 2.6GHz
• 192 GB DRAM
• 2x10GbE network
• NYC Taxis dataset, 8x replicated => 595 GB raw
FlashBlade
• 15 blades, 17TB each
• 162TB usable via NFS
• 8x40GbE network
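A few derived figures help when reading this setup. The arithmetic below uses only the numbers on the slide. It assumes that "8x replicated" means eight copies of the dataset, and it compares nominal network line rates only, ignoring protocol overhead.

```python
# Figures taken from the benchmark methodology slide.
nodes = 20
instances_per_node = 2
node_network_gbit = 2 * 10          # 2x10GbE per Elasticsearch node
flashblade_network_gbit = 8 * 40    # 8x40GbE on the FlashBlade
dataset_raw_gb = 595
dataset_copies = 8                  # assumption: "8x replicated" = eight copies

es_instances = nodes * instances_per_node              # 40 Elasticsearch instances
nodes_aggregate_gbit = nodes * node_network_gbit       # 400 Gbit/s across all nodes
single_copy_gb = dataset_raw_gb / dataset_copies       # 74.375 GB per dataset copy

# Nominal line rates only: the storage-side links total less than the node-side links.
storage_side_is_narrower = flashblade_network_gbit < nodes_aggregate_gbit
```

On paper the storage-side network (320 Gbit/s) is narrower than the combined node links (400 Gbit/s), so the two are within the same order of magnitude in this setup.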
Over 3M documents per second and as fast as local SSD
Chart: Indexing throughput (millions of documents per sec, 0 to 3) vs. number of nodes (0 to 20, 2x Elasticsearch instances per node); series: Direct-attach SSD, FlashBlade
High throughput at consistently low latency
Real-time monitoring aids performance tuning and diagnostics
Lessons learned
① Don’t fear “remote-mounted storage”!
② Do extensive testing at scale
=> Elasticsearch needs high storage throughput at consistent, low-enough latency
③ Disaggregation enables service delivery and scaling models not possible with local drives.
THANK
YOU!
bgold@purestorage.com
@briantgold

 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 

Último (20)

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 

Data Con LA 2018 - Big Data as a Service: Running Elasticsearch on Pure by Brian Gold

  • 1. © 2018 PURE STORAGE INC.1 #PUREACCELERATE #NEWMEETSNOW RUNNING ELASTICSEARCH ON PURE BIG DATA AS A SERVICE Brian Gold | Pure Storage @briantgold
  • 2. WHEN, WHY, AND HOW SHOULD I USE ELASTICSEARCH?
  • 3. Searching a Large Document Corpus: find the book containing “It was the best of times, it was the worst of times”
  • 4.
  • 5.
  • 6. Search Engines: Indexes in the Digital Age. Text processing: tokenizing, filtering, stemming, normalizing. Inverted index (term, then document: locations): voided {‘doc1’: [221]}; toil {‘doc3’: [12]}; house {‘doc2’: [248]}; topics {‘doc1’: [23, 206, 342]}; mandate {‘doc3’: [143]}; edition {‘doc2’: [178]}; job {‘doc1’: [282]}; week {‘doc1’: [22], ‘doc2’: [84]}; buildings {‘doc3’: [832]}. THE REST IS JUST ENGINEERING ☺
  • 7. Basic indexing workflow with Apache Lucene. Diagram labels: data shippers (e.g., crawler); documents; library for preprocessing and indexing text. AS DATA VOLUMES GROW, WE MUST SCALE OUT
  • 8. Scaling out with Elasticsearch. Diagram labels: data shippers (e.g., crawler); documents; cluster frontend; Elasticsearch. POWERING SEARCH BACKEND FOR MAJOR WEBSITES (WIKIPEDIA)
  • 9. What’s a document? “Documents”: emails, web pages, records in a DBMS, syslog entries, security logs, … “Data shippers”: Logstash, Fluentd, Apache Spark, Filebeat, web crawlers, … USE ELASTICSEARCH TO INDEX ANY KIND OF TEXT
  • 10. Top use cases: application search, business analytics, log analytics, security analytics
  • 11. Common theme: rapid insights into data
  • 12. INFRASTRUCTURE REQUIREMENTS AND BEST PRACTICES ELASTICSEARCH IN DEPTH
  • 13. How does Elasticsearch store my data? Elasticsearch itself is an ephemeral process. Its persistent data structures are the translog (write-ahead log) and the segments (inverted indexes & associated metadata).
  • 14. Elasticsearch makes scale-out simple. Elasticsearch cluster data nodes: three Elasticsearch instances, each with its own translog and segments.
  • 15. Mapping concepts to hardware implementation. Diagram labels: Elasticsearch; CPU & DRAM; network interface; storage (SSDs or disks).
  • 16. Scale out with commodity servers (Elasticsearch nodes on a cluster interconnect). Pros: excellent scalability. Cons: deployment requires specific server configuration; hard to change compute-to-storage ratios.
  • 17. What if we disaggregate? (Elasticsearch nodes on a cluster interconnect)
  • 18. Disaggregation enables Elasticsearch-as-a-service (ES nodes on a cluster interconnect). Pros: run nodes anywhere, anyhow; scale storage and compute independently. Cons: network bottlenecks? Storage bottlenecks?
  • 19. “Do not use remote-mounted storage... The latency introduced here is antithetical to performance.” - Elastic.co, Indexing Performance Tips
  • 20. CHALLENGE ACCEPTED
  • 21. FLASHBLADE: PURPOSE-BUILT FOR MODERN ANALYTICS. BLADE: Powerful, Elastic Data Processing & Storage Unit. PURITY: Massively Distributed Software for Limitless Scale. SCALE-OUT FABRIC: Software-defined fabric that scales linearly with more data & clients.
  • 22. Why should an all-flash array perform well with Elasticsearch? Indexing needs high write throughput at consistent, low-enough latency. Diagram labels: incoming documents; translog; refresh; segments; merge (steps ①, ②, ③).
  • 23. Indexing benchmark methodology. 20 Elasticsearch nodes: 40 vCPUs @ 2.6 GHz; 192 GB DRAM; 256 GB local SSD; 2x10GbE network; Elasticsearch 6.2.1 (2 per node in containers). 5 Rally load generators: 40 vCPUs @ 2.6 GHz; 192 GB DRAM; 2x10GbE network; NYC Taxis dataset 8x replicated => 595 GB raw. FlashBlade: 15 blades, 17 TB each; 162 TB usable via NFS; 8x40GbE network.
  • 24. Over 3M documents per second and as fast as local SSD. Chart: indexing throughput (millions of documents per sec, axis 0 to 3) vs. number of nodes (axis 0 to 20, 2x Elasticsearch instances per node); series: direct-attach SSD, FlashBlade.
  • 25. High throughput at consistently low latency. Real-time monitoring aids performance tuning and diagnostics.
  • 26. Lessons learned: ① Don’t fear “remote-mounted storage”! ② Do extensive testing at scale => Elasticsearch needs high storage throughput at consistent, low-enough latency. ③ Disaggregation enables service delivery and scaling models not possible with local drives.
  • 27. THANK YOU! bgold@purestorage.com @briantgold
  • 28.
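
The inverted index on slide 6 (term, then document and locations) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Lucene or Elasticsearch implement it: the stop-word list, the crude plural-stripping "stemmer", and the function names analyze, build_inverted_index, and search are all hypothetical, and the second sample document is made up for the demo.

```python
import re
from collections import defaultdict

# Assumed toy stop-word list; real analyzers ship much larger, configurable ones.
STOP_WORDS = {"a", "an", "and", "it", "of", "the", "was"}


def analyze(text):
    """Yield (position, term) pairs after tokenizing, normalizing, filtering, stemming."""
    for position, token in enumerate(re.findall(r"[A-Za-z0-9]+", text)):  # tokenizing
        term = token.lower()                       # normalizing
        if term in STOP_WORDS:                     # filtering
            continue
        if term.endswith("s") and len(term) > 3:   # crude stand-in for stemming
            term = term[:-1]
        yield position, term


def build_inverted_index(docs):
    """Map each term to {doc_id: [token positions]}, as in the slide's table."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in analyze(text):
            index[term][doc_id].append(position)
    return index


def search(index, query):
    """Return the ids of documents that contain every (analyzed) query term."""
    terms = [term for _, term in analyze(query)]
    if not terms:
        return set()
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)


if __name__ == "__main__":
    docs = {
        "doc1": "It was the best of times, it was the worst of times",
        "doc2": "The house was sold in the second edition of the week",
    }
    index = build_inverted_index(docs)
    print(dict(index["time"]))             # positions of "times" in doc1
    print(search(index, "best of times"))  # documents matching all query terms
```

Lookups touch only the postings of the query terms rather than scanning every document, which is why the slide can say the rest is "just engineering".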