www.objectrocket.com
Exploring MongoDB and
Elasticsearch
DeveloperWeek Austin 2017
Kimberly Wilkins
Principal Engineer
Databases
@dba_denizen
/wilkinskimberly
Current Areas of Interest
• NoSQL – MongoDB, Elasticsearch, etc.
• Streaming, real-time analytics
• AR/VR/MR – Augmented, Virtual and
Mixed Reality technologies
• Machine Learning – Deep Learning
• Cryptocurrencies, Blockchain
• Teaching, helping, raising up others
MongoDB &
Elasticsearch
Better Together? Yes!
Overview
• Definitions
• Current versions
• Features
• Architectural basics
• Use cases:
Best, Worst, Together
Squirrel
Why Do It?
The blue data highway… bulging at the seams.
So Many Forms… As Many Impacts
New technologies, new industries, new uses…
Data is Coming From Everywhere
Sensors, IoT
Data is Coming From Everywhere
“Big data is like teenage sex:
everyone talks about it,
nobody really knows how to
do it, everyone thinks
everyone else is doing it, so
everyone claims they are
doing it…”
-Dan Ariely, Duke University
Remember
• Hold the data
• Find the data fast
• Stream the data between data stores
• Process the data along the way
• Analyze the data
• Understand where the data comes from
Why?
• Faster, more flexible development
• Lower $ (hardware, software, deployment)
• Performance (faster writes, faster reads)
• Developers (“Schemaless”, cool toys)
• More devs than DBAs, DevOps, SREs…
• Variety of NoSQL technologies
MongoDB
"MongoDB (from humongous) is a free and open-source
cross-platform document-oriented database program.
Classified as a NoSQL database program, MongoDB
uses JSON-like documents with schemas."
– straight from Wikipedia
• #1 NoSQL
• #5 Overall
Features: MongoDB
• Document store: collections instead of tables; documents keyed by ObjectIds
• Easy for developers (more devs than DBAs and Ops)
• Flexible data types
• Unstructured and structured data
• De-normalized; duplicate data is OK
• Index intersections, partial indexes, aggregation pipelines ($lookup)
• Improvements coming in 3.6 (*Nov): single DB call; updating arrays
• Scales vertically, or horizontally via sharding
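The index and pipeline features listed above can be sketched as plain objects in the shape the mongo shell expects. This is only an illustration: the collections and fields (`orders`, `customers`, `status`, `customerId`, `total`) are hypothetical and not from the talk. The options used (`partialFilterExpression`, and `$lookup` with `from`/`localField`/`foreignField`/`as`) exist in MongoDB 3.2 and later.

```javascript
// Hypothetical collections: orders (customerId, status, total) and customers (_id, name).

// Partial index: only documents matching the filter are indexed.
// In the shell: db.orders.createIndex(partialIndex.keys, partialIndex.options)
const partialIndex = {
  keys: { customerId: 1 },
  options: { partialFilterExpression: { status: "open" } },
};

// Aggregation pipeline: filter, join customer data with $lookup, then summarize.
// In the shell: db.orders.aggregate(orderPipeline)
const orderPipeline = [
  { $match: { status: "open" } },
  {
    $lookup: {
      from: "customers",        // collection to join
      localField: "customerId", // field in orders
      foreignField: "_id",      // field in customers
      as: "customer",           // output array field
    },
  },
  {
    $group: {
      _id: "$customerId",
      customer: { $first: "$customer" },
      orderCount: { $sum: 1 },
      revenue: { $sum: "$total" },
    },
  },
];

console.log(JSON.stringify(orderPipeline, null, 2));
```

The `$match` stage comes first so that the join only runs on the documents that survive the filter.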
MongoDB Architectural Basics
• Faster, more flexible development
• Built-in Replication via Replica sets
• HA/DR throughout stack, components
• Scaling via Sharding
• DR via use of Multiple Data Centers
• Delayed and/or hidden secondaries
• https://www.objectrocket.com/files/objectrocket-for-mongodb-white-paper.pdf
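As a rough sketch of the replication points above, here is what a three-member replica set configuration with one hidden, delayed member might look like. The host names and the set name `rs0` are made up for illustration. The delay field is `slaveDelay` in the 3.x series discussed in this talk; newer releases use a different field name, so treat that line as version-dependent.

```javascript
// Sketch of a replica set configuration document, assuming three hypothetical hosts.
// In the mongo shell this object would be passed to rs.initiate(config).
const config = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.net:27017" },
    { _id: 1, host: "db2.example.net:27017" },
    // Hidden, delayed member: never becomes primary, is invisible to clients,
    // and applies writes one hour late (field name as used in MongoDB 3.x).
    { _id: 2, host: "db3.example.net:27017", priority: 0, hidden: true, slaveDelay: 3600 },
  ],
};

// Votes needed to elect a primary: a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

console.log(majority(config.members.length)); // prints 2
```

With three voting members the set keeps a primary as long as any two can reach each other, which is why an odd member count is the usual starting point.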
Basic MongoDB Architecture
[Diagram: a single replica set with one primary and two secondaries, connected by heartbeats.]
[Diagram: sharded cluster. Client drivers connect to the MongoS tier (routers); the MongoD tier holds Shard 1, Shard 2 and Shard 3, each a replica set with one primary and two secondaries; config servers (metadata) Config 1, Config 2 and Config 3 are labeled "Replica Set 3.2".]
MongoDB Architecture - Advanced
• Multiple Storage Engine Options
• HA/DR throughout stack, components
• Scaling via Sharding
• DR via use of Multiple Data Centers,
delayed/hidden
• Percona Server for MongoDB: has features from
MongoDB Enterprise edition (*security)
Best Use Cases
• User Data - games, chat, social media
• Mobile Analytics, Engagement/Campaigns
• Aggregation Summaries
• Product Catalogs
• Inventory Management
• Shopping Carts
• Content Management Systems - Sitecore
1000 x
Elasticsearch
“Elasticsearch is a distributed, JSON-
based search and analytics engine
designed for horizontal scalability,
maximum reliability, and easy
management.”
– straight from Elastic.co website
Elasticsearch Architectural Basics
● Cluster - A collection of Elasticsearch nodes of
various roles
↳ Nodes - Elasticsearch processes that perform one or more roles
● Roles are: master, data, ingest, coordinating-only (client)
● Nodes can operate in any combination or all roles
↳ Indexes - A collection of data (like databases/collections)
● Can be combined in queries with wildcards and aliases
● Fields in an index have an unchangeable data type (mapping)
↳ Shards - Slices of the index data
● Unlike many databases, automatically constructed (not key based)
● A replica is just a readonly copy of a shard
↳ Segments - Lucene’s chunk of data
● Automatically built as data is indexed.
● Docs are not deleted, just marked as deleted (can be
optimized/merged)
↳ Documents - A JSON entry in the index
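The point about segments (documents are marked as deleted, and only a merge removes them) can be illustrated with a toy model. This is not how Lucene is implemented; `ToyShard` and its methods are invented here purely to show the idea.

```javascript
// Toy model of one shard: append-only segments plus a set of deleted doc ids.
class ToyShard {
  constructor() {
    this.segments = [];       // each segment is an array of { id, doc }
    this.deleted = new Set(); // ids marked as deleted but still stored
  }
  // Each batch of newly indexed docs becomes a new segment.
  flush(docs) {
    this.segments.push(docs.map((d) => ({ ...d })));
  }
  // Deleting only marks the id; no segment is rewritten.
  remove(id) {
    this.deleted.add(id);
  }
  // Searches skip docs that are marked as deleted.
  liveDocs() {
    return this.segments.flat().filter((d) => !this.deleted.has(d.id));
  }
  // Merging rewrites all segments into one and drops deleted docs for good.
  merge() {
    this.segments = [this.liveDocs()];
    this.deleted.clear();
  }
  // Number of docs physically stored, deleted or not.
  storedCount() {
    return this.segments.reduce((n, s) => n + s.length, 0);
  }
}

const shard = new ToyShard();
shard.flush([{ id: 1, doc: "a" }, { id: 2, doc: "b" }]);
shard.flush([{ id: 3, doc: "c" }]);
shard.remove(2);
console.log(shard.liveDocs().length, shard.storedCount()); // prints 2 3
shard.merge();
console.log(shard.segments.length, shard.storedCount());   // prints 1 2
```

The gap between live and stored counts before the merge is the space that deleted documents keep occupying until segments are merged.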
Elasticsearch vs. Elastic Stack
• Don’t be confused!
• Elasticsearch vs. Elastic Stack
• The Open Source Elastic Stack is a suite of
tools/apps associated with and working in
conjunction with Elasticsearch to complete a variety
of analytics tasks.
Elastic Stack Ecosystem
Basic Elastic Architecture
• 3 nodes, 1 replica, 1 master: fewer nodes, more resources
per node, each shard performs better
• 3 nodes, 2 replicas, 1 master: more nodes, needs more
HW resources but increases search performance for the index and
improves redundancy
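The trade-off between the two layouts comes down to simple arithmetic on shard copies. The sketch below assumes an index with three primary shards, a number that is not given on the slide, and an even spread of copies across nodes.

```javascript
// Total shard copies in an index = primaries * (1 + replicas per primary).
function shardCopies(primaryShards, replicas) {
  return primaryShards * (1 + replicas);
}

// Copies each node holds if they are spread evenly. Elasticsearch does not
// place a replica on the same node as its own primary.
function copiesPerNode(primaryShards, replicas, nodes) {
  return shardCopies(primaryShards, replicas) / nodes;
}

// Assumed: 3 primary shards on 3 nodes.
console.log(copiesPerNode(3, 1, 3)); // prints 2
console.log(copiesPerNode(3, 2, 3)); // prints 3
```

With one replica each node carries two shard copies; with two replicas it carries three, so every node does more work per index but any search can be served from more places and more copies survive a node loss.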
Best Use Cases
• Full and fuzzy text searches (true strength: speed)
• Geo and range-related searches
• Visualizing data with other Elastic Stack
components (Kibana)
• Logging and log analysis (in place of Splunk)
• Scraping and Combining Public Data Sources
• Event and Data Metrics
Geo Queries – Social Media – Near Me
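A "near me" search of this kind is typically expressed in Elasticsearch as a `geo_distance` filter with a `_geo_distance` sort. The sketch below only builds the request body; the field name `location` (assumed to be mapped as `geo_point`) and the coordinates are illustrative and not from the talk.

```javascript
// Builds a search body for "documents within `distance` of a point".
// Assumes a hypothetical field `location` mapped as geo_point.
function nearMeQuery(lat, lon, distance) {
  const point = { lat: lat, lon: lon };
  return {
    query: {
      bool: {
        // A filter clause: documents either are in range or are not; no scoring.
        filter: { geo_distance: { distance: distance, location: point } },
      },
    },
    // Closest results first.
    sort: [{ _geo_distance: { location: point, order: "asc", unit: "km" } }],
  };
}

// Example: posts within 5 km of downtown Austin.
console.log(JSON.stringify(nearMeQuery(30.2672, -97.7431, "5km"), null, 2));
```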
Visualization with Kibana
MongoDB vs. Elastic (Elasticsearch)
Purpose
• MongoDB: general-purpose document store DB, server-side scripts, some aggregation pipelines. OLTP = good, reporting = not as good. Simple = good, complex = good, very complex = not as good.
• Elasticsearch: full-text search engine, fuzzy text search, geo near, keyword, real-time analytics, indexer, distributed, Java-based with Lucene under the covers.
Versions
• MongoDB: current version 3.4.10 (*Halloween!); recommended 3.4.8 or 3.4.9.
• Elasticsearch: current version 5.6.1, September 18, 2017 (*new, kinks from the 5.5.3 release of September 11, 2017); recommended and available 5.5.1, July 25, 2017.
Schema
• MongoDB: schemaless (**#!); structured, unstructured, semi-structured.
• Elasticsearch: schemaless (**#!); structured, unstructured, semi-structured.
Document format
• MongoDB: JSON, BSON docs.
• Elasticsearch: JSON.
Scaling
• MongoDB: sharding to scale.
• Elasticsearch: sharding/nodes to scale.
High availability
• MongoDB: HA via replica sets (1 primary, 2 secondaries, or more with quorum).
• Elasticsearch: HA via replicas (1 master, x replicas).
Indexes
• MongoDB: limited index intersection (v2.6+); very large indexes still "ehh".
• Elasticsearch: one query can use multiple indexes.
Strengths
• MongoDB: great general-purpose NoSQL DB, for processing and filtering during query and data retrieval.
• Elasticsearch: processing via index builds, stores in multiple versions; great at indexing; great at searching big datasets.
Now Combine Them
Like tacos
and tequila
Combining – in general
• Database has many indexes or very large indexes
• Data has lots of arrays: queries that require
many different $and clauses on a field
with an array as a value
• SPEED up fuzzy and/or full-text searches, e.g. finding 'chicken' from a partial term:
ex. db.articles.find({ $text: { $search: "chi" } })
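MongoDB's `$text` matches whole (stemmed) words, so a search for "chi" would not be expected to find "chicken". On the Elasticsearch side, two common ways to cover that case are a fuzzy `match` query and a `match_phrase_prefix` query. The sketch below only builds the request bodies; the field name `body` is an assumption, not something named in the talk.

```javascript
// Fuzzy match: tolerates small typos such as "chiken" for "chicken".
function fuzzyQuery(field, text) {
  return { query: { match: { [field]: { query: text, fuzziness: "AUTO" } } } };
}

// Prefix match: treats the last term as a prefix, so "chi" can reach "chicken".
function prefixQuery(field, text) {
  return { query: { match_phrase_prefix: { [field]: text } } };
}

console.log(JSON.stringify(fuzzyQuery("body", "chiken")));
console.log(JSON.stringify(prefixQuery("body", "chi")));
```

Either body would be sent to the index's `_search` endpoint; which one fits depends on whether users mistype whole words or type partial ones.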
MongoDB & Elasticsearch +
Elasticsearch
• Primarily search engine
• Scalable, distributed
• Horizontal scaling
• JSON
• Schemaless*
• Based on Lucene
• Support for Python, JS, .NET, Scala, Perl, PHP, Ruby
• 3rd-party product integration
Kafka, others
• Primarily for streaming, for moving data between data stores; used with other components and data techs to create near-real-time and very-near-real-time event analytics; append only
• Horizontal scaling
• JSON
• Schemaless*
• Parallel processing
• 3rd-party product integration
MongoDB
• Primarily OLTP
• Scalable, distributed
• Vertical or horizontal scaling
• Binary JSON
• Schemaless*
• Rapid prototyping
• Event logging
• Social media
• Content management
• User data and actions
• NOT in-depth analysis
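Whatever moves the data between the two stores (Kafka or something else), one step is always the same: reshaping MongoDB documents into something Elasticsearch can index. A minimal sketch of that step, assuming the Elasticsearch bulk API's newline-delimited format, is below. The index name and sample documents are hypothetical, and the 5.x releases discussed in this talk also expect a document type (`_type`), which is omitted here.

```javascript
// Turns MongoDB-style documents into the newline-delimited body of an
// Elasticsearch _bulk request: one action line, then one source line, per doc.
function toBulkBody(indexName, docs) {
  const lines = [];
  for (const doc of docs) {
    // _id is document metadata in Elasticsearch, so it is moved out of the source.
    const { _id, ...source } = doc;
    lines.push(JSON.stringify({ index: { _index: indexName, _id: String(_id) } }));
    lines.push(JSON.stringify(source));
  }
  // The bulk API requires the body to end with a newline.
  return lines.join("\n") + "\n";
}

const body = toBulkBody("comments", [
  { _id: "a1", user: "kim", text: "great red wine" },
  { _id: "a2", user: "sam", text: "go quarterbacks" },
]);
console.log(body);
```

Reusing the MongoDB `_id` as the Elasticsearch `_id` makes re-sending a document an overwrite rather than a duplicate, which keeps the sync step safe to retry.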
MongoDB & Elasticsearch @ObjectRocket
Currently:
• MongoDB metrics
• Centralized logging
• MongoDB data visualization
• Network monitoring
• Website search
• Business metrics
• Elasticsearch metrics
Potential New Use 1 – Bitcoin Time Interval Tracking
Bitcoin ticker data interval tracking and analysis…
MongoDB
• Simple and complex queries
• Aggregations at any stage
Elasticsearch
• Speed up queries – faster results
• Store frequent queries for re-use via indexes
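Interval tracking on the MongoDB side usually means grouping ticks into fixed time buckets. The sketch below assumes hypothetical ticker documents of the form `{ ts, price }` with `ts` stored as a number of epoch milliseconds (a `Date` field would need converting first) and a hypothetical collection named `ticks`. It shows a `$group` pipeline and the same bucketing in plain JavaScript.

```javascript
const INTERVAL_MS = 5 * 60 * 1000; // five-minute buckets

// Aggregation pipeline that groups ticks into fixed intervals.
// In the shell: db.ticks.aggregate(intervalPipeline)
const intervalPipeline = [
  {
    $group: {
      // Bucket start = ts rounded down to a multiple of the interval.
      _id: { $subtract: ["$ts", { $mod: ["$ts", INTERVAL_MS] }] },
      low: { $min: "$price" },
      high: { $max: "$price" },
      ticks: { $sum: 1 },
    },
  },
  { $sort: { _id: 1 } },
];

// The same bucketing in plain JavaScript, to show what the pipeline computes.
function bucketTicks(ticks, intervalMs) {
  const buckets = new Map();
  for (const { ts, price } of ticks) {
    const start = ts - (ts % intervalMs);
    const bucket = buckets.get(start) || { start, low: price, high: price, ticks: 0 };
    bucket.low = Math.min(bucket.low, price);
    bucket.high = Math.max(bucket.high, price);
    bucket.ticks += 1;
    buckets.set(start, bucket);
  }
  return [...buckets.values()].sort((x, y) => x.start - y.start);
}

const sample = [
  { ts: 0, price: 10 },
  { ts: 60000, price: 12 },
  { ts: 300000, price: 9 },
];
console.log(bucketTicks(sample, INTERVAL_MS));
```

The per-bucket summaries are small, which makes them a natural thing to copy into Elasticsearch for fast repeated lookups.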
Potential New Use 1 cont’d – Bitcoin Time Interval Tracking
Potential New Use 2 – Cryptocurrency Platform/Trading
• Cryptocurrency trading platform, e.g. tribeca
• node.js – v7.8 or higher
• MongoDB database – for persistence, aggregations
• Elasticsearch – the 'need for speed': rapid-fire
executions required, sub-millisecond trades and cancellations
Potential New Use 3 – Social Media App Searching
• Searching large Social Media Apps for frequently
searched items – popular quarterbacks & receivers
on fantasy football sites, wines in comments
• MongoDB's $text operator is special: it cannot be
used more than once in a query, and cannot be used with $nor,
etc.
ex. db.comments.find({ $and: [ { $text: { $search: "win" } }, { $text: { $search: "red" } } ] }) – WON'T WORK
in MongoDB alone, but combining it with Elasticsearch covers this case.
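In Elasticsearch the equivalent of "several text conditions at once" is a `bool` query, where each condition is its own clause. The sketch below builds such a body; the field name `comment` is an assumption, and `must_not` is included to show the counterpart of the `$nor` restriction.

```javascript
// Builds a search body where every term in `required` must match and
// no term in `excluded` may match.
function allTermsQuery(field, required, excluded = []) {
  return {
    query: {
      bool: {
        must: required.map((term) => ({ match: { [field]: term } })),
        must_not: excluded.map((term) => ({ match: { [field]: term } })),
      },
    },
  };
}

// Comments that mention both "win" and "red", but not "white".
console.log(JSON.stringify(allTermsQuery("comment", ["win", "red"], ["white"]), null, 2));
```

Because each term is a separate clause, there is no limit of one text condition per query as there is with `$text`.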
Potential New Use 4 – Machine Learning, Deep Learning
Architecture and Streaming
Platform – Jay Kreps
• Apps/DB’s->data in
• Aggregations at any stage
• Further Queries
• Faster Queries via ES
• Results back into DB’s
• Algorithms applied
• Endless … Limitless …
Device events, time series,
event logs, AR/VR/MR
Links
• MongoDB to analyze cryptocurrency price swings and intervals:
https://medium.com/@serbanmihai/aggregate-mongodb-data-with-node-js-and-mongoose-cryptocurrency-financial-time-series-ae739b4c9485
• MongoDB with node.js – cryptocurrency trading platform:
https://github.com/michaelgrosner/tribeca
• Arctic, MongoDB and Python – cryptocurrency database:
https://mxbu.github.io/logbook/2017/06/04/use-arctic-to-create-cryptocurrency-database/
• AI, ML, DL – Jay Kreps article: architecture and streaming platform for AI and deep learning
(database, pipeline, models, events, etc.):
https://www.oreilly.com/ideas/apache-kafka-and-the-four-challenges-of-production-machine-learning-systems
We are Hiring!
Join a dynamic and
innovative team!
objectrocket.com/careers
Consultations Available
sales@objectrocket.com
objectrocket.com/customers/
View Customer Stories
Trial & Migrations
always free
objectrocket.com
Thank You!
DeveloperWeek Austin 2017
Kimberly Wilkins
Principal Engineer
Databases
@dba_denizen
/wilkinskimberly


Exploring MongoDB & Elasticsearch: Better Together

  • 1. www.objectrocket.com Exploring MongoDB and Elasticsearch DeveloperWeek Austin 2017 Kimberly Wilkins Principal Engineer Databases @dba_denizen /wilkinskimberly
  • 2. www.objectrocket.com Current Areas of Interest • NoSQL – MongoDB, Elasticsearch, etc. • Streaming, real-time analytics • AR/VR/MR – Augmented, Virtual and Mixed Reality technologies • Machine Learning – Deep Learning • Cryptocurrencies, Blockchain • Teaching, helping, raising up others
  • 4. www.objectrocket.com Overview • Definitions • Current versions • Features • Architectural basics • Use cases: Best, Worst, Together Squirrel
  • 5. www.objectrocket.com Why Do It? The blue data highway… bulging at the seams.
  • 6. www.objectrocket.com So Many Forms… As Many Impacts New technologies, new industries, new uses…
  • 7. www.objectrocket.com Data is Coming From Everywhere Sensors, IoT
  • 8. www.objectrocket.com Data is Coming From Everywhere “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…” -Dan Ariely, Duke University
  • 9. www.objectrocket.com Remember • Hold the data • Find the data fast • Stream the data between data stores • Process the data along the way • Analyze the data • Understand where the data comes from
  • 10. www.objectrocket.com Why? • Faster, more flexible development • Lower $ (hardware, software, deployment) • Performance (faster writes, faster reads) • Developers (“Schemaless”, cool toys) • > dev’s than ^ dba’s, devops, SRE’s… • Variety of NoSQL technologies
  • 12. www.objectrocket.com MongoDB "MongoDB (from humongous) is a free and open-source cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with schemas.” – straight from wikipedia • #1 NoSQL • #5 Overall
  • 13. www.objectrocket.com Features: MongoDB Document store collections vs tables; document or objectId’s Easy for developers – more devs than DBA’s and Ops flexible data types Unstructured & structured data De-normalized Duplicate data is OK Index intersections, partials, aggregation pipelines - $lookup improvements coming in 3.6 *Nov–single db call; updating arrays Scales vertically or horizontally - sharding
  • 14. www.objectrocket.com MongoDB Architectural Basics • Faster, more flexible development • Built-in Replication via Replica sets • HA/DR throughout stack, components • Scaling via Sharding • DR via use of Multiple Data Centers • Delayed and/or Hidden Slaves • https://www.objectrocket.com/files/objectrocket-for- mongodb-white-paper.pdf
  • 16. www.objectrocket.com Shard 1 Secondary Secondary Primary Shard 2 Secondary Secondary Primary Shard 3 Secondary Secondary Primary Client Drivers MongoS Tier (Router) MongoD Tier Replica Sets MongoS MongoS MongoS Config Servers (Metadata) Config 3 Config 1 Config 2 Replica Set 3.2 Sharded Cluster MongoS
  • 17. www.objectrocket.com MongoDB Architecture - Advanced • Multiple Storage Engine Options • HA/DR throughout stack, components • Scaling via Sharding • DR via use of Multiple Data Centers, delayed/hidden • Percona Server Edition - has features from MongoDB Enterprise edition* Security
  • 18. www.objectrocket.com Best Use Cases • User Data - games, chat, social media • Mobile Analytics, Engagement/Campaigns • Aggregation Summaries • Product Catalogs • Inventory Management • Shopping Carts • Content Management Systems - Sitecore 1000 x
  • 20. www.objectrocket.com Elasticsearch “Elasticsearch is a distributed, JSON- based search and analytics engine designed for horizontal scalability, maximum reliability, and easy management.” – straight from Elastic.co website
  • 21. www.objectrocket.com Best Use Cases ● Cluster - A collection of Elasticsearch nodes of various roles ↳ Nodes - Elasticsearch processes that perform one or more roles ● Roles are: master, data, ingest, coordinating-only (client) ● Nodes can operate in any combination or all roles ↳ Indexes - A collection of data (like databases/collections) ● Can be combined in queries with wildcards and aliases ● Fields in an index have an unchangeable data type (mapping) ↳ Shards - Slices of the index data ● Unlike many databases, automatically constructed (not key based) ● A replica is just a readonly copy of a shard ↳ Segments - Lucene’s chunk of data ● Automatically built as data is indexed. ● Docs are not deleted, just marked as deleted (can be optimized/merged) ↳ Documents - A JSON entry in the index
  • 22. www.objectrocket.com Elasticsearch vs. Elastic Stack • Don’t be confused! • Elasticsearch vs. Elastic Stack • The Open Source Elastic Stack is a suite of tools/apps associated with and working in conjunction with Elasticsearch to complete a variety of analytics tasks.
  • 24. www.objectrocket.com Basic Elastic Architecture 3 Nodes 1 Replica, 1 master-Master –fewer nodes, more resources per node, each shard performs better 3 Nodes 2 Replicas, 1 master-Master – more nodes, needs more HW resources but increases search performance for the index and improves redundancy
  • 25. www.objectrocket.com Best Use Cases • Full and Fuzzy Text Searches **true strength speed • Geo and Range related searches • Visualizing Data – with other ES Stack Components- Kibana • Logging and Log Analysis xsplunkx • Scraping and Combining Public Data Sources • Event and Data Metrics
  • 26. www.objectrocket.com Geo Queries – Social Media – Near Me
  • 28. www.objectrocket.com Visualization with Kibana MongoDB Elastic (Elasticsearch) General Purpose Document store DB, server side scripts, some aggreg pipelines OLTP = good, REPORTING = not as good Simple = good, Complex = good, Very Complex = not as good Full-text search engine, Fuzzy text search, geo near, keyword, real-time analytics, indexer, distributed , java based w/Lucene under the covers Current version: 3.4.10 *Halloween! Recommended: 3.4.8 or 3.4.9 Current version: 5.6.1 September 18, 2017 *New, kinks from 5.5.3 release from September 11, 2017 Recommended and Available 5.5.1 July 25, 2017 Schemaless **#! Structured, unstructured, semi-structured Schemaless **#! Structured, unstructured, semi-structured JSON, BSON docs JSON Sharding to scale Sharding/Nodes to scale HA via replica sets (1 Primary, 2 Secondaries – or more with quorum) HA via replica sets (1 MASTER, x REPLICAS) Limited index intersection v2.6+, very large indexes still ehh 1 Query can use multiple indexes Great general purpose NoSQL db, for Processing, filtering during query & data retrieval Processing via index builds, stores in multiple versions. Great at Indexing; Great at searching big datasets
  • 30. www.objectrocket.com Combining – in general • The database has many indexes or very large indexes • The data has lots of arrays, and queries require many different $and clauses on a field with an array as a value • To SPEED up fuzzy and/or full text searches – ‘chicken’ ex. db.articles.find({ $text: { $search: "chi" } })
  • 31. www.objectrocket.com MongoDB & Elasticsearch + • Elasticsearch: primarily a search engine; scalable, distributed; horizontal scaling; JSON; schemaless*; based on Lucene; support for Python, JS, .Net, Scala, Perl, PHP, Ruby; 3rd-party product integration • Kafka, others: primarily for streaming, for moving data between data stores; used with other components and data technologies to create near-real-time and very-near-real-time event analytics; append only; horizontal scaling; JSON; schemaless*; parallel processing; 3rd-party product integration • MongoDB: primarily OLTP; scalable, distributed; vertical or horizontal scaling; binary JSON; schemaless*; rapid prototyping; event logging; social media; content management; user data and actions; NOT in-depth analysis
  • 32. www.objectrocket.com MongoDB & Elasticsearch @ObjectRocket – Currently • MongoDB metrics • Centralized logging • MongoDB data visualization • Network monitoring • Website search • Business metrics • Elasticsearch metrics
  • 33. www.objectrocket.com Potential New Use 1 – Bitcoin Time Interval Tracking • Bitcoin ticker data: interval tracking and analysis • MongoDB: simple and complex queries; aggregations at any stage • Elasticsearch: speed up queries for faster results; store frequent queries for re-use via indexes
  • 34. www.objectrocket.com Potential New Use 1 cont’d – Bitcoin Time Interval Tracking
  • 35. www.objectrocket.com Potential New Use 2 – Cryptocurrency Platform/Trading • Cryptocurrency trading platform – ex. tribeca • node.js – v7.8 or higher • MongoDB database – for persistence, aggregations • Elasticsearch – the ‘need for speed’: rapid-fire executions required, sub-millisecond trades & cancellations
  • 36. www.objectrocket.com Potential New Use 3 – Social Media App Searching • Searching large social media apps for frequently searched items – popular quarterbacks & receivers on fantasy football sites, wines in comments • MongoDB’s $text operator is special – it cannot be used more than once in a query, and it cannot be used with $nor, etc. ex. db.comments.find({ $and: [{ $text: { $search: "win" } }, { $text: { $search: "red" } }] }) – WON’T WORK in MongoDB! But combine it with Elasticsearch.
  • 37. www.objectrocket.com Potential New Use 4 – Machine Learning, Deep Learning
  • 38. www.objectrocket.com Potential New Use 4 – Machine Learning, Deep Learning • Architecture and Streaming Platform – Jay Kreps • Apps/DBs -> data in • Aggregations at any stage • Further queries • Faster queries via ES • Results back into DBs • Algorithms applied • Endless … Limitless … • Device events, time series, event logs, AR/VR/MR
  • 39. www.objectrocket.com Links • MongoDB to analyze cryptocurrency price swings and intervals: https://medium.com/@serbanmihai/aggregate-mongodb-data-with-node-js-and-mongoose-cryptocurrency-financial-time-series-ae739b4c9485 • MongoDB with node.js – cryptocurrency trading platform: https://github.com/michaelgrosner/tribeca • Arctic, MongoDB and Python – cryptocurrency database: https://mxbu.github.io/logbook/2017/06/04/use-arctic-to-create-cryptocurrency-database/ • AI, ML, DL – Jay Kreps article on architecture and streaming platform for AI, deep learning, database pipelines, models, events, etc.: https://www.oreilly.com/ideas/apache-kafka-and-the-four-challenges-of-production-machine-learning-systems
  • 40. www.objectrocket.com We are Hiring! Join a dynamic and innovative team! objectrocket.com/careers
  • 42. www.objectrocket.com Thank You! DeveloperWeek Austin 2017 Kimberly Wilkins Principal Engineer Databases @dba_denizen /wilkinskimberly
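
A hedged sketch for the $text limitation on slide 36: since MongoDB accepts at most one $text expression per query, several terms have to go into a single $search string. The helper name buildTextFilter is hypothetical (it is not a MongoDB API), and a text index on the searched field of db.comments is assumed.

```javascript
// Hypothetical helper: build ONE $text filter from several search terms.
// MongoDB allows at most one $text expression per query, so the terms
// are joined into a single $search string instead of being $and-ed.
function buildTextFilter(terms) {
  if (!Array.isArray(terms) || terms.length === 0) {
    throw new Error("at least one search term is required");
  }
  return { $text: { $search: terms.join(" ") } };
}

const filter = buildTextFilter(["wine", "red"]);
console.log(JSON.stringify(filter));

// In the mongo shell (assumes a text index exists on the collection):
//   db.comments.find(filter)
```

Space-separated terms in $search are matched as a logical OR, so this query is broader than the intended "wine AND red". That gap is one reason to hand this kind of search to Elasticsearch.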

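For the ‘chicken’ example on slide 30, the Elasticsearch side could look roughly like the request body below: a match query with fuzziness, which tolerates small misspellings. The helper buildFuzzyMatch, the field name "name", and the misspelled input are assumptions for illustration; the body would be sent to an index's _search endpoint.

```javascript
// Hypothetical helper: build an Elasticsearch request body for a fuzzy
// full-text match on one field. "match" and "fuzziness" are standard
// query DSL keys; the field name passed in is an assumption.
function buildFuzzyMatch(field, text) {
  return {
    query: {
      match: {
        [field]: { query: text, fuzziness: "AUTO" }
      }
    }
  };
}

// A misspelled "chicken" in a menu application.
const body = buildFuzzyMatch("name", "chiken");
console.log(JSON.stringify(body, null, 2));
```

The same application can keep MongoDB as the system of record and send only this kind of search request to Elasticsearch.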
Editor's Notes

  1. MongoDB is somewhat the de facto general-purpose NoSQL DB, and it has added enough new features and made enough improvements to stay at the top of the NoSQL offerings. Elastic is moving up, and it can do things fast. As our world expands and changes, the potential use cases for combining data stores – MongoDB and Elasticsearch – also grow. But before we can talk about those current and potential use cases for combining them, we should take a quick look at what each of them is and when to use each individually.
  2. 2 mins
  3. People wanted Big Data to go away, they wanted to call it other things or NOT call it things or whatever… EOT IOT IIOT But it’s not going to…
  4. -Internet of Things / Everything / Industrial IIOT - logs, events, - 2019 ~$1.7 TRILLION $$ -Monitoring and managing those has sprung up whole companies now – -Augmented Reality AR VR MR - THE FUTURE – the next iphone level CHANGE Manufacturing, Training,
  5. Sorry, not sorry - still love this quote after all of the years - But the truth remains – more and more and more Data Points Requires THINGS (applications, Data Store) to manage them
  6. We NEED Something to hold the data, to find the data fast, to SHARE the data and MOVE it from one APP to another Process and transform along the way, Analyze it MEANINGS
  7. NEVER truly schemaless though… If you are NOT thinking about app design before you actually start designing it, you FAIL. You are just storing data that will likely never be used, and your new shiny NoSQL datastore will just become a data wasteland. = MongoDB and Elastic, then MONGODB solo next
  8. Keep them tied together here – MongoDB is somewhat the de facto general-purpose NoSQL DB, and it has added enough new features and made enough improvements to stay at the top of the NoSQL offerings. Elastic is moving up, and it can do things fast.
  9. IF something comes straight from Wikipedia it HAS to be true. MongoDB is the de facto general-purpose NoSQL DB: #5 datastore technology overall and holding steady there, #1 NoSQL database product.
  10. MongoDB has the market share and the community buy-in to make the difference in supportability, so it usually takes the prize unless you have a really, really write-heavy application. Community support and development efforts – drivers, etc. Built-in sharding/scaling via replica sets. High writes and heavy reads can be somewhat mutually exclusive. MongoDB scales nearly linearly for heavy read workloads.
  11. 3.4.10 as of Halloween - since released on Halloween, would avoid ;-) no tricks please - 3.4.9 considered a minor release overall but … But what does it look like really? Architecture overview next
  12. 1 Primary, 2 Secondaries - heartbeat communication for up/down state, replication to secondaries via oplog MongoDB has same kind of potential to scale UP instead of OUT – **NOTE - many people run MongoDB on dedicated larger bare metal hosts and grow by scaling up vertically However, if they continue to grow, they will run into many of the same challenges that traditional RDBMS's have So what about scaling OUT with Mongo? Religious War here
  13. MongoD’s – the data nodes – the shards – the replica sets (a primary and 2 secondary members). MongoS’s – query routers – talk to the config servers and the MongoD data nodes; they get location metadata from the config servers to route each query to the correct shard and return the result. It is good design to have multiple mongoS query routers in sharded clusters – our environments have 4. Config servers – the data dictionary of Mongo – contain the cluster/shard metadata, the mapping of the data set. In 3.0 and below, always keep exactly and ONLY 3 for PROD environments. In 3.2 and up, the config servers are by default a replica set, which is NOW required to be WT (WiredTiger) – this improves consistency of the info in the chunk map, aka where the data extents reside. If you lose or corrupt your configs, the mongoS will not know where the data resides, so it can’t retrieve it … so the data is effectively lost.
  14. Too much to cover other than a mention for you to look up later. WT – the new default, also required for the config server replica set vs. 3 single DBs as before. MMAP – still good for larger result sets or smaller, more frequent write activities, specifically updates, unless you have a lot of CPU and cores to throw at it for WT usage. = Reminder to talk about the Percona version that allows us to offer security features that usually only come with the more expensive Enterprise version: SSL, Kerberos, LDAP integration *** our experience there
  15. User Data in Games Inventory Management – update, decrease, increase inventory Shopping carts - tales of the long query and 1000 pairs of shoes CMS – Our expertise running Sitecore on Azure
  16. A search engine but a whole lot more MUCH more powerful than JUST a search engine GeoAnalytics - Geo near me
  17. Basically Clusters with Nodes holding Indexes then split across hosts with Shards Holding slices of data held in segments at the lucene chunk level Composed of the data via documents written in JSON
  18. There are lots of reasons to use multiple components of the Elastic Stack Including for Visualization which we will talk about a bit later. But 1st let’s talk about just elasticsearch
  19. With Elastic, to increase in scale and add more performance, you increase the replication factor. Basically ADD NODES – this increases HW resources to improve search performance and improve redundancy. The number of replica shards can be changed dynamically on a live cluster, allowing us to scale up or down as demand requires, and Elastic will automatically redistribute as needed. With 2 replicas we have nine shards: three primaries and six replicas. This means that we can scale out to a total of nine nodes, again with one shard per node. This would allow us to triple search performance compared to our original three-node cluster.
  20. here Logging and Log Analysis Basically taking over for Splunk which has become too expensive
  21. Elasticsearch has made massive improvements to its geospatial capabilities in the last 2 releases It way outperforms the geospatial abilities of MongoDB’s $geoNear and within operators Which is why you would look to combine them – which we will talk about later on But other good uses of Elasticsearch combined with elements of its Elastic STACK
  22. But other good uses of Elasticsearch combined with elements of its Elastic STACK BUT Now to Summarize those 2 – MongoDB and Elasticsearch
  23. Summarize those 2. Both store data objects that have key-value pairs, and both allow querying that body of objects. But the two come from different camps and are made for different purposes. Elastic – great with full and fuzzy text searching. Slow when adding ‘new’ data, aka creating new indexes. Uses indexes to help you find the data fast. Completes complex search queries quickly. Interacts well directly with other associated technologies – Kibana, Beats, Logstash, etc. – and with other NoSQL and SQL DBs.
  24. In the end, it is about the ability to store data, aggregate things, pass it along. Then ANALYZE and USE that data analysis for whatever purpose you desire So let’s look at these 2 together now
  25. 1st case – when your data has a lot of arrays and you need to perform queries that require many different $and clauses on a field with an array as a value. MANY smaller shards, as they need additional write scopes. 2nd case – fuzzy: if you want to do a search on the word chicken in a menu application:
  26. Examples of how we combine MongoDB and Elasticsearch CURRENTLY at ObjectRocket
  27. POTENTIAL and or Theoretical New Use Cases Possibilities and Potential Combination uses are very broad – New emerging markets and areas – from cryptocurrency peripherals for persistence to
  28. Use MongoDB to Analyze cryptocurrency price swings and intervals - https://medium.com/@serbanmihai/aggregate-mongodb-data-with-node-js-and-mongoose-cryptocurrency-financial-time-series-ae739b4c9485
  29. node.js (v7.8 or greater) Persistence is achieved using MongoDB tribeca - very low latency cryptocurrency market making trade bot with a full featured web client, backtester, and supports direct connectivity to several crypto coin exchanges  - reacts to market data by placing and canceling orders in under a millisecond
  30. Fantasy football, wine sites – if you want to do a search and possibly a match on the words wine & red: db.comments.find({ $and: [{ $text: { $search: "win" } }, { $text: { $search: "red" } }] }) WON’T work. $text is a special MongoDB operator – only use it once per query.
  31. Endless opportunities here to combine with other data stores - grab those result sets, store the primary results in MongoDB, perform additional aggregations to further refine them Post online for massive around the world use by colleagues Use Elasticsearch again to keep frequently searched combinations nearby/fast
  32. Endless opportunities here to combine with other data stores - grab those result sets, store the primary results in MongoDB, perform additional aggregations to further refine them Post online for massive around the world use by colleagues Use elasticsearch again to keep frequently searched combinations nearby/fast
  33. Hiring DBA’s and CDE’s
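
The shard arithmetic in note 19 can be checked with a small sketch. The function names are hypothetical; number_of_shards and number_of_replicas are the Elasticsearch index settings that would carry these values, and the "one shard per node" limit is taken from the note itself.

```javascript
// Total shard copies for an index: each primary plus its replicas.
function totalShards(primaries, replicasPerPrimary) {
  return primaries * (1 + replicasPerPrimary);
}

// With one shard per node, the shard count is also the largest
// node count that still gives every node a shard to serve.
function maxUsefulNodes(primaries, replicasPerPrimary) {
  return totalShards(primaries, replicasPerPrimary);
}

// Index settings using the values from note 19 (3 primaries, 2 replicas each).
const settings = { number_of_shards: 3, number_of_replicas: 2 };

const shards = totalShards(settings.number_of_shards, settings.number_of_replicas);
console.log(shards); // 9 = 3 primaries + 6 replicas
console.log(maxUsefulNodes(settings.number_of_shards, settings.number_of_replicas)); // 9
```

Dropping number_of_replicas back to 1 gives six shards, which matches the smaller layout on slide 24.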