Exploring MongoDB & Elasticsearch: Better Together

www.objectrocket.com
Exploring MongoDB and
Elasticsearch
DeveloperWeek Austin 2017
Kimberly Wilkins
Principal Engineer
Databases
@dba_denizen
/wilkinskimberly

Current Areas of Interest
• NoSQL – MongoDB, Elasticsearch, etc.
• Streaming, real-time analytics
• AR/VR/MR – Augmented, Virtual and
Mixed Reality technologies
• Machine Learning – Deep Learning
• Cryptocurrencies, Blockchain
• Teaching, helping, raising up others

MongoDB &
Elasticsearch
Better Together? Yes!

Overview
• Definitions
• Current versions
• Features
• Architectural basics
• Use cases:
Best, Worst, Together
Squirrel

Why Do It?
The blue data highway… bulging at the seams.

So Many Forms… As Many Impacts
New technologies, new industries, new uses…

Data is Coming From Everywhere
Sensors, IoT

Data is Coming From Everywhere
“Big data is like teenage sex:
everyone talks about it,
nobody really knows how to
do it, everyone thinks
everyone else is doing it, so
everyone claims they are
doing it…”
-Dan Ariely, Duke University

Remember
• Hold the data
• Find the data fast
• Stream the data between data stores
• Process the data along the way
• Analyze the data
• Understand where the data comes from

Why?
• Faster, more flexible development
• Lower $ (hardware, software, deployment)
• Performance (faster writes, faster reads)
• Developers (“Schemaless”, cool toys)
• More devs than DBAs, DevOps, SREs…
• Variety of NoSQL technologies

MongoDB
"MongoDB (from humongous) is a free and open-source
cross-platform document-oriented database program.
Classified as a NoSQL database program, MongoDB
uses JSON-like documents with schemas."
– straight from Wikipedia
• #1 NoSQL
• #5 Overall

Features: MongoDB
Document store
collections vs tables; document or objectId’s
Easy for developers – more devs than DBA’s and Ops
flexible data types
Unstructured & structured data
De-normalized
Duplicate data is OK
Index intersections, partial indexes, aggregation pipelines - $lookup
Improvements coming in 3.6 (*Nov): single db call; updating arrays
Scales vertically or horizontally - sharding
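
The aggregation pipeline and $lookup features listed above can be sketched as a pipeline definition. This is a minimal sketch: the collection names (orders, inventory) and field names (quantity, item, sku) are assumptions for illustration, and the pipeline is only built here, not sent to a server.

```python
# Hypothetical collections: "orders" joined against "inventory".
def build_lookup_pipeline(min_quantity):
    """Return an aggregation pipeline: filter orders, then join inventory."""
    return [
        # Stage 1: keep only orders at or above the given quantity.
        {"$match": {"quantity": {"$gte": min_quantity}}},
        # Stage 2: left outer join, matching orders.item to inventory.sku.
        {"$lookup": {
            "from": "inventory",
            "localField": "item",
            "foreignField": "sku",
            "as": "inventory_docs",
        }},
    ]


pipeline = build_lookup_pipeline(2)
```

With a driver such as PyMongo, a list like this would typically be passed to a collection's aggregate() method.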

MongoDB Architectural Basics
• Faster, more flexible development
• Built-in Replication via Replica sets
• HA/DR throughout stack, components
• Scaling via Sharding
• DR via use of Multiple Data Centers
• Delayed and/or hidden secondaries
• https://www.objectrocket.com/files/objectrocket-for-mongodb-white-paper.pdf

Basic MongoDB Architecture
[Diagram: a single replica set with one Primary and two Secondaries, connected by heartbeats]
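
A replica set elects a primary only when a majority of its voting members can see each other, which is why three members is the usual minimum. The arithmetic can be sketched as follows; the function names are illustrative, not part of any MongoDB API.

```python
def majority(voting_members):
    """Votes needed to elect a primary: more than half of the voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """Voting members that can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)


# A 3-member set (1 primary, 2 secondaries) survives the loss of one member.
three_member_tolerance = fault_tolerance(3)
```

Note that a 4-member set tolerates no more failures than a 3-member set, which is one reason odd member counts are common.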

Sharded Cluster
[Diagram: client drivers connect to the MongoS tier (routers: MongoS, MongoS, MongoS), which routes to the MongoD tier of replica sets: Shard 1, Shard 2 and Shard 3, each with one Primary and two Secondaries. Config servers (metadata): Config 1, Config 2, Config 3, labeled "Replica Set 3.2".]

MongoDB Architecture - Advanced
• Multiple Storage Engine Options
• HA/DR throughout stack, components
• Scaling via Sharding
• DR via use of Multiple Data Centers,
delayed/hidden
• Percona Server for MongoDB - has features from
MongoDB Enterprise edition* (security)

Best Use Cases
• User Data - games, chat, social media
• Mobile Analytics, Engagement/Campaigns
• Aggregation Summaries
• Product Catalogs
• Inventory Management
• Shopping Carts
• Content Management Systems - Sitecore
1000 x

Elasticsearch

Elasticsearch
“Elasticsearch is a distributed, JSON-
based search and analytics engine
designed for horizontal scalability,
maximum reliability, and easy
management.”
– straight from Elastic.co website

Elasticsearch Architectural Basics
● Cluster - A collection of Elasticsearch nodes of various roles
↳ Nodes - Elasticsearch processes that perform one or more roles
● Roles are: master, data, ingest, coordinating-only (client)
● Nodes can operate in any combination or all roles
↳ Indexes - A collection of data (like databases/collections)
● Can be combined in queries with wildcards and aliases
● Fields in an index have an unchangeable data type (mapping)
↳ Shards - Slices of the index data
● Unlike many databases, automatically constructed (not key based)
● A replica is just a read-only copy of a shard
↳ Segments - Lucene’s chunk of data
● Automatically built as data is indexed
● Docs are not deleted, just marked as deleted (can be optimized/merged)
↳ Documents - A JSON entry in the index
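
How documents land on shards without a user-chosen shard key can be sketched with the routing rule Elasticsearch documents: a hash of the routing value (by default the document _id) modulo the number of primary shards. The sketch below is an assumption-laden illustration: it uses CRC32 as a stand-in hash (Elasticsearch uses Murmur3 internally), and pick_shard is a hypothetical helper, not an Elasticsearch API.

```python
import zlib


def pick_shard(routing_value, num_primary_shards):
    """Map a routing value (by default the document _id) to a primary shard."""
    # Stand-in hash for illustration only; Elasticsearch uses Murmur3.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % num_primary_shards


# Route 100 hypothetical document ids across 5 primary shards.
shards = [pick_shard(f"doc-{i}", 5) for i in range(100)]
```

Because the result depends on the number of primary shards, that number is set when the index is created.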

Elasticsearch vs. Elastic Stack
• Don’t be confused!
• Elasticsearch vs. Elastic Stack
• The Open Source Elastic Stack is a suite of
tools/apps associated with and working in
conjunction with Elasticsearch to complete a variety
of analytics tasks.

Elastic Stack Ecosystem

Basic Elastic Architecture
• 3 nodes, 1 replica, 1 master: fewer nodes, more resources per node, each shard performs better
• 3 nodes, 2 replicas, 1 master: more nodes, needs more HW resources but increases search performance for the index and improves redundancy
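
The trade-off between one and two replicas can be made concrete by counting shard copies. This is a minimal sketch assuming an index with 3 primary shards (the slide does not state the shard count); number_of_shards and number_of_replicas are the standard index settings, while the helper functions are hypothetical.

```python
def index_settings(primary_shards, replicas):
    """Build an index-creation body with shard and replica counts."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


def total_shard_copies(primary_shards, replicas):
    """Each primary shard is stored once plus once per replica."""
    return primary_shards * (1 + replicas)


one_replica = total_shard_copies(3, 1)   # copies to place across the nodes
two_replicas = total_shard_copies(3, 2)  # more copies: more redundancy, more HW
```

More copies give searches more shards to read from and survive more node failures, at the cost of disk and memory on every node.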

Best Use Cases
• Full and Fuzzy Text Searches (true strength: speed)
• Geo and Range related searches
• Visualizing Data – with other Elastic Stack components, e.g. Kibana
• Logging and Log Analysis (in place of Splunk)
• Scraping and Combining Public Data Sources
• Event and Data Metrics
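
A fuzzy text search of the kind listed above can be sketched as a query body. This is a minimal sketch: the field name title and the misspelled search text are assumptions, and the body is only built here, not sent to a cluster.

```python
import json


def fuzzy_match_query(field, text, fuzziness="AUTO"):
    """Build a match query that tolerates small typos in the search text."""
    return {
        "query": {
            "match": {
                field: {"query": text, "fuzziness": fuzziness}
            }
        }
    }


# "chiken" is a deliberate misspelling of "chicken".
body = fuzzy_match_query("title", "chiken")
payload = json.dumps(body)  # JSON string suitable for a search request
```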

Geo Queries – Social Media – Near Me
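
A "near me" search can be sketched as a geo_distance filter. This is a minimal sketch under stated assumptions: the field name location is hypothetical and would need to be mapped as a geo_point, and the coordinates are approximate values for Austin, Texas.

```python
def near_me_query(field, lat, lon, distance="5km"):
    """Build a filter matching documents within `distance` of a point."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


# Posts within 2 km of downtown Austin (approximate coordinates).
body = near_me_query("location", 30.2672, -97.7431, distance="2km")
```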

Visualization with Kibana

Visualization with Kibana
MongoDB vs. Elastic (Elasticsearch)
Purpose
• MongoDB: General-purpose document store DB, server-side scripts, some aggregation pipelines. OLTP = good, REPORTING = not as good. Simple = good, Complex = good, Very Complex = not as good
• Elasticsearch: Full-text search engine, fuzzy text search, geo near, keyword, real-time analytics, indexer, distributed, Java-based with Lucene under the covers
Versions
• MongoDB: Current version: 3.4.10 *Halloween! Recommended: 3.4.8 or 3.4.9
• Elasticsearch: Current version: 5.6.1, September 18, 2017 *New, kinks from 5.5.3 release from September 11, 2017. Recommended and available: 5.5.1, July 25, 2017
Schema
• MongoDB: Schemaless*. Structured, unstructured, semi-structured
• Elasticsearch: Schemaless*. Structured, unstructured, semi-structured
Document format
• MongoDB: JSON, BSON docs
• Elasticsearch: JSON
Scaling
• MongoDB: Sharding to scale
• Elasticsearch: Sharding/nodes to scale
High availability
• MongoDB: HA via replica sets (1 Primary, 2 Secondaries – or more with quorum)
• Elasticsearch: HA via replica sets (1 MASTER, x REPLICAS)
Indexes
• MongoDB: Limited index intersection v2.6+, very large indexes still ehh
• Elasticsearch: 1 query can use multiple indexes
Strengths
• MongoDB: Great general-purpose NoSQL db, for processing, filtering during query & data retrieval
• Elasticsearch: Processing via index builds, stores in multiple versions. Great at indexing; great at searching big datasets

Now Combine Them
Like tacos
and tequila

Combining – in general
• Database has many indexes or very large indexes
• Data has lots of arrays - queries that require many
different $and clauses on a field with an array as a value
• SPEED up fuzzy and/or full text searches – ‘chicken’
ex. db.articles.find({ $text: { $search: "chi" } })
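
MongoDB's $text operator matches whole (stemmed) words, so a partial term such as "chi" does not find "chicken"; a search engine can index prefixes and answer the same query. The difference can be simulated in plain Python. This is a toy illustration, not either engine's implementation, and the sample articles are invented.

```python
# Invented sample data for illustration.
articles = [
    {"_id": 1, "title": "Grilled chicken tacos"},
    {"_id": 2, "title": "Chili con carne"},
    {"_id": 3, "title": "Tequila cocktails"},
]


def tokens(text):
    """Lowercase and split on whitespace (a very rough analyzer)."""
    return text.lower().split()


def whole_word_search(docs, term):
    """Match only complete words, roughly how a $text search behaves."""
    term = term.lower()
    return [d["_id"] for d in docs if term in tokens(d["title"])]


def prefix_search(docs, prefix):
    """Match any word starting with the prefix, as a search engine can."""
    prefix = prefix.lower()
    return [
        d["_id"] for d in docs
        if any(t.startswith(prefix) for t in tokens(d["title"]))
    ]


whole = whole_word_search(articles, "chi")   # no whole word equals "chi"
partial = prefix_search(articles, "chi")     # "chicken" and "chili" match
```

In Elasticsearch, prefix-style matching of this kind could be expressed with a query such as match_phrase_prefix.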

MongoDB & Elasticsearch +
MongoDB
• Primarily OLTP
• Scalable, distributed
• Vertical or horizontal scaling
• Binary JSON
• Schemaless*
• Rapid prototyping
• Event Logging
• Social Media
• Content management
• User Data and Actions
• NOT in-depth analysis
Elasticsearch
• Primarily Search Engine
• Scalable, distributed
• Horizontal scaling
• JSON
• Schemaless*
• Based on Lucene
• Support for Python, JS, .Net, Scala, Perl, php, Ruby
• 3rd Party Product Integration
Kafka, others
• Primarily for streaming, for moving data between data stores; used with other components and data techs to create near real time and very near real time event analytics
• Append only
• Horizontal scaling
• JSON
• Schemaless*
• Parallel Processing
• 3rd Party Product Integration

MongoDB & Elasticsearch @ObjectRocket
Currently:
• MongoDB metrics
• Elasticsearch metrics
• Centralized Logging
• MongoDB data visualization
• Network monitoring
• Website search
• Business Metrics

Potential New Use 1 – Bitcoin Time Interval Tracking
Bitcoin ticker data Interval Tracking and Analysis….
MongoDB
• Simple and Complex Queries
• Aggregations at any stage
Elasticsearch
• Speed up queries – faster results
• Store frequent queries for re-use via indexes
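
Interval tracking on ticker data can be sketched as a $group stage that rounds each timestamp down to the start of its interval. This is a minimal sketch under stated assumptions: the field names ts (epoch milliseconds) and price, the five-minute interval, and the sample ticks are all invented. The pure-Python function mirrors the pipeline so the logic can be checked without a server.

```python
from collections import defaultdict

INTERVAL_MS = 5 * 60 * 1000  # five-minute buckets (assumed interval)


def interval_pipeline(interval_ms=INTERVAL_MS):
    """Aggregation pipeline: low/high/count per time interval."""
    # Round ts down to the start of its interval: ts - (ts mod interval).
    bucket_start = {"$subtract": ["$ts", {"$mod": ["$ts", interval_ms]}]}
    return [
        {"$group": {
            "_id": bucket_start,
            "low": {"$min": "$price"},
            "high": {"$max": "$price"},
            "ticks": {"$sum": 1},
        }},
        {"$sort": {"_id": 1}},
    ]


def bucket_locally(ticks, interval_ms=INTERVAL_MS):
    """Same grouping in plain Python, for checking the logic."""
    buckets = defaultdict(list)
    for tick in ticks:
        start = tick["ts"] - tick["ts"] % interval_ms
        buckets[start].append(tick["price"])
    return {
        start: {"low": min(p), "high": max(p), "ticks": len(p)}
        for start, p in sorted(buckets.items())
    }


# Invented sample ticks: two in the first interval, one in the second.
sample = [
    {"ts": 0, "price": 4100.0},
    {"ts": 120000, "price": 4150.0},
    {"ts": 300000, "price": 4090.0},
]
summary = bucket_locally(sample)
pipeline = interval_pipeline()
```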

Potential New Use 1 cont’d – Bitcoin Time Interval Tracking

Potential New Use 2 – Cryptocurrency Platform/Trading
• Cryptocurrency Trading Platform - ex. tribeca
• node.js – v7.8 or higher
• MongoDB database – for persistence, aggregations
• Elasticsearch – the ‘need for speed’: rapid-fire
executions required, sub-millisecond trades & cancellations

Potential New Use 3 – Social Media App Searching
• Searching large Social Media Apps for frequently
searched items – popular quarterbacks & receivers
on fantasy football sites, wines in comments
• MongoDB’s $text operator is special - it cannot be
used more than once in a query; no use with $nor,
etc.
ex. db.comments.find({ $and: [ { $text: { $search: "win" } }, { $text: { $search: "red" } } ] }) – WON’T WORK!
It won’t work in MongoDB, but combine it with Elasticsearch and it can.
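
On the Elasticsearch side, requiring several search terms at once can be sketched as a bool query with one match clause per term. This is a minimal sketch: the field name comment is an assumption, and the body is only built here, not sent to a cluster.

```python
def all_terms_query(field, terms):
    """Build a bool query in which every term must match the field."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: term}} for term in terms]
            }
        }
    }


# Both clauses must match, which a double $text in one MongoDB query cannot do.
body = all_terms_query("comment", ["win", "red"])
```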

Potential New Use 4 – Machine Learning, Deep Learning

Potential New Use 4 – Machine Learning, Deep Learning
Architecture and Streaming
Platform – Jay Kreps
• Apps/DB’s->data in
• Aggregations at any stage
• Further Queries
• Faster Queries via ES
• Results back into DB’s
• Algorithms applied
• Endless … Limitless …
Device events, time series,
event logs, AR/VR/MR

Links
• MongoDB to analyze cryptocurrency price swings and intervals:
https://medium.com/@serbanmihai/aggregate-mongodb-data-with-node-js-and-mongoose-cryptocurrency-financial-time-series-ae739b4c9485
• MongoDB with node.js – Cryptocurrency trading platform:
https://github.com/michaelgrosner/tribeca
• Arctic, MongoDB and Python – Cryptocurrency Database:
https://mxbu.github.io/logbook/2017/06/04/use-arctic-to-create-cryptocurrency-database/
• AI, ML, DL - Jay Kreps article, Architecture and Streaming Platform for AI/Deep Learning (database, pipeline, models, events, etc.):
https://www.oreilly.com/ideas/apache-kafka-and-the-four-challenges-of-production-machine-learning-systems

We are Hiring!
Join a dynamic and
innovative team!
objectrocket.com/careers

Consultations Available
sales@objectrocket.com
objectrocket.com/customers/
View Customer Stories
Trial & Migrations
always free
objectrocket.com

Thank You!
