Elevate Your Enterprise Architecture with an In-Memory Computing Strategy
Dylan Tong
Principal Solutions Architect
dylan.tong@mongodb.com
In-Memory Computing
How can we process data as fast as possible
by leveraging in-memory speed at its best?
What are the possibilities if we could?
High-frequency trading (HFT) is a program trading platform that uses
powerful computers to transact a large number of orders at very fast
speeds. It uses complex algorithms to analyze multiple markets and
execute orders based on market conditions.
Typically, the traders with the fastest execution speeds are more
profitable than traders with slower execution speeds.
Source: Investopedia
Speed Matters…
Amazon found that it increased revenue by 1% for every 100ms of
improvement. [Source: Amazon]
A 1-second delay in page load time equals 11% fewer page views,
a 16% decrease in customer satisfaction, and 7% loss in
conversions. [Source: Aberdeen Group]
A study found that 27% of the participants who did mobile shopping
were dissatisfied due to the experience being too slow. [Source:
Forrester Consulting]
How Fast?
Latency       Unit       Normalized to 1 s
RAM access    100s ns    ~6 min
SSD access    100s µs    ~6 days
HDD access    10s ms     ~12 months
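The "Normalized to 1 s" column is easier to appreciate with the arithmetic spelled out. The sketch below assumes the baseline is one CPU cycle of roughly 0.3 ns stretched to one second (the slide does not state its baseline) and picks one representative value inside each latency range. The constant and function names are illustrative, not from the deck.

```javascript
// Assumption: the slide's scale maps one ~0.3 ns CPU cycle to 1 second.
const BASELINE_NS = 0.3;

// Representative points inside the slide's ranges (assumed, not from the source).
const latenciesNs = {
  ram: 120,       // "100s ns"
  ssd: 150000,    // "100s µs"
  hdd: 10000000,  // "10s ms"
};

// Scale a real latency to seconds on the normalized clock.
function normalizedSeconds(latencyNs) {
  return latencyNs / BASELINE_NS;
}

const MINUTE = 60;
const DAY = 24 * 60 * MINUTE;
const MONTH = 30 * DAY;

const normalized = {
  ramMinutes: normalizedSeconds(latenciesNs.ram) / MINUTE,
  ssdDays: normalizedSeconds(latenciesNs.ssd) / DAY,
  hddMonths: normalizedSeconds(latenciesNs.hdd) / MONTH,
};

console.log(normalized);
```

With these assumed inputs the results land in the same ballpark as the slide: minutes for RAM, days for SSD, about a year for HDD.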
Why Now?
Year    Average $/GB*
2015    $4.37
2013    $5.50
2010    $12.37
2005    $189
2000    $1,107
1995    $30,875
1990    $103,880
1985    $859,375
1980    $6,328,125
[Chart: average $/GB of RAM, 2005 to 2015, y-axis $0 to $200]
Last 10 Years… “Generally affordable”
*http://www.statisticbrain.com/average-historic-price-of-ram/
Why Now?
[Chart: average $/GB of RAM, 2010 to 2015, y-axis $0.00 to $14.00]
Last 5 Years… “An Option at Scale”
*http://www.statisticbrain.com/average-historic-price-of-ram/
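To put the two "Why Now?" charts into numbers, the following sketch derives how many times cheaper a gigabyte of RAM became over the last ten and last five years of the table, plus the average yearly change. The prices are the slide's figures; the helper names are illustrative.

```javascript
// Average RAM price in $/GB, copied from the slide's table.
const pricePerGb = {
  1980: 6328125, 1985: 859375, 1990: 103880, 1995: 30875,
  2000: 1107, 2005: 189, 2010: 12.37, 2013: 5.5, 2015: 4.37,
};

// How many times cheaper a gigabyte became between two years.
function foldDecrease(fromYear, toYear) {
  return pricePerGb[fromYear] / pricePerGb[toYear];
}

// Average yearly price change over the span (negative means cheaper).
function annualChange(fromYear, toYear) {
  const years = toYear - fromYear;
  return Math.pow(pricePerGb[toYear] / pricePerGb[fromYear], 1 / years) - 1;
}

console.log("2005 to 2015:", foldDecrease(2005, 2015).toFixed(1) + "x cheaper");
console.log("2010 to 2015:", foldDecrease(2010, 2015).toFixed(1) + "x cheaper");
console.log("Yearly change 2005 to 2015:", (annualChange(2005, 2015) * 100).toFixed(1) + "%");
```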
"This will process these data using algorithms for machine
learning and artificial intelligence before sending the data
back to the car.
The zFAS board will in this way continuously extend its
capabilities to master even complex situations increasingly
better," Audi stated. "The piloted cars from Audi thus learn
more every day and with each new situation they
experience."
Source: T3.com
The possibilities…
Challenges: Scale
Challenges: Cost Viability
= $34,777/yr.  ~$1.74M/yr. for infrastructure to support 100TB
Challenges: Cost Viability
Storage Type    Avg. Cost ($/GB)    Cost at 100TB ($)
RAM             5.00                500K
SSD             0.47-1.00           47K to 100K
HDD             0.03                3K
http://www.statisticbrain.com/average-cost-of-hard-drive-storage/
http://www.myce.com/news/ssd-price-per-gb-drops-below-0-50-how-low-can-they-go-70703/
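The totals in the table follow directly from the per-gigabyte averages. A minimal sketch of that arithmetic, assuming decimal terabytes (1 TB = 1000 GB), which is what reproduces the slide's numbers; the function and variable names are illustrative. The last lines relate the two yearly figures from the earlier cost slide; what the $34,777 unit represents is not recoverable from the text, so it is simply called a unit here.

```javascript
// Average $/GB figures from the slide's table; SSD is given as a range.
const costPerGb = {
  RAM: [5.0, 5.0],
  SSD: [0.47, 1.0],
  HDD: [0.03, 0.03],
};

const GB_PER_TB = 1000; // assumption: decimal terabytes

// Media cost range in dollars for a given capacity.
function costRange(type, terabytes) {
  const [low, high] = costPerGb[type];
  const gb = terabytes * GB_PER_TB;
  return [low * gb, high * gb];
}

for (const type of Object.keys(costPerGb)) {
  console.log(type, costRange(type, 100));
}

// $34,777/yr. per unit and ~$1.74M/yr. in total imply roughly this many units.
const impliedUnits = 1.74e6 / 34777;
console.log("Implied units for 100TB:", Math.round(impliedUnits));
```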
Challenges: Durability
Volatile Memory
• What happens when things fail,
and what data may be lost?
• How does the system synchronize
with your durable storage? Does it
do this well, and is it simple to
implement?
Challenges: Design Still Matters on RAM
Scenario: eCommerce Modernization Initiative
Business Problem: Customer experience is suffering during high traffic events.
Technology Limitations:
• Too expensive to scale system to support spike events.
• Scaling system is hard, and engineering teams can’t react fast enough in the event of unexpected growth.
• Some caching solution implemented, but it mostly only helps with read performance; synchronizing writes has been a development nightmare.
Business Problem: Lack of mobile customers in Europe and Asia has been attributed to latency issues.
Technology Limitations:
• Difficult to extend data architecture globally, so effort is put on hold.
Scenario: eCommerce Modernization Initiative
Business Problem: Below-industry conversion rate performance has been attributed partly to poor personalization.
Technology Limitations:
• Customer info is siloed across the Enterprise, and it’s too complicated to bring this data together so effective models can be built to drive personalization.
• “Big Data” project to bring data together to drive machine learning and cognitive capabilities in the platform failed, as data scientists reported the platform was too slow to develop on and performance was impractical.
Business Problem: Business analysts have siloed views of the eCommerce channel, and information isn’t getting to them fast enough.
Technology Limitations:
• Related to limitations above.
• Integrating data into the data warehouse is slow and hard to maintain.
[Diagram labels: Platform API; Platform Services; eCommerce Datastores (NoSQL, RDBMS): Orders, Product Catalog, Customer Data (Profile, Sessions, Carts, Personalization), Inventory; Dependent External Data Sources and Integrations: CRM, ERP, PIM, Data warehouse, BI Tools, …]
Scenario: eCommerce Modernization Initiative
Silo Data-sources Problem: SLOW AND POOR SCALABILITY
[Diagram labels: Customer Data (Profile, Sessions, Carts, Personalization); Product Catalog; sources: NoSQL, RDBMS, CRM, ERP, PIM, Partner Sources (Supplier databases…etc.), Legacy (Mainframe)]
Operational Single View
[Diagram labels: NoSQL, RDBMS, CRM, ERP, PIM, Partner Sources (Supplier databases…etc.), Legacy (Mainframe); Operational Single View]
Operational Single View
[Diagram labels: Customer Data (Profile, Sessions, Carts, Personalization); Product Catalog; Operational Single View; MongoDB; Enterprise Data Hub]
Reference: Metlife Wall Presentation
{
  product_name: 'Acme Paint',
  color: ['Red', 'Green'],
  size_oz: [8, 32],
  finish: ['satin', 'eggshell']
}
{
  product_name: 'T-shirt',
  size: ['S', 'M', 'L', 'XL'],
  color: ['Heather Gray' … ],
  material: '100% cotton',
  wash: 'cold',
  dry: 'tumble dry low'
}
{
  product_name: 'Mountain Bike',
  brake_style: 'mechanical disc',
  color: 'grey',
  frame_material: 'aluminum',
  no_speeds: 21,
  package_height: '7.5x32.9x55',
  weight_lbs: 44.05,
  suspension_type: 'dual',
  wheel_size_in: 26
}
Documents in the same product catalog collection in MongoDB
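One practical consequence of keeping differently shaped documents in one collection is that a single query can still span them. The sketch below is plain JavaScript that emulates MongoDB's equality match on one field (a scalar matches by equality, an array matches when it contains the value). It is an illustration of the idea, not driver code; `matchesField` and `findByField` are hypothetical helpers standing in for a call such as `db.products.find({ color: 'Red' })`.

```javascript
// Two of the slide's catalog documents, trimmed to the fields used below.
const products = [
  { product_name: "Acme Paint", color: ["Red", "Green"], size_oz: [8, 32] },
  { product_name: "Mountain Bike", color: "grey", no_speeds: 21 },
];

// Emulates an equality match on one field: a document matches when the field
// equals the value, or when the field is an array containing the value.
function matchesField(doc, field, value) {
  const actual = doc[field];
  return Array.isArray(actual) ? actual.includes(value) : actual === value;
}

// Hypothetical stand-in for a find() on a single field.
function findByField(docs, field, value) {
  return docs.filter((doc) => matchesField(doc, field, value));
}

const red = findByField(products, "color", "Red").map((d) => d.product_name);
const grey = findByField(products, "color", "grey").map((d) => d.product_name);
const geared = findByField(products, "no_speeds", 21).map((d) => d.product_name);

console.log(red, grey, geared);
```

The paint stores `color` as an array and the bike stores it as a string, yet the same filter finds both; a field that exists on only some documents, such as `no_speeds`, simply never matches the others.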
Dynamic Schema
Flexible Data Model: facilitates agile development and continuous delivery methodologies
Scalability: scale-out dynamically as demand grows
Still Agile, Scalable and Simple
High Performance:
• More predictable, and lower latency on less in-memory infrastructure.
In-Memory Storage Engine
Infrastructure Optimization:
• Assign a data subset on the In-Memory SE via Zone Sharding.
• Optimize on cost vs. performance without silos.
Rich Query Capability:
• Full MongoDB Query and Indexing Support.
[Diagram: IN-MEMORY SE NODES and WIREDTIGER NODES across WEST and EAST]
SHARD 1    TAG: WEST, IN_MEM
SHARD 2    TAG: WEST, WT
SHARD 3    TAG: EAST, IN_MEM
SHARD 4    TAG: EAST, WT
Update: Local Read/Write with Strong Consistency
Session Data Geographically Localized, and with In-memory Engine Latency
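A rough sketch of how the tagged layout above could be expressed with the MongoDB shell's zone helpers, `sh.addShardToZone` and `sh.updateZoneKeyRange` (older releases named these `sh.addShardTag` and `sh.addTagRange`). Everything specific here is an assumption for illustration: the shard names, the `ecommerce.sessions` namespace, a shard key of `{ region, session_id }`, and the composite zone names such as `WEST_IN_MEM`. The function takes the `sh` helper as an argument so the same code can be dry-run outside a cluster with a recording stand-in.

```javascript
// Shard-to-tag layout copied from the slide (shard names are hypothetical).
const shardTags = {
  shard1: ["WEST", "IN_MEM"],
  shard2: ["WEST", "WT"],
  shard3: ["EAST", "IN_MEM"],
  shard4: ["EAST", "WT"],
};

// Assumed namespace; the collection is assumed to be sharded on
// { region: 1, session_id: 1 } beforehand.
const SESSIONS_NS = "ecommerce.sessions";

function buildZonePlan(minKey, maxKey) {
  const assignments = [];
  const ranges = [];
  for (const [shard, [region, engine]] of Object.entries(shardTags)) {
    const zone = `${region}_${engine}`;
    assignments.push({ shard, zone });
    // Session data is pinned to the in-memory shard of its own region.
    if (engine === "IN_MEM") {
      ranges.push({
        ns: SESSIONS_NS,
        min: { region, session_id: minKey },
        max: { region, session_id: maxKey },
        zone,
      });
    }
  }
  return { assignments, ranges };
}

// `sh` is the shell's sharding helper, or any object with the same two methods.
function applyZonePlan(sh, plan) {
  plan.assignments.forEach(({ shard, zone }) => sh.addShardToZone(shard, zone));
  plan.ranges.forEach(({ ns, min, max, zone }) =>
    sh.updateZoneKeyRange(ns, min, max, zone));
}

// Dry run with a recording stand-in and placeholder key bounds.
const calls = [];
const recorder = {
  addShardToZone: (...args) => calls.push(["addShardToZone", ...args]),
  updateZoneKeyRange: (...args) => calls.push(["updateZoneKeyRange", ...args]),
};
applyZonePlan(recorder, buildZonePlan("<MinKey>", "<MaxKey>"));
console.log(JSON.stringify(calls, null, 2));
```

Inside the shell, the same plan would be applied by passing the real `sh` object and the shell's MinKey and MaxKey values instead of the placeholders. Collections that do not need in-memory latency could be given ranges on the `WT` zones in the same way.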
Durability and Fault-Tolerance:
• Mixed ReplicaSets allow data to be replicated from In-Memory SE to WT SE.
• Full High Availability: automatic fail-over, cross geography.
In-Memory Storage Engine
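As a sketch of what a mixed replica set could look like, the configuration below has two members intended to run with the in-memory storage engine and one WiredTiger member kept hidden with priority 0, so it holds a durable copy without ever becoming primary. The replica set name and host names are assumptions for illustration. The storage engine itself is not part of this document; it is chosen when each `mongod` process is started.

```javascript
// Assumptions (not from the slides): replica set name and host names.
function buildMixedReplicaSetConfig() {
  return {
    _id: "sessionsWest",
    members: [
      // Intended to run mongod with the in-memory storage engine.
      { _id: 0, host: "inmem-a.example.net:27017" },
      { _id: 1, host: "inmem-b.example.net:27017" },
      // Durable WiredTiger copy: hidden from clients and never elected primary.
      { _id: 2, host: "wt-a.example.net:27017", priority: 0, hidden: true },
    ],
  };
}

const config = buildMixedReplicaSetConfig();
console.log(JSON.stringify(config, null, 2));
```

In the MongoDB shell, a document of this shape would be passed to `rs.initiate(config)`.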
[Diagram labels: Platform Databases (NoSQL, RDBMS); Dependent External Data Sources and Integrations: CRM, ERP, PIM, Partner Sources (Supplier databases…etc.), Legacy (Mainframe); Operational Unified View]
Advanced Personalization
1. TRAIN/RE-TRAIN ML MODELS
2. APPLY MODELS TO REAL-TIME STREAM OF INTERACTIONS
3. DRIVE TARGETED CONTENT, RECOMMENDATIONS…ETC.
Why Spark?
Speed. By exploiting in-memory optimizations, Spark
has shown up to 100x higher performance than
MapReduce running on Hadoop.
Simplicity. Easy-to-use APIs for operating on large
datasets. This includes a collection of sophisticated
operators for transforming and manipulating
semi-structured data.
Unified Framework. Packaged with higher-level libraries,
including support for SQL queries, machine learning,
stream and graph processing. These standard libraries
increase developer productivity and can be combined to
create complex workflows.
Operational Single View + Spark Connector
• Native Scala connector, certified by Databricks
• Exposes all Spark APIs & libraries
• Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations
• Locality awareness to reduce data movement
Locality Awareness
[Diagram labels: DRIVER PROGRAM with SPARK CONTEXT; CLUSTER MANAGER; five Tasks]
Operational Single View
+Spark Connector
Blend client data from multiple
internal and external sources to
drive real time campaign
optimization
MongoDB+Spark at China Eastern
180m fare calculations & 1.6
billion searches per day
Oracle database peaked at 200
searches per second.
Radically re-architect their fare
engine to meet the required
100x growth in search traffic.
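The "100x" figure can be sanity-checked from the numbers on this slide. A minimal sketch, assuming the daily search volume is spread evenly over 24 hours (real traffic would peak higher); the variable names are illustrative.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

const searchesPerDay = 1.6e9;     // daily search volume from the slide
const oraclePeakPerSecond = 200;  // previous peak from the slide

// Average rate if the daily volume were spread evenly across the day.
const averagePerSecond = searchesPerDay / SECONDS_PER_DAY;
const growthFactor = averagePerSecond / oraclePeakPerSecond;

console.log(Math.round(averagePerSecond), "searches per second on average");
console.log(growthFactor.toFixed(1) + "x the previous peak");
```

Even as an average, the result is in the same order of magnitude as the 100x growth the slide cites.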
ETL
(Yesterday’s) Data at the Speed of Thought?
BI Connector
db.orders.aggregate([
  {
    $group: {
      _id: null,
      total: { $sum: "$price" }
    }
  }
])
SELECT SUM(price) AS total
FROM orders
Resources for You
Spark Connector
• Download: Spark Packages, GitHub
• Documentation
• Whitepaper: Turning Analytics into Real-Time Action
• Education: M233: Getting Started with Spark and MongoDB
In-Memory Storage Engine
• Download: Enterprise Server
• Documentation
BI Connector
• Download: BI Connector
• Documentation
Dylan Tong
Principal Solutions Architect
dylan.tong@mongodb.com
Q&A


Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

MongoDB and In-Memory Computing

  • 1. Elevate Your Enterprise Architecture with an In-Memory Computing Strategy Dylan Tong Principal Solutions Architect dylan.tong@mongodb.com
  • 2. In-Memory Computing How can we process data as fast as possible by leveraging in-memory speed at its best? What are the possibilities if we could?
  • 3. High-frequency trading (HFT) is a program trading platform that uses powerful computers to transact a large number of orders at very fast speeds. It uses complex algorithms to analyze multiple markets and execute orders based on market conditions. Typically, the traders with the fastest execution speeds are more profitable than traders with slower execution speeds. Source: Investopedia Speed Matters…
  • 4. Speed Matters… Amazon found that it increased revenue by 1% for every 100ms of improvement [source: Amazon] A 1-second delay in page load time equals 11% fewer page views, a 16% decrease in customer satisfaction, and 7% loss in conversions. [Source: Aberdeen Group] A study found that 27% of the participants who did mobile shopping were dissatisfied due to the experience being too slow. [Source: Forrester Consulting]
  • 5. How Fast? Latency by storage medium (unit; normalized to 1 s): RAM access: 100s of ns (~6 min); SSD access: 100s of µs (~6 days); HDD access: 10s of ms (~12 months)
  • 6. Why Now? Last 10 Years… “Generally affordable” [Chart: average $/GB, 2005 to 2015] *Average $/GB: 2015 $4.37; 2013 $5.5; 2010 $12.37; 2005 $189; 2000 $1,107; 1995 $30,875; 1990 $103,880; 1985 $859,375; 1980 $6,328,125 *http://www.statisticbrain.com/average-historic-price-of-ram/
  • 7. Why Now? Last 5 Years… “An Option at Scale” [Chart: average $/GB, 2010 to 2015] *Average $/GB: 2015 $4.37; 2013 $5.5; 2010 $12.37; 2005 $189; 2000 $1,107; 1995 $30,875; 1990 $103,880; 1985 $859,375; 1980 $6,328,125 *http://www.statisticbrain.com/average-historic-price-of-ram/
  • 8. "This will process these data using algorithms for machine learning and artificial intelligence before sending the data back to the car. The zFAS board will in this way continuously extend its capabilities to master even complex situations increasingly better," Audi stated. "The piloted cars from Audi thus learn more every day and with each new situation they experience.” Source: T3.com The possibilities…
  • 10. Challenges: Cost Viability = $34,777/yr.  ~$1.74M/yr. for infrastructure to support 100TB
  • 11. Challenges: Cost Viability Storage Type Avg. Cost ($/GB) Cost at 100TB ($) RAM 5.00 500K SSD 0.47-1.00 47K to 100K HDD 0.03 3K http://www.statisticbrain.com/average-cost-of-hard-drive-storage/ http://www.myce.com/news/ssd-price-per-gb-drops-below-0-50-how-low-can-they-go-70703/
  • 12. Challenges: Durability Volatile Memory • What happens when things fail, and what data may be lost? • How does the system synchronize with your durable storage? Does it do this well, and is it simple to implement?
  • 15. Scenario : ECommerce Modernization Initiative Business Problems Technology Limitation Customer experience is suffering during high traffic events. Too expensive to scale system to support spike events. Scaling system is hard, and engineering teams can’t react fast enough in the event of unexpected growth Some caching solution implemented, but it mostly only helps with read performance; synchronizing writes has been a development nightmare. Lack of mobile customers in Europe and Asia has been attributed to latency issues. Difficult to extend data architecture globally, so effort is put on hold
  • 16. Scenario : ECommerce Modernization Initiative Business Problems Technology Limitation Below-industry conversion rate performance has been attributed partly to poor personalization Customer info is siloed across the Enterprise, and it’s too complicated to bring this data together so effective models can be built to drive personalization “Big Data” project to bring data together to drive machine learning and cognitive capabilities in platform failed as data scientists report platform was too slow to develop on, and performance was impractical. Business analysts have siloed views of the eCommerce channel, and information isn’t getting to them fast enough Related to limitations above Integrating data into data warehouse is slow and hard to maintain
  • 17. Orders Product Catalog Customer Data: Profile, Sessions, Carts, Personalization Inventory NoSQL RDBMS Platform Services eCommerce Datastores Dependent External Data Sources and Integrations CRM ERP PIM Data warehouse BI Tools … Platform API Scenario : ECommerce Modernization Initiative
  • 18. Customer Data: Profile, Sessions, Carts, Personalization NoSQL RDBMS CRM ERP PIM Partner Sources: Supplier databases…etc. Legacy: Mainframe Product Catalog Silo Data-sources Problem SLOW AND POOR SCALABILITY
  • 19. NoSQL RDBMS CRM ERP PIM Partner Sources: Supplier databases…etc. Legacy: Mainframe Operational Single View Operational Single View Customer Data: Profile, Sessions, Carts, Personalization Product Catalog
  • 20. Operational Single View MongoDB Enterprise Data Hub Operational Single View
  • 21. Reference: Metlife Wall Presentation
  • 22. { product_name: ‘Acme Paint’, color: [‘Red’, ‘Green’], size_oz: [8, 32], finish: [‘satin’, ‘eggshell’] } { product_name: ‘T-shirt’, size: [‘S’, ‘M’, ‘L’, ‘XL’], color: [‘Heather Gray’ … ], material: ‘100% cotton’, wash: ‘cold’, dry: ‘tumble dry low’ } { product_name: ‘Mountain Bike’, brake_style: ‘mechanical disc’, color: ‘grey’, frame_material: ‘aluminum’, no_speeds: 21, package_height: ‘7.5x32.9x55’, weight_lbs: 44.05, suspension_type: ‘dual’, wheel_size_in: 26 } Documents in the same product catalog collection in MongoDB Dynamic Schema
  • 23. Flexible Data Model: facilitates agile development and continuous delivery methodologies Scalability: scale-out dynamically as demand grows Still Agile, Scalable and Simple
  • 24. High Performance: • More predictable, and lower latency on less in-memory infrastructure. In-Memory Storage Engine Infrastructure Optimization: • Assign a data subset on the In-Memory SE via Zone Sharding. • Optimize on cost vs. performance without silos. Rich Query Capability: • Full MongoDB Query and Indexing Support. IN-MEMORY SE NODES WIREDTIGER NODES
  • 25. WEST EAST Update SHARD 4 TAG: EAST, WT Local Read/Write with Strong Consistency Session Data Geographically Localized, and with In-memory Engine Latency SHARD 2 TAG: WEST, WT SHARD 3 TAG: EAST, IN_MEM SHARD 1 TAG: WEST, IN_MEM
  • 26. Durability and Fault-Tolerance: • Mixed ReplicaSets allow data to be replicated from In-Memory SE to WT SE. • Full High Availability: automatic fail-over, cross geography. In-Memory Storage Engine
  • 27. NoSQL RDBMS Platform Databases Dependent External Data Sources and Integrations CRM ERP PIM Partner Sources: Supplier databases…etc. Legacy: Mainframe Operational Unified View Advanced Personalization 1. TRAIN/RE-TRAIN ML MODELS 2. APPLY MODELS TO REAL-TIME STREAM OF INTERACTIONS 3. DRIVE TARGETED CONTENT, RECOMMENDATIONS…ETC.
  • 28. Why Spark? Speed. By exploiting in-memory optimizations, Spark has shown up to 100x higher performance than MapReduce running on Hadoop. Simplicity. Easy-to-use APIs for operating on large datasets. This includes a collection of sophisticated operators for transforming and manipulating semi-structured data. Unified Framework. Packaged with higher-level libraries, including support for SQL queries, machine learning, stream and graph processing. These standard libraries increase developer productivity and can be combined to create complex workflows.
  • 29. Operational Single View +Spark Connector • Native Scala connector, certified by Databricks • Exposes all Spark APIs & libraries • Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations • Locality awareness to reduce data movement
  • 31. Operational Single View +Spark Connector Blend client data from multiple internal and external sources to drive real time campaign optimization
  • 32. MongoDB+Spark at China Eastern 180m fare calculations & 1.6 billion searches per day Oracle database peaked at 200 searches per second. Radically re-architect their fare engine to meet the required 100x growth in search traffic.
  • 33. ETL (Yesterday’s) Data at the Speed of Thought?
  • 34. BI Connector BI Connector db.orders.aggregate( [ { $group: { _id: null, total: { $sum: "$price" } } } ] ) SELECT SUM(price) AS total FROM orders
  • 35. Resources for You Spark Connector • Download: Spark Packages GitHub • Documentation • Whitepaper: Turning Analytics into Real-Time Action • Education: M233: Getting Started with Spark and MongoDB In-Memory Storage Engine • Download: Enterprise Server • Documentation BI Connector • Download: BI Connector • Documentation
  • 37. Dylan Tong Principal Solutions Architect dylan.tong@mongodb.com Q&A
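
The "How Fast?" slide rescales storage latencies to a human time scale. The arithmetic can be sketched as below. The slide does not state its baseline, so the 0.3 ns reference (roughly one CPU cycle mapped to 1 s) and the sample latencies picked from each "100s of ns / 100s of µs / 10s of ms" range are assumptions, chosen only because they land close to the slide's figures.

```javascript
// Assumption: ~0.3 ns (about one CPU cycle) is the unit that gets scaled to 1 second.
// The slide does not name its baseline; this value is illustrative.
const BASELINE_NS = 0.3;

// Assumed sample points inside the ranges given on the slide.
const latenciesNs = {
  "RAM access": 100,       // ~100 ns
  "SSD access": 150000,    // ~150 µs
  "HDD access": 10000000,  // ~10 ms
};

// Scale a latency so that the baseline corresponds to exactly 1 second.
function normalizedSeconds(latencyNs, baselineNs = BASELINE_NS) {
  return latencyNs / baselineNs;
}

// Render seconds in the coarse units the slide uses (30-day months).
function humanize(seconds) {
  const DAY = 86400;
  if (seconds < 3600) return `~${(seconds / 60).toFixed(1)} min`;
  if (seconds < DAY * 30) return `~${(seconds / DAY).toFixed(1)} days`;
  return `~${(seconds / (DAY * 30)).toFixed(1)} months`;
}

for (const [medium, ns] of Object.entries(latenciesNs)) {
  console.log(`${medium}: ${humanize(normalizedSeconds(ns))}`);
}
```

Under these assumptions the three results fall in the minutes, days, and months ranges respectively, which is the point of the slide: each step down the storage hierarchy costs roughly three orders of magnitude.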

Editor's notes

  1. Put simply, there are two big questions that I think define and drive in-memory computing: How can we process data as fast as possible by leveraging in-memory speed at its best? Secondly, what are the possibilities if we could?
  2. Why do we care about speed? It matters in a lot of cases… In the financial world, it matters in areas like high-frequency trading, which is estimated to account for 50-70% of trades in the past 5 years. HFT platforms transact a large number of orders at very fast speeds, and often use complex algorithms to analyze multiple markets and market conditions. Typically, the traders with the fastest execution speeds are more profitable than traders with slower execution speeds.
  3. Research by enterprises and analysts correlating performance, online experience, and revenue is well documented. I list a few here from analysts and Amazon, but there are other public studies from Google and Walmart demonstrating the same. A well-known study by Aberdeen Group found that a 1-second delay in page load time equals 11% fewer page views, a 16% decrease in customer satisfaction, and a 7% loss in conversions. Translated to dollars, if your business earns just $100,000 a day, this equates to $2.5M in potentially lost sales annually. Faster is better: slow online experiences translate to lost opportunities, and we as users and consumers can relate.
  4. So, how fast is in-memory? Here are the rough units that best measure data access times across different storage mediums. (Click) If we normalize to 1 s, it is clear that the difference in speed is drastic between RAM and even fast SSD storage.
  5. Some may already be nodding their heads… RAM isn’t new technology, and we’re aware that the price of RAM has dropped drastically over the decades. By 2010, the sharp decline in average cost had made RAM “generally affordable” for mainstream use; however, it is far from cheap, especially when we consider the data volumes that we work with today.
  6. However, prices continue to fall, and an average price of $4.37 in 2015 makes RAM an option even at scale for greenfield projects that need the speed.
  7. IoT is certainly not a space short of innovation and possibilities, and the ability to scale in-memory performance only makes the possibilities more exciting. I came across an article in which Audi discusses its plans for its connected self-driving car: it intends to send data collected from various sensors on the car back to the cloud, where ML will process the data and send results back to the car so that it can learn and better adapt to complex situations. “…machine learning it will mean adverse weather conditions, such as snow, which can affect sensors will be less of a problem as cars will have a thorough understanding of the piece of tarmac it is traversing” Consider the future: the scale of every vehicle on the road, and the amount of data collected that needs to be processed. In-memory computing solutions will be needed to process big data fast, especially in the world of smart cars, where information will drive important decisions in real time.
  8. Despite the significant increase in the amount of RAM you can put on a single server in the past couple of years, there are still limits, and the data volumes that we work with continue to grow due to the types of applications we build and the types of data sources we analyze and mine. For many organizations, the bulk of workloads are in or moving to the cloud, so the ability to scale on cloud infrastructure is critical, as is the ability to scale out to fit large datasets in RAM across servers. If data volume is not the driver, then compute is, to support large-scale services in the cloud.
  9. We previously discussed how cost has lowered dramatically, and while RAM is an option at scale, it can still be cost prohibitive for certain projects. Consider AWS’s X1 instance. It impressively provides nearly 2TB of RAM, but at a hefty price. At a scale of 100TB, $1.74M just for infrastructure isn’t an option for certain projects. The question is, does the problem really require all your data to be in RAM?
  10. While memory is orders of magnitude faster than other storage mediums, the difference in relative cost is also significant. With that said, in-memory solutions shouldn’t be designed around needing your Enterprise data architecture or even your application to run entirely in-memory. The value of the data and the problem you’re solving should dictate the right medium, and an in-memory solution should seamlessly integrate into an Enterprise Data Architecture that supports all storage mediums.
  11. Generally, when we talk about memory we refer to what is readily available: volatile memory. If your server goes down, then the data stored in that server’s RAM is lost unless it has also been put on durable storage like disk. Trading off data loss for speed, in most use cases, isn’t acceptable. A good in-memory solution needs to provide fault tolerance, and it needs to synchronize with durable storage and, just as importantly, do so simply and reliably (which often isn’t the case for some solutions like external distributed caches).
  12. As fast as RAM is, it doesn’t remedy bad design. More importantly, any in-memory computing technology shouldn’t introduce new bottlenecks into the architecture, or limit your data architecture to addressing the biggest performance bottlenecks in your system. For instance: Does your in-memory computing solution require you to move large volumes of data around? If so, is that creating bottlenecks in other ways? How does your solution bring data into RAM? Is there an efficient caching algorithm, and is relevant data selected and filtered efficiently? How is your data being processed in RAM? Is there an efficient algorithm? Is it introducing inefficiencies and new performance bottlenecks by shuffling data unnecessarily across a distributed system?
  13. So now that we understand the challenges and core requirements around introducing in-memory technologies into your Enterprise Data Architecture, let’s understand how MongoDB fits into the big picture and what it can offer in this area.
  14. Let’s hone in on the product catalog and customer session management parts of the system, as the problem is most clear there. The customer session management component is key to driving customer experience features like personalization, and effective personalization needs to be based on a full picture of the customer. Realistically, in an Enterprise, customer touch points and information are siloed across many systems, and rarely is there one place where an operational system can get everything it needs to know about the customer. Likewise, with the Product Catalog, information about products will be siloed. Perhaps some info is stored within the ecommerce platform, but it likely has to be synchronized with external systems like PIMs and supplier systems. Additionally, a modern platform should also be able to keep availability up to date as part of the product search, so problems aren’t caused downstream around order fulfillment. Finally, the business analysts will also need to analyze the same data sources. Consolidating these systems isn’t realistic. Integration is necessary, and ideally it shouldn’t involve heavy redundancy, for instance across operational and BI environments. Federated data access across these systems isn’t an option on many fronts due to performance and scale. Sufficient integration of data into the DW via traditional ETL is a huge effort and likely too slow to make happen.
  15. This component would be well served by MongoDB, and in fact, is one of the most common use cases for MongoDB.
  16. This component would be well served by MongoDB, and in fact, is one of the most common use cases for MongoDB.
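
The cost figures in notes 9 and 10 (and the "Challenges: Cost Viability" slides) can be checked with a few lines of arithmetic. This is a rough sketch, not a pricing model: the per-GB prices and the $34,777/yr instance cost come from the slides, while treating 1 TB as 1,000 GB and rounding the X1 instance to 2 TB of RAM are assumptions made so that the results line up with the slide totals.

```javascript
// Assumption: 1 TB = 1,000 GB, which is what the slide's "$500K at 100TB" implies.
const GB_PER_TB = 1000;
const DATASET_TB = 100;

// Average cost in $/GB as [low, high], taken from the cost-viability slide.
const costPerGb = {
  RAM: [5.0, 5.0],
  SSD: [0.47, 1.0],
  HDD: [0.03, 0.03],
};

// Raw media cost in dollars for a dataset of the given size.
function mediaCost(dollarsPerGb, datasetTb) {
  return Math.round(dollarsPerGb * (datasetTb * GB_PER_TB));
}

// Assumption: each X1 instance is rounded to 2 TB of RAM ("nearly 2TB" in the notes).
const X1_RAM_TB = 2;
const X1_COST_PER_YEAR = 34777; // $/yr per instance, from the slide

// Yearly infrastructure cost to hold the whole dataset in RAM across instances.
function yearlyInMemoryCost(datasetTb) {
  const instances = Math.ceil(datasetTb / X1_RAM_TB);
  return instances * X1_COST_PER_YEAR;
}

for (const [medium, [low, high]] of Object.entries(costPerGb)) {
  console.log(medium, mediaCost(low, DATASET_TB), mediaCost(high, DATASET_TB));
}
console.log("All in RAM on X1, $/yr:", yearlyInMemoryCost(DATASET_TB));
```

The gap between the RAM and HDD rows is the argument the notes make: keep only the data subset that needs in-memory latency in RAM, and leave the rest on cheaper media.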