This is a presentation by Peter Coppola, VP of Product and Marketing at Basho Technologies, and Matthew Aslett, Research Director at 451 Research. Join them as they discuss whether multi-model databases and polyglot persistence have increased operational complexity, the benefits and importance of NoSQL databases, and how the Basho Data Platform helps enterprises leverage Big Data applications.
2. Presenters:
Matthew Aslett, Research Director, 451 Research
• NoSQL Beyond Polyglot Persistence
Peter Coppola, VP Product & Marketing, Basho Technologies
• How a Data Platform solves the challenges of integrating NoSQL into Big Data applications
4. 451 Research is an information technology research & advisory company
• Founded in 2000
• 210+ employees, including over 100 analysts
• 1,000+ clients: technology & service providers, corporate advisory, finance, professional services, and IT decision makers
• 12,500+ senior IT professionals in our research community
• Over 52 million data points each quarter
• 4,500+ reports published each year covering 2,000+ innovative technology & service providers
• Headquartered in New York City with offices in London, Boston, San Francisco, and Washington D.C.
• 451 Research and its sister company Uptime Institute comprise the two divisions of The 451 Group
• Research & Data, Advisory Services, Events
Copyright (C) 2015 451 Research LLC
5. The birth of NoSQL
• The genesis of much, although by no means all, of the momentum behind the NoSQL database movement can be attributed to two research papers:
• Google's BigTable: A Distributed Storage System for Structured Data, presented at the Seventh Symposium on Operating Systems Design and Implementation, in November 2006
• Amazon's Dynamo: Amazon's Highly Available Key-Value Store, presented at the 21st ACM Symposium on Operating Systems Principles, in October 2007
• The term itself was coined by Johan Oskarsson as the name for a June 2009 meeting of developers, users and others interested in a group of loosely related data technologies
7. The traditional relational database has been stretched beyond its normal capacity by the needs of high-volume, highly distributed or highly complex applications.
Database SPRAIN:
• Scalability
• Performance
• Relaxed consistency
• Agility
• Intricacy
• Necessity
Increased willingness to look towards emerging alternatives
8. The traditional relational database has been stretched beyond its normal capacity by the needs of high-volume, highly distributed or highly complex applications.
Database SPRAIN:
• Scalability
• Performance
• Relaxed consistency
• Agility
• Intricacy
• Necessity
A diverse array of NoSQL projects serving a range of use-cases
11. Polyglot persistence: the idea that different data storage models have their own strengths and should be used in combination to solve the various data processing needs of a complex application.
In order of increasing data model complexity:
• Key-value: store keys and associated values.
• Wide-column: data is mapped by a row key, column key and timestamp.
• Document: store all data related to a specific key as a single document.
• Graph: store data and the relationships between data.
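The four data models on this slide can be illustrated with plain Python structures. This is only a sketch: the record, keys and field names below are invented for illustration and do not come from any particular database's API.

```python
import json
import time

# One hypothetical user record, expressed under each of the four
# NoSQL data models described on this slide.

# Key-value: the store sees only an opaque value behind a key.
kv_store = {
    "user:123": json.dumps({"name": "Ada", "city": "London"}),
}

# Document: all data related to a key is kept as one structured
# document that the store itself can inspect and query.
doc_store = {
    "user:123": {"name": "Ada", "city": "London",
                 "orders": [{"id": 1, "total": 9.99}]},
}

# Wide-column: data is addressed by (row key, column key, timestamp).
ts = int(time.time())
wide_column_store = {
    ("user:123", "name", ts): "Ada",
    ("user:123", "city", ts): "London",
}

# Graph: data plus the relationships between data.
graph_store = {
    "nodes": {"user:123": {"name": "Ada"}, "user:456": {"name": "Lin"}},
    "edges": [("user:123", "follows", "user:456")],
}

# The key-value store can only fetch the whole blob...
profile = json.loads(kv_store["user:123"])
# ...while the wide-column store can address a single cell.
city = wide_column_store[("user:123", "city", ts)]
```

The point of polyglot persistence is that each shape makes some operations cheap (a single-cell read in the wide-column case, a relationship traversal in the graph case) at the cost of running and synchronizing several stores.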
19. Delivering on a Data Platform
Peter Coppola
VP, Product & Marketing
20. THE EVOLUTION OF NOSQL
Point Solutions → Multi-Model Solutions → Unstructured Data Platforms
Basho Technologies | 20
CONFIDENTIAL
21. COMPLEX TECHNOLOGY STACK
“42% of database decision makers admit they struggle to manage the NoSQL solutions deployed in their environments”
(Stack diagram: Riak, Spark and other components)
22. OUR CUSTOMERS ARE INTEGRATING NoSQL, Caching, Real-Time Analytics and Search
23. “Big data, hybrid cloud architectures and IoT require developers to integrate, replicate and synchronize information across functions”
Mac Devine, Vice President and CTO, IBM Cloud Services
24. Enterprises building Big Data, IoT and Hybrid Cloud applications are struggling with complexity
• Distributed workload challenges: availability, scale and geo-location
• Proliferation of data models: key-value, in-memory, document, etc.
• High costs to ensure data accuracy: replication, synchronization and integration
• High operational costs: architectural and management simplicity & efficiency
• Lack of available developer expertise
(Diagram: Big Data, Hybrid Cloud and IoT applications built on databases, storage, caches, analytics, queues, search and log management)
25. Current Operational Challenges: customers manually integrating
• Managing separate clusters for Riak KV, Redis and Spark
• Manually synchronizing data across the applications
• Using Zookeeper for Spark cluster management
• Manually sharding data in Redis
• Manually managing failures of Redis instances
“Big data applications like ours need to integrate and then deploy many different technology components”
Martin Davies, CEO of Technology
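The manual data synchronization these customers describe is typically a hand-rolled cache-aside pattern. The sketch below illustrates it under stated assumptions: plain dicts stand in for the database (e.g. Riak KV) and the cache (e.g. Redis), and the function names are invented.

```python
# A sketch of the manual cache/database synchronization described
# above. Plain dicts stand in for the database and the cache; in
# production each would be a separate cluster the operator has to
# manage and keep consistent by hand.

database = {"user:123": {"name": "Ada", "plan": "free"}}
cache = {}

def read(key):
    """Cache-aside read: check the cache, fall back to the database."""
    if key in cache:
        return cache[key]
    value = database.get(key)
    if value is not None:
        cache[key] = value  # populate the cache on a miss
    return value

def write(key, value):
    """Every write must update the database AND invalidate the cache;
    forgetting either step serves stale data. This is the kind of
    hand-rolled synchronization a data platform aims to automate."""
    database[key] = value
    cache.pop(key, None)

read("user:123")                                  # miss: fills the cache
write("user:123", {"name": "Ada", "plan": "pro"}) # update + invalidate
fresh = read("user:123")                          # re-fills with new value
```

Multiply this by sharding, failover and analytics pipelines, and the operational burden on the list above becomes clear.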
26. BASHO DATA PLATFORM
Service instances: Spark, Redis (caching), Solr, Elastic Search, 3rd-party web services & integrations
Storage instances: Riak Key/Value, Riak Object Storage (coming soon: Document Store, Columnar, Graph)
Core services: Replication & Synchronization, Message Routing, Cluster Management & Monitoring, Logging & Analytics, Internal Data Store
27. BASHO DATA PLATFORM: CORE SERVICES
Data Replication and Synchronization
Replicate and synchronize data across and between storage instances and service instances to ensure data accuracy with no data loss and high availability.
Cluster Management
Integrated cluster management automates deployment and configuration of Riak KV, Riak S2, Spark and Redis. Once deployed in production, it auto-detects issues and restarts Redis instances or Spark clusters, eliminating the need for Zookeeper.
Internal Data Store
A built-in, distributed data store, designed for speed, fault tolerance and ease of operations, persists static and dynamic configuration data (such as port numbers and IP addresses) across the Basho Data Platform.
Message Routing
A high-throughput, distributed message system for speed, scalability and high availability, with the ability to persist and route messages across platform clusters.
Logging and Analytics
Event logs provide information that facilitates tuning of clusters and accurate analysis of dataflow across the cluster.
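As a rough illustration of what the replication and synchronization service above has to do, the sketch below reconciles two replicas using last-write-wins on a version counter. This is a deliberate simplification, not Basho's actual protocol: Riak itself uses richer mechanisms (vector clocks, dotted version vectors), and all names here are invented.

```python
# Toy last-write-wins reconciliation between two replicas, as a
# simplified model of a replication/synchronization service.
# Each replica maps key -> (version, value).

def sync(replica_a, replica_b):
    """Merge two replicas in place so both end up holding the
    highest-versioned value for every key (last-write-wins)."""
    for key in set(replica_a) | set(replica_b):
        a = replica_a.get(key, (0, None))
        b = replica_b.get(key, (0, None))
        winner = a if a[0] >= b[0] else b
        replica_a[key] = winner
        replica_b[key] = winner

# A storage instance and a service instance that have drifted apart:
storage_instance = {"user:123": (2, "pro"), "user:456": (1, "free")}
service_instance = {"user:123": (1, "free")}

sync(storage_instance, service_instance)
# Both replicas now agree on every key.
```

Last-write-wins silently discards concurrent updates, which is exactly why production systems need the more careful conflict handling this slide's service provides.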
28. BASHO DATA PLATFORM: SERVICE INSTANCES
Apache Spark Add-On (Real-Time Analytics; Zookeeper not required)
• Move data from Riak KV to Spark for batch and real-time analytics, and store results back in Riak KV for further processing
• Cluster management eliminates the need for Zookeeper
Redis Add-On (Integrated Caching; availability with auto-sharding)
• Redis becomes enterprise-grade, with high availability, data synchronization with Riak KV and cluster management
• Automatic data sharding across multiple cache servers simplifies operations
Apache Solr Add-On (Enriched Search; query like Solr)
• Powerful full-text search of Solr with the availability and scalability of Riak KV
• As data changes, search indexes are automatically synchronized
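The "search indexes are automatically synchronized" behavior can be sketched with a toy inverted index that is re-built for a document on every write, so search never lags the data. The real add-on works through Solr's indexing APIs; everything below, including the index structure, is an invented stand-in.

```python
# Toy model of keeping a full-text index synchronized with the data:
# every write to the store also updates an inverted index in the
# same operation, so queries always reflect the latest data.

documents = {}        # key -> text
inverted_index = {}   # word -> set of keys containing that word

def put(key, text):
    """Write a document and re-index it in the same operation."""
    # Remove the old version's words from the index...
    for word in documents.get(key, "").lower().split():
        inverted_index.get(word, set()).discard(key)
    # ...then store the new text and index its words.
    documents[key] = text
    for word in text.lower().split():
        inverted_index.setdefault(word, set()).add(key)

def search(word):
    """Return the set of document keys containing the word."""
    return inverted_index.get(word.lower(), set())

put("doc:1", "highly available key value store")
put("doc:2", "distributed storage system")
put("doc:1", "graph data store")   # update: the index follows the data
```

Coupling the index update to the write is what makes the synchronization "automatic" from the application's point of view; doing it as a separate batch job is where stale search results come from.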
29. BASHO DIFFERENCE
• Ease of Scale
• Optimized for High Availability
• Data Correctness
• Solving the data distribution challenge
• Operational Simplicity
“We are excited that Basho is stepping forward and simplifying our daunting technology stack”
Jason Ordway, CTO
Specialized databases enabled users to store and process data using non-relational models, including the key-value, wide-column, document and graph data models.
An argument against using specific databases for specific workload requirements is that it encourages the use of multiple databases to support an individual application, leading to operational complexity and inflexibility driven by interdependence.
This complexity is driven not only by the interdependence of multiple databases but also by the components, such as caching, search and analytics, that are truly needed to power these applications.
Multi-model: enable the flexibility of polyglot persistence without the operational complexity, by supporting multiple data models without multiple databases.
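The multi-model idea, one database exposing several data models, can be sketched as a single storage layer with model-specific APIs layered on top. The class and method names below are invented for illustration and do not correspond to any real product's API.

```python
import json

class MultiModelStore:
    """A toy multi-model store: one storage layer with several
    data-model APIs on top, so an application gets polyglot-style
    flexibility without operating several separate databases."""

    def __init__(self):
        self._data = {}  # the single shared storage layer

    # --- key-value API: opaque values ---
    def kv_put(self, key, value):
        self._data[key] = value

    def kv_get(self, key):
        return self._data.get(key)

    # --- document API: structured values over the same storage ---
    def doc_put(self, key, doc):
        self._data[key] = json.dumps(doc)

    def doc_get_field(self, key, field):
        return json.loads(self._data[key]).get(field)

store = MultiModelStore()
store.doc_put("user:123", {"name": "Ada", "city": "London"})
city = store.doc_get_field("user:123", "city")
store.kv_put("session:9", "token-abc")
```

Because both APIs sit on one storage layer, there is one cluster to scale, back up and monitor, which is the operational-complexity argument of the note above.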
Thanks, Matt. For the next 15 minutes or so, focus on what customers are telling us about the challenges they face in integrating NoSQL into their big data applications, and how a data platform can help address those issues.
In talking with customers, we see three levels of NoSQL adoption.
Here is an example of the technology stack for one of our customers. You see databases, in-memory analytics, caching, message queues, web proxies, a distributed configuration service and more.
TechValidate survey of our installed base: you see a high percentage of our customers with enterprise search (both Solr and ES), lots of customers using Redis (usually as a cache), and message queues or pub/sub message brokers like RabbitMQ and Kafka.
While deployment of Spark is low today, we know there is very high activity around trials, testing and planned projects for Spark as an analytics platform with our key/value store and other data sources.
One of the big challenges with integrating databases with other components of the data tier, like caching, is the data distribution challenge, or, as Mac Devine puts it, having to integrate, replicate and synchronize data across components.
And it isn't just the challenge of data integration across the components; there are other types of complexity as well: making the integrated data tier highly available, enabling it to achieve high scale, and getting the data closer to the end users.
There is also addressing the different NoSQL data models, or, as Matt described it, the "multi-model" challenge.
Enterprises also face issues in trying to find developer talent that can do the integration well, in ways that keep operational costs in check and make it simple to operate (scale up/down, address data consistency, and so on).
Here are some of the specific challenges our customers have shared with us around integrating with Redis for caching and Spark for in-memory analytics.