NoSQL: The Challenges
Beyond Multi-Model and
Integrating into Big Data
Applications
Presenters:
Matthew Aslett,
Research Director, 451 Research
• NoSQL Beyond Polyglot
Persistence
Peter Coppola, VP Product &
Marketing, Basho Technologies
• How a Data Platform solves the
challenges of integrating NoSQL
into Big Data applications
NoSQL: Beyond polyglot persistence
Matthew Aslett, research director
451 Research is an information
technology research & advisory company
Founded in 2000
210+ employees, including over 100 analysts
1,000+ clients: Technology & Service providers, corporate
advisory, finance, professional services, and IT decision makers
12,500+ senior IT professionals in our research community
Over 52 million data points each quarter
4,500+ reports published each year covering 2,000+
innovative technology & service providers
Headquartered in New York City with offices in London,
Boston, San Francisco, and Washington D.C.
451 Research and its sister company Uptime Institute
comprise the two divisions of The 451 Group
Research & Data
Advisory Services
Events
Copyright (C) 2015 451 Research LLC
The birth of NoSQL
• The genesis of much – although by no means all – of the momentum behind
the NoSQL database movement can be attributed to two research papers:
• Google’s BigTable: A Distributed Storage System for Structured Data, presented at the
Seventh Symposium on Operating Systems Design and Implementation, in November
2006
• Amazon’s Dynamo: Amazon’s Highly Available Key-Value Store, presented at the 21st
ACM Symposium on Operating Systems Principles, in October 2007
• The term itself was coined by Johan Oskarsson as the name for a June 2009
meeting of developers, users and others interested in a group of loosely
related data technologies
SPRAINED RELATIONAL DATABASES
Photo credit: Foxtongue on Flickr http://www.flickr.com/photos/foxtongue/4844016087/
Database SPRAIN
• The traditional relational database has been stretched beyond its normal
capacity by the needs of high-volume, highly distributed or highly complex
applications.
• Scalability
• Performance
• Relaxed consistency
• Agility
• Intricacy
• Necessity
Increased willingness to look towards emerging alternatives
A diverse array of NoSQL projects serving a range of use-cases
[Figure: 451 Research Data Platforms Map, June 2015 (https://451research.com/dashboard/dpa). The map places database products in a relational zone, a non-relational zone and a grid/cache zone, with branches towards e-discovery, enterprise search and SIEM. Key: general purpose, specialist analytic, BigTables, graph, document, key value stores, -as-a-Service, key value direct access, Hadoop, MySQL ecosystem, advanced clustering/sharding, New SQL databases, data caching, data grid, search, appliances, in-memory, stream processing. © 2015 by 451 Research LLC. All rights reserved.]
The NoSQL database landscape
MarkLogic, ArangoDB, Neo4J, InfiniteGraph, Apache CouchDB, Oracle NoSQL, Redis, Handlersocket, RavenDB, RethinkDB, LevelDB, Apache Accumulo, Apache Cassandra, Apache HBase, Riak, Couchbase, Voldemort, Aerospike, Urika-GD, FlockDB, Allegrograph, HypergraphDB, AffinityDB, OrientDB, Sparksee, HyperDex, DataStax Enterprise, FatDB, Google Cloud Datastore, GrapheneDB, Instacluster, Hypertable, BerkeleyDB, Sqrrl Enterprise, JumboDB, Stardog, MongoDB, Cloudant, Iris Couch, MongoLab, Compose, ObjectRocket, CloudBird, Azure DocumentDB, AWS DynamoDB, AWS SimpleDB, Redis Labs Redis Cloud, RedisGreen, AWS ElastiCache with Redis, MagnetoDB, ObjectRocket with Redis, TokuMX, CortexDB, MapR-DB, Cloudant Local, MongoDirector, Redis-to-go, GraphHost, Redis Labs Enterprise Cluster, Azure Redis Cache, Modulus, Orchestrate, Google Cloud BigTable, Titan, Trinity, Ontotext GraphDB
Polyglot persistence
The idea that different data storage models have their own strengths and
should be used in combination to solve the various data processing
needs of a complex application.
Wide-column
Data is mapped by
a row key, column
key and time
stamp.
Key Value
Store keys and
associated values.
Graph
Store data and the
relationships
between data.
Document
Store all data
related to a
specific key as a
single document.
DATA MODEL COMPLEXITY
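The four data models described on this slide can be pictured with a minimal Python sketch. It uses only in-memory dictionaries; the keys, field names and helper functions (`latest`, `neighbours`) are hypothetical and chosen for illustration, not taken from any product named in the deck.

```python
from collections import defaultdict

# Key value: store keys and associated (opaque) values.
key_value = {"user:42": b"serialized-profile-bytes"}

# Document: all data related to a specific key is kept as a single document.
documents = {
    "user:42": {"name": "Ada", "emails": ["ada@example.com"], "city": "London"},
}

# Wide-column: data is mapped by a row key, a column key and a time stamp.
wide_column = {
    ("user:42", "profile:city", 1): "Paris",
    ("user:42", "profile:city", 2): "London",
}


def latest(table, row_key, column_key):
    """Return the value with the highest time stamp for a row/column pair."""
    versions = [(ts, value) for (row, col, ts), value in table.items()
                if row == row_key and col == column_key]
    return max(versions, key=lambda v: v[0])[1] if versions else None


# Graph: store data (nodes) and the relationships between data (edges).
nodes = {"user:42": {"name": "Ada"}, "user:7": {"name": "Grace"}}
edges = defaultdict(set)
edges["user:42"].add(("FOLLOWS", "user:7"))


def neighbours(node_id, relation):
    """Return the nodes reachable from node_id over one edge of the given type."""
    return sorted(target for rel, target in edges[node_id] if rel == relation)
```

Each structure answers a different question cheaply (lookup by key, fetch a whole record, read a versioned cell, walk a relationship), which is the motivation for combining them in one application.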
Polyglot persistence
• Key Value, Document, Wide-column, Graph
• Search, Analytics, Cache
Multi-model
• Key Value stores, Document stores, Wide-column stores, Graph
• Search, Analytics, Cache
Multi-model databases
Support a combination of the various individual NoSQL data
models
- avoid operational complexity
- maintain developer agility
Multi-model data platform
• Key Value, Document, Wide-column, Graph
• Search, Analytics, Cache
Thank You!
matthew.aslett@451research.com
@maslett
www.451research.com
Delivering on a Data Platform
Peter Coppola
VP, Product & Marketing
THE EVOLUTION OF NOSQL
Unstructured
Point Solutions
Multi-Model Solutions
Data Platforms
Basho Technologies | 20
CONFIDENTIAL
“42% of database decision makers admit they
struggle to manage the NoSQL solutions
deployed in their environments”
COMPLEX TECHNOLOGY STACK
Riak
Spark
OUR CUSTOMERS ARE INTEGRATING
NoSQL, Caching, Real-time Analytics and Search
Big data, hybrid cloud
architectures and IoT
require developers to
integrate, replicate and
synchronize information
across functions
Mac Devine
Vice President and CTO
IBM Cloud Services
Enterprises building Big Data, IoT and Hybrid Cloud
applications are struggling with complexity
• Distributed workload challenges: availability, scale and geo-location
• Proliferation of data models: Key-Value, In-Memory, Document, etc.
• High costs to ensure data accuracy: replication, synchronization and integration
• High operational costs: architectural and management simplicity & efficiency
• Lack of available developer expertise
Big Data, Hybrid Cloud, IoT
Database(s), Storage, Caches, Analytics, Queues, Search, Log Mgmt.
Current Operational Challenges
• Managing separate clusters for
Riak KV, Redis and Spark
• Manually synchronizing data
across the applications
• Using Zookeeper for Spark
cluster management
• Manually sharding data in
Redis
• Manually managing failures of
Redis instances
Customers manually integrating
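To make the "manually sharding data" challenge concrete, here is a minimal sketch of client-side sharding. It uses plain Python dictionaries in place of real cache instances, and the names `shard_for` and `ShardedCache` are hypothetical; it is not the Redis client API or Basho's implementation.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index with a stable hash (same result on every run)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedCache:
    """Toy cache that spreads keys over several in-memory 'instances'."""

    def __init__(self, shard_count: int):
        # Each dict stands in for one separately managed cache instance.
        self.shards = [{} for _ in range(shard_count)]

    def set(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


cache = ShardedCache(3)
for key, value in [("session:1", "alice"), ("session:2", "bob"), ("session:3", "carol")]:
    cache.set(key, value)
```

With modulo placement like this, changing the number of shards moves most keys to a different instance, so the application team has to plan resharding and failure handling itself. That is the kind of operational work the slide lists as manual.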
Big data applications like
ours need to integrate
and then deploy many
different technology
components
Martin Davies
CEO of Technology
BASHO DATA PLATFORM
SERVICE INSTANCES: Spark, Redis (Caching), Solr, Elastic Search, Web Services, 3rd Party Web Services & Integrations
STORAGE INSTANCES: Riak Key/Value, Riak Object Storage; Riak, coming soon: Document Store, Columnar, Graph
CORE SERVICES: Replication & Synchronization, Message Routing, Cluster Management & Monitoring, Logging & Analytics, Internal Data Store
BASHO DATA PLATFORM
Data Replication and Synchronization
Replicate and synchronize data across and between storage instances and service instances to ensure data accuracy with
no data loss and high availability.
Cluster Management
Integrated cluster management automates deployment and configuration of Riak KV, Riak S2, Spark and Redis. Once
deployed in production, auto-detect issues and restart Redis instances or Spark clusters. Cluster management eliminates
the need for Zookeeper.
Internal Data Store
A built-in, distributed data store for ensuring speed, fault-tolerance and ease-of-operations is used to persist static and
dynamic configuration data (port number and IP address) across the Basho Data Platform.
Message Routing
A high-throughput, distributed message system for speed, scalability and high availability. This message system will have
the ability to persist and route messages across platform clusters.
Logging and Analytics
Event logs provide valuable information that can help tune clusters and accurately analyze dataflow
across the cluster.
Core Services
BASHO DATA PLATFORM: SERVICE INSTANCES
Apache Spark Add-On
Zookeeper not required
Real-Time Analytics
• Move data from Riak KV
to Spark for batch and
real-time analytics and
store results back in Riak
KV for future processing
• Cluster management
eliminates the need for
Zookeeper
Redis Add-On
Availability w/ auto-sharding
Integrated Caching
• Redis is now Enterprise
grade with high
availability, data
synchronization with
Riak KV and cluster
management
• Automatic data
sharding across
multiple cache servers
simplifies operations
Apache Solr Add-On
Query like Solr
Enriched Search
• Powerful full-text search
of Solr with the
availability and
scalability of Riak KV
• As data changes,
search indexes are
automatically
synchronized
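The synchronization idea behind the caching and search add-ons above can be sketched in a few lines of Python: every write to the primary store also refreshes the cache entry and the search index, so reads and queries do not see stale data. Everything here is an assumption for illustration. `SynchronizedStore` is a hypothetical class built on in-memory dictionaries and does not use the Riak, Redis or Solr APIs.

```python
from collections import defaultdict


class SynchronizedStore:
    """Toy primary store that keeps a cache and a full-text index in step."""

    def __init__(self):
        self.primary = {}              # stands in for the key/value store
        self.cache = {}                # stands in for the caching service
        self.index = defaultdict(set)  # term -> keys; stands in for the search index

    @staticmethod
    def _terms(text):
        return set(text.lower().split())

    def put(self, key, text):
        # Remove index entries that belonged to the previous version, if any.
        old = self.primary.get(key)
        if old is not None:
            for term in self._terms(old):
                self.index[term].discard(key)
        # Write through: primary store, cache and index change together.
        self.primary[key] = text
        self.cache[key] = text
        for term in self._terms(text):
            self.index[term].add(key)

    def get(self, key):
        # Serve from the cache when possible, otherwise fall back and repopulate.
        if key in self.cache:
            return self.cache[key]
        value = self.primary.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


store = SynchronizedStore()
store.put("k1", "Riak scales out")
store.put("k2", "Redis caches Riak")
store.put("k1", "Solr indexes text")  # update: old terms for k1 leave the index
```

A real platform has to do the same thing across separate clusters and in the presence of failures, which is why the deck presents it as a platform service rather than application code.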
BASHO DIFFERENCE
• Ease of Scale
• Optimized for High Availability
• Data Correctness
• Solving data distribution
challenge
• Operational Simplicity
We are excited that
Basho is stepping
forward and simplifying
our daunting technology
stack
Jason Ordway
CTO
RIAK DEPLOYED
WORLDWIDE
QUESTIONS?

tecFinal 451 webinar deck

  • 1. NoSQL: The Challenges Beyond Multi-Model and Integrating into Big Data Applications
  • 2. Presenters: Matthew Aslett, Research Director, 451 Research • NoSQL Beyond Polyglot Persistence Peter Coppola, VP Product & Marketing, Basho Technologies • How a Data Platform solves the challenges of integrating NoSQL into Big Data applications
  • 3. NoSQL: Beyond polyglot persistence Matthew Aslett, research director
  • 4. 451 Research is an information technology research & advisory company Founded in 2000 210+ employees, including over 100 analysts 1,000+ clients: Technology & Service providers, corporate advisory, finance, professional services, and IT decision makers 12,500+ senior IT professionals in our research community Over 52 million data points each quarter 4,500+ reports published each year covering 2,000+ innovative technology & service providers Headquartered in New York City with offices in London, Boston, San Francisco, and Washington D.C. 451 Research and its sister company Uptime Institute comprise the two divisions of The 451 Group Research & Data Advisory Services Events 4 Copyright (C) 2015 451 Research LLC
  • 5. The birth of NoSQL • The genesis of much – although by no means all – of the momentum behind the NoSQL database movement can be attributed to two research papers: • Google’s BigTable: A Distributed Storage System for Structured Data, presented at the Seventh Symposium on Operating System Design and Implementation, in November 2006 • Amazon’s Dynamo: Amazon’s Highly Available Key-Value Store, presented at the 21st ACM Symposium on Operating Systems Principles, in October 2007 • The term itself was coined by Johan Oskarsson as the name for a June 2009 meeting of developers, users and others interested in a group of loosely related data technologies 5
  • 6. SPRAINED RELATIONAL DATABASES Photo credit: Foxtongue on Flickr http://www.flickr.com/photos/foxtongue/4844016087/
  • 7. The traditional relational database has been stretched beyond its normal capacity by the needs of high-volume, highly distributed or highly complex applications. Database SPRAIN: Scalability, Performance, Relaxed consistency, Agility, Intricacy, Necessity. Increased willingness to look towards emerging alternatives.
  • 8. The traditional relational database has been stretched beyond its normal capacity by the needs of high-volume, highly distributed or highly complex applications. Database SPRAIN: Scalability, Performance, Relaxed consistency, Agility, Intricacy, Necessity. A diverse array of NoSQL projects serving a range of use-cases.
  • 9. Data Platforms Map, June 2015. © 2015 by 451 Research LLC. All rights reserved. https://451research.com/dashboard/dpa [Figure: map of database and data platform products arranged across a relational zone, a non-relational zone and a grid/cache zone, with branches towards e-discovery, enterprise search and SIEM. Key: General purpose; Specialist analytic; BigTables; Graph; Document; Key value stores; -as-a-Service; Key value direct access; Hadoop; MySQL ecosystem; Advanced clustering/sharding; New SQL databases; Data caching; Data grid; Search; Appliances; In-memory; Stream processing.]
  • 10. The NoSQL database landscape: MarkLogic, ArangoDB, Neo4J, InfiniteGraph, Apache CouchDB, Oracle NoSQL, Redis, Handlersocket, RavenDB, RethinkDB, LevelDB, Apache Accumulo, Apache Cassandra, Apache HBase, Riak, Couchbase, Voldemort, Aerospike, Urika-GD, FlockDB, Allegrograph, HypergraphDB, AffinityDB, OrientDB, Sparksee, HyperDex, DataStax Enterprise, FatDB, Google Cloud Datastore, GrapheneDB, Instacluster, Hypertable, BerkeleyDB, Sqrrl Enterprise, JumboDB, Stardog, MongoDB, Cloudant, Iris Couch, MongoLab, Compose, ObjectRocket, CloudBird, Azure DocumentDB, AWS DynamoDB, AWS SimpleDB, Redis Labs Redis Cloud, RedisGreen, AWS ElastiCache with Redis, MagnetoDB, ObjectRocket with Redis, TokuMX, CortexDB, MapR-DB, Cloudant Local, MongoDirector, Redis-to-go, GraphHost, Redis Labs Enterprise Cluster, Azure Redis Cache, Modulus, Orchestrate, Google Cloud BigTable, Titan, Trinity, Ontotext GraphDB
  • 11. Polyglot persistence: The idea that different data storage models have their own strengths and should be used in combination to solve the various data processing needs of a complex application. Data models, in order of increasing data model complexity: Key Value: Store keys and associated values. Document: Store all data related to a specific key as a single document. Wide-column: Data is mapped by a row key, column key and time stamp. Graph: Store data and the relationships between data.
  • 13. Polyglot persistence: Wide-column, Key Value, Graph, Document; Search, Analytics, Cache
  • 14. Multi-model: Wide-column stores, Key Value, Graph, Document stores; Search, Analytics, Cache. Multi-model databases support a combination of the various individual NoSQL data models: avoid operational complexity; maintain developer agility
  • 17. Multi-model data platform: Wide-column, Key Value, Graph, Document; Search, Analytics, Cache
  • 19. Delivering on a Data Platform Peter Coppola VP, Product & Marketing
  • 20. THE EVOLUTION OF NOSQL. Unstructured Data: Point Solutions, Multi-Model Solutions, Data Platforms. Basho Technologies | CONFIDENTIAL
  • 21. COMPLEX TECHNOLOGY STACK (Riak, Spark): “42% of database decision makers admit they struggle to manage the NoSQL solutions deployed in their environments”
  • 22. OUR CUSTOMERS ARE INTEGRATING NoSQL, Caching, Real-time Analytics and Search
  • 23. “Big data, hybrid cloud architectures and IoT require developers to integrate, replicate and synchronize information across functions.” Mac Devine, Vice President and CTO, IBM Cloud Services
  • 24. Enterprises building Big Data, IoT and Hybrid Cloud applications are struggling with complexity. Distributed workload challenges: availability, scale and geo-location. Proliferation of data models: Key-Value, In-Memory, Document, etc. High costs to ensure data accuracy: replication, synchronization and integration. High operational costs: architectural and management simplicity & efficiency. Lack of available developer expertise. Diagram labels: Big Data, Hybrid Cloud, IoT; Database(s), Storage, Caches, Analytics, Queues, Search, Log Mgmt.
  • 25. Current Operational Challenges (customers manually integrating) • Managing separate clusters for Riak KV, Redis and Spark • Manually synchronizing data across the applications • Using Zookeeper for Spark cluster management • Manually sharding data in Redis • Manually managing failures of Redis instances. “Big data applications like ours need to integrate and then deploy many different technology components.” Martin Davies, CEO of Technology
  • 26. BASHO DATA PLATFORM. SERVICE INSTANCES: Spark, Redis (Caching), Solr, Elastic Search, Web Services, 3rd Party Web Services & Integrations. STORAGE INSTANCES: Riak Key/Value, Riak Object Storage; Coming Soon: Document Store, Columnar, Graph. CORE SERVICES: Replication & Synchronization, Message Routing, Cluster Management & Monitoring, Logging & Analytics, Internal Data Store.
  • 27. BASHO DATA PLATFORM: CORE SERVICES. Data Replication and Synchronization: Replicate and synchronize data across and between storage instances and service instances to ensure data accuracy with no data loss and high availability. Cluster Management: Integrated cluster management automates deployment and configuration of Riak KV, Riak S2, Spark and Redis. Once deployed in production, it auto-detects issues and restarts Redis instances or Spark clusters. Cluster management eliminates the need for Zookeeper. Internal Data Store: A built-in, distributed data store, designed for speed, fault tolerance and ease of operations, is used to persist static and dynamic configuration data (port number and IP address) across the Basho Data Platform. Message Routing: A high-throughput, distributed message system for speed, scalability and high availability. This message system will have the ability to persist and route messages across platform clusters. Logging and Analytics: Event logs provide valuable information that can help tune clusters and accurately analyze dataflow across the cluster.
  • 28. BASHO DATA PLATFORM: SERVICE INSTANCES. Apache Spark Add-On (Zookeeper not required), Real-Time Analytics: • Move data from Riak KV to Spark for batch and real-time analytics and store results back in Riak KV for future processing • Cluster management eliminates the need for Zookeeper. Redis Add-On (Availability w/ auto-sharding), Integrated Caching: • Redis is now Enterprise grade with high availability, data synchronization with Riak KV and cluster management • Automatic data sharding across multiple cache servers simplifies operations. Apache Solr Add-On (Query like Solr), Enriched Search: • Powerful full-text search of Solr with the availability and scalability of Riak KV • As data changes, search indexes are automatically synchronized
  • 29. BASHO DIFFERENCE • Ease of Scale • Optimized for High Availability • Data Correctness • Solving data distribution challenge • Operational Simplicity. “We are excited that Basho is stepping forward and simplifying our daunting technology stack.” Jason Ordway, CTO
  • 30. RIAK DEPLOYED WORLDWIDE
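
The four data models named on slide 11 can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: every key, value and helper name (`latest_cell`, `neighbours`) is invented for this example and does not come from the deck or from any product's API.

```python
# Hypothetical in-memory illustrations of the four data models from slide 11.
# All keys, values and function names here are assumptions for illustration.

# Key value: an opaque value looked up by a single key.
key_value = {"user:42": b"serialized-profile-bytes"}

# Document: all data related to one key stored as a single nested document.
document = {
    "user:42": {"name": "Ada", "emails": ["ada@example.com"], "plan": "pro"},
}

# Wide-column: each cell is addressed by (row key, column key, timestamp).
wide_column = {
    ("user:42", "profile:name", 1433116800): "Ada",
    ("user:42", "profile:name", 1435708800): "Ada L.",
}

# Graph: nodes plus explicit relationships (edges) between them.
graph_nodes = {"user:42": {"name": "Ada"}, "user:7": {"name": "Bob"}}
graph_edges = [("user:42", "FOLLOWS", "user:7")]


def latest_cell(table, row_key, column_key):
    """Return the value with the newest timestamp for one row/column pair."""
    versions = [
        (ts, value)
        for (row, column, ts), value in table.items()
        if row == row_key and column == column_key
    ]
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


def neighbours(edges, node, relation):
    """Return the nodes reachable from `node` over edges labelled `relation`."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]
```

The point of the sketch is the shape of the lookup in each model: a single key, a nested document, a three-part cell address, and an edge traversal. A polyglot application would hold each shape in a separate database, while a multi-model database would serve several of them from one system.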

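Slides 25 to 28 contrast manually synchronizing a cache with the database against having the platform do it. As a rough sketch of what that synchronization involves, the Python below keeps an in-memory cache in step with a backing store using a write-through policy. The class name `WriteThroughCache` and the plain dictionaries standing in for the cache tier and the key/value store are assumptions for illustration; this is not Basho's implementation or any Redis or Riak API.

```python
class WriteThroughCache:
    """Hypothetical cache that stays in step with a backing key-value store."""

    def __init__(self, store: dict):
        self.store = store   # stands in for the database (system of record)
        self.cache = {}      # stands in for the cache tier
        self.hits = 0
        self.misses = 0

    def get(self, key):
        """Serve from the cache when possible, otherwise load from the store."""
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache on a miss
        return value

    def put(self, key, value):
        """Write the system of record first, then refresh the cached copy."""
        self.store[key] = value
        self.cache[key] = value

    def delete(self, key):
        """Remove the key from both tiers so no stale value can be read."""
        self.store.pop(key, None)
        self.cache.pop(key, None)


backing = {"user:42": "Ada"}
tier = WriteThroughCache(backing)
tier.get("user:42")            # first read misses and fills the cache
tier.put("user:42", "Ada L.")  # the write updates both store and cache
```

In a real deployment the two tiers are separate networked systems, so each of these steps can fail independently, which is the part customers describe as hard to manage by hand.
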
Editor's notes

  1. Specialized databases enabled users to store and process data using nonrelational models, including the key value, wide column, document and graph data models.
  2. An argument against using specific databases for specific workload requirements is that it encourages the use of multiple databases to support an individual application, leading to operational complexity and inflexibility driven by interdependence. This complexity is driven not only by the interdependence of multiple databases but also by the components, such as caching, search and analytics, that are needed to power these applications. Multi-model databases enable the flexibility of polyglot persistence without the operational complexity by supporting multiple data models without multiple databases.
  3. Thanks, Matt. For the next 15 minutes or so I will focus on what customers are telling us about the challenges they face in integrating NoSQL into their big data applications, and how a data platform can help address those issues.
  4. In talking with customers we see three levels of NoSQL adoption.
  5. Here is an example of the technology stack for one of our customers. You see databases, in-memory analytics, caching, message queues, web proxies, a distributed configuration service and more.
  6. This is from a TechValidate survey of our install base. A high percentage of our customers use enterprise search (both Solr and Elasticsearch). Many customers use Redis (usually as a cache) and message queues or pub/sub message brokers like RabbitMQ and Kafka. While deployment of Spark is low today, we know there is very high activity around trials, testing and planned projects for Spark as an analytics platform with our key/value store and other data sources.
  7. One of the big challenges with integrating databases with other components of the data tier, such as caching, is the data distribution challenge, or as Mac Devine puts it, having to integrate, replicate and synchronize data across components.
  8. And it isn’t just the challenge of data integration across the components; there are other types of complexity as well: making the integrated data tier highly available, enabling it to achieve high scale, getting the data closer to the end users, and addressing the different NoSQL data models, or as Matt described it, “the multi-model” challenge. Enterprises also face issues in trying to find developer talent that can do the integration well, in ways that keep operational costs in check and make it simple to operate (scale up/down, address data consistency, and so on).
  9. Here are some of the specific challenges our customers have shared with us around integrating with Redis for caching and Spark for in-memory analytics.
  10. Mac Devine quote from the press release
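
One of the challenges mentioned in note 9 and on slide 25 is manually sharding data in Redis. A minimal sketch of why that is operationally painful, assuming the simplest client-side scheme (hash the key, take it modulo the number of cache servers): the function names `shard_for` and `moved_keys` and the `session:` keys are hypothetical, and no Redis client library is involved.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def moved_keys(keys, old_shards: int, new_shards: int):
    """List the keys whose shard changes when the shard count changes."""
    return [
        key
        for key in keys
        if shard_for(key, old_shards) != shard_for(key, new_shards)
    ]


# Hypothetical workload: 1,000 session keys spread over three cache servers.
keys = [f"session:{i}" for i in range(1000)]
moved = moved_keys(keys, old_shards=3, new_shards=4)
print(f"{len(moved)} of {len(keys)} keys change shard when growing from 3 to 4")
```

With modulo placement a key keeps its shard only when its hash gives the same remainder for both shard counts, so growing from three to four servers relocates about three quarters of the keys in expectation. Every relocated key is a cache miss or a migration task that someone has to handle, which is the kind of work automatic sharding is meant to take off the operator.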