This is a presentation by Peter Coppola, VP of Product and Marketing at Basho Technologies, and Matthew Aslett, Research Director at 451 Research. Join them as they discuss whether multi-model databases and polyglot persistence have increased operational complexity, the benefits and importance of NoSQL databases, and how the Basho Data Platform helps enterprises leverage Big Data applications.
2. Presenters:
Matthew Aslett, Research Director, 451 Research
• NoSQL Beyond Polyglot Persistence
Peter Coppola, VP Product & Marketing, Basho Technologies
• How a Data Platform solves the challenges of integrating NoSQL into Big Data applications
4. 451 Research is an information technology research & advisory company
• Founded in 2000
• 210+ employees, including over 100 analysts
• 1,000+ clients: technology & service providers, corporate advisory, finance, professional services, and IT decision makers
• 12,500+ senior IT professionals in our research community
• Over 52 million data points each quarter
• 4,500+ reports published each year covering 2,000+ innovative technology & service providers
• Headquartered in New York City with offices in London, Boston, San Francisco, and Washington D.C.
• 451 Research and its sister company Uptime Institute comprise the two divisions of The 451 Group
• Research & Data, Advisory Services, Events
Copyright (C) 2015 451 Research LLC
5. The birth of NoSQL
• The genesis of much, although by no means all, of the momentum behind the NoSQL database movement can be attributed to two research papers:
• Google's BigTable: A Distributed Storage System for Structured Data, presented at the Seventh Symposium on Operating Systems Design and Implementation, in November 2006
• Amazon's Dynamo: Amazon's Highly Available Key-Value Store, presented at the 21st ACM Symposium on Operating Systems Principles, in October 2007
• The term itself was coined by Johan Oskarsson as the name for a June 2009 meeting of developers, users and others interested in a group of loosely related data technologies
7. The traditional relational database has been stretched beyond its normal capacity by the needs of high-volume, highly distributed or highly complex applications.
Database SPRAIN:
• Scalability
• Performance
• Relaxed consistency
• Agility
• Intricacy
• Necessity
Increased willingness to look towards emerging alternatives
8. The traditional relational database has been stretched beyond its normal capacity by the needs of high-volume, highly distributed or highly complex applications.
Database SPRAIN:
• Scalability
• Performance
• Relaxed consistency
• Agility
• Intricacy
• Necessity
A diverse array of NoSQL projects serving a range of use-cases
11. Polyglot persistence: the idea that different data storage models have their own strengths and should be used in combination to solve the various data processing needs of a complex application.
In order of increasing data model complexity:
• Key-value: store keys and associated values.
• Wide-column: data is mapped by a row key, column key and timestamp.
• Document: store all data related to a specific key as a single document.
• Graph: store data and the relationships between data.
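The four data models on this slide can be illustrated with plain Python structures. This is only a sketch: the record, keys and field names below are invented for illustration and do not come from any particular database's API.

```python
import json
import time

# One hypothetical user record, expressed under each of the four
# NoSQL data models described on this slide.

# Key-value: the store sees only an opaque value behind a key.
kv_store = {
    "user:123": json.dumps({"name": "Ada", "city": "London"}),
}

# Document: all data related to a key is kept as one structured
# document that the store itself can inspect and query.
doc_store = {
    "user:123": {"name": "Ada", "city": "London",
                 "orders": [{"id": 1, "total": 9.99}]},
}

# Wide-column: data is addressed by (row key, column key, timestamp).
ts = int(time.time())
wide_column_store = {
    ("user:123", "name", ts): "Ada",
    ("user:123", "city", ts): "London",
}

# Graph: data plus the relationships between data.
graph_store = {
    "nodes": {"user:123": {"name": "Ada"}, "user:456": {"name": "Lin"}},
    "edges": [("user:123", "follows", "user:456")],
}

# The key-value store can only fetch the whole blob...
profile = json.loads(kv_store["user:123"])
# ...while the wide-column store can address a single cell.
city = wide_column_store[("user:123", "city", ts)]
```

The point of polyglot persistence is that each shape makes some operations cheap (a single-cell read in the wide-column case, a relationship traversal in the graph case) at the cost of running and synchronizing several stores.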
19. Delivering on a Data Platform
Peter Coppola
VP, Product & Marketing
20. THE EVOLUTION OF NOSQL
Point Solutions → Multi-Model Solutions → Unstructured Data Platforms
Basho Technologies | 20
CONFIDENTIAL
21. COMPLEX TECHNOLOGY STACK
“42% of database decision makers admit they struggle to manage the NoSQL solutions deployed in their environments”
(Stack diagram: Riak, Spark and other components)
22. OUR CUSTOMERS ARE INTEGRATING NoSQL, Caching, Real-Time Analytics and Search
23. “Big data, hybrid cloud architectures and IoT require developers to integrate, replicate and synchronize information across functions”
Mac Devine, Vice President and CTO, IBM Cloud Services
24. Enterprises building Big Data, IoT and Hybrid Cloud applications are struggling with complexity
• Distributed workload challenges: availability, scale and geo-location
• Proliferation of data models: key-value, in-memory, document, etc.
• High costs to ensure data accuracy: replication, synchronization and integration
• High operational costs: architectural and management simplicity & efficiency
• Lack of available developer expertise
(Diagram: Big Data, Hybrid Cloud and IoT applications built on databases, storage, caches, analytics, queues, search and log management)
25. Current Operational Challenges: customers manually integrating
• Managing separate clusters for Riak KV, Redis and Spark
• Manually synchronizing data across the applications
• Using Zookeeper for Spark cluster management
• Manually sharding data in Redis
• Manually managing failures of Redis instances
“Big data applications like ours need to integrate and then deploy many different technology components”
Martin Davies, CEO of Technology
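The manual data synchronization these customers describe is typically a hand-rolled cache-aside pattern. The sketch below illustrates it under stated assumptions: plain dicts stand in for the database (e.g. Riak KV) and the cache (e.g. Redis), and the function names are invented.

```python
# A sketch of the manual cache/database synchronization described
# above. Plain dicts stand in for the database and the cache; in
# production each would be a separate cluster the operator has to
# manage and keep consistent by hand.

database = {"user:123": {"name": "Ada", "plan": "free"}}
cache = {}

def read(key):
    """Cache-aside read: check the cache, fall back to the database."""
    if key in cache:
        return cache[key]
    value = database.get(key)
    if value is not None:
        cache[key] = value  # populate the cache on a miss
    return value

def write(key, value):
    """Every write must update the database AND invalidate the cache;
    forgetting either step serves stale data. This is the kind of
    hand-rolled synchronization a data platform aims to automate."""
    database[key] = value
    cache.pop(key, None)

read("user:123")                                  # miss: fills the cache
write("user:123", {"name": "Ada", "plan": "pro"}) # update + invalidate
fresh = read("user:123")                          # re-fills with new value
```

Multiply this by sharding, failover and analytics pipelines, and the operational burden on the list above becomes clear.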
26. BASHO DATA PLATFORM
Service instances: Spark, Redis (caching), Solr, Elastic Search, 3rd-party web services & integrations
Storage instances: Riak Key/Value, Riak Object Storage (coming soon: Document Store, Columnar, Graph)
Core services: Replication & Synchronization, Message Routing, Cluster Management & Monitoring, Logging & Analytics, Internal Data Store
27. BASHO DATA PLATFORM: CORE SERVICES
Data Replication and Synchronization
Replicate and synchronize data across and between storage instances and service instances to ensure data accuracy with no data loss and high availability.
Cluster Management
Integrated cluster management automates deployment and configuration of Riak KV, Riak S2, Spark and Redis. Once deployed in production, it auto-detects issues and restarts Redis instances or Spark clusters, eliminating the need for Zookeeper.
Internal Data Store
A built-in, distributed data store, designed for speed, fault tolerance and ease of operations, persists static and dynamic configuration data (such as port numbers and IP addresses) across the Basho Data Platform.
Message Routing
A high-throughput, distributed message system for speed, scalability and high availability, with the ability to persist and route messages across platform clusters.
Logging and Analytics
Event logs provide information that facilitates tuning of clusters and accurate analysis of dataflow across the cluster.
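As a rough illustration of what the replication and synchronization service above has to do, the sketch below reconciles two replicas using last-write-wins on a version counter. This is a deliberate simplification, not Basho's actual protocol: Riak itself uses richer mechanisms (vector clocks, dotted version vectors), and all names here are invented.

```python
# Toy last-write-wins reconciliation between two replicas, as a
# simplified model of a replication/synchronization service.
# Each replica maps key -> (version, value).

def sync(replica_a, replica_b):
    """Merge two replicas in place so both end up holding the
    highest-versioned value for every key (last-write-wins)."""
    for key in set(replica_a) | set(replica_b):
        a = replica_a.get(key, (0, None))
        b = replica_b.get(key, (0, None))
        winner = a if a[0] >= b[0] else b
        replica_a[key] = winner
        replica_b[key] = winner

# A storage instance and a service instance that have drifted apart:
storage_instance = {"user:123": (2, "pro"), "user:456": (1, "free")}
service_instance = {"user:123": (1, "free")}

sync(storage_instance, service_instance)
# Both replicas now agree on every key.
```

Last-write-wins silently discards concurrent updates, which is exactly why production systems need the more careful conflict handling this slide's service provides.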
28. BASHO DATA PLATFORM: SERVICE INSTANCES
Apache Spark Add-On (Real-Time Analytics; Zookeeper not required)
• Move data from Riak KV to Spark for batch and real-time analytics, and store results back in Riak KV for further processing
• Cluster management eliminates the need for Zookeeper
Redis Add-On (Integrated Caching; availability with auto-sharding)
• Redis becomes enterprise-grade, with high availability, data synchronization with Riak KV and cluster management
• Automatic data sharding across multiple cache servers simplifies operations
Apache Solr Add-On (Enriched Search; query like Solr)
• Powerful full-text search of Solr with the availability and scalability of Riak KV
• As data changes, search indexes are automatically synchronized
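The "search indexes are automatically synchronized" behavior can be sketched with a toy inverted index that is re-built for a document on every write, so search never lags the data. The real add-on works through Solr's indexing APIs; everything below, including the index structure, is an invented stand-in.

```python
# Toy model of keeping a full-text index synchronized with the data:
# every write to the store also updates an inverted index in the
# same operation, so queries always reflect the latest data.

documents = {}        # key -> text
inverted_index = {}   # word -> set of keys containing that word

def put(key, text):
    """Write a document and re-index it in the same operation."""
    # Remove the old version's words from the index...
    for word in documents.get(key, "").lower().split():
        inverted_index.get(word, set()).discard(key)
    # ...then store the new text and index its words.
    documents[key] = text
    for word in text.lower().split():
        inverted_index.setdefault(word, set()).add(key)

def search(word):
    """Return the set of document keys containing the word."""
    return inverted_index.get(word.lower(), set())

put("doc:1", "highly available key value store")
put("doc:2", "distributed storage system")
put("doc:1", "graph data store")   # update: the index follows the data
```

Coupling the index update to the write is what makes the synchronization "automatic" from the application's point of view; doing it as a separate batch job is where stale search results come from.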
29. BASHO DIFFERENCE
• Ease of Scale
• Optimized for High Availability
• Data Correctness
• Solving the data distribution challenge
• Operational Simplicity
“We are excited that Basho is stepping forward and simplifying our daunting technology stack”
Jason Ordway, CTO
Specialized databases enabled users to store and process data using non-relational models, including the key-value, wide-column, document and graph data models.
An argument against using specific databases for specific workload requirements is that it encourages the use of multiple databases to support an individual application, leading to operational complexity and inflexibility driven by interdependence.
This complexity is driven not only by the interdependence of multiple databases but also by the components, such as caching, search and analytics, that are truly needed to power these applications.
Multi-model: enable the flexibility of polyglot persistence without the operational complexity, by supporting multiple data models without multiple databases.
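The multi-model idea, one database exposing several data models, can be sketched as a single storage layer with model-specific APIs layered on top. The class and method names below are invented for illustration and do not correspond to any real product's API.

```python
import json

class MultiModelStore:
    """A toy multi-model store: one storage layer with several
    data-model APIs on top, so an application gets polyglot-style
    flexibility without operating several separate databases."""

    def __init__(self):
        self._data = {}  # the single shared storage layer

    # --- key-value API: opaque values ---
    def kv_put(self, key, value):
        self._data[key] = value

    def kv_get(self, key):
        return self._data.get(key)

    # --- document API: structured values over the same storage ---
    def doc_put(self, key, doc):
        self._data[key] = json.dumps(doc)

    def doc_get_field(self, key, field):
        return json.loads(self._data[key]).get(field)

store = MultiModelStore()
store.doc_put("user:123", {"name": "Ada", "city": "London"})
city = store.doc_get_field("user:123", "city")
store.kv_put("session:9", "token-abc")
```

Because both APIs sit on one storage layer, there is one cluster to scale, back up and monitor, which is the operational-complexity argument of the note above.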
Thanks, Matt. For the next 15 minutes or so, focus on what customers are telling us about the challenges they face in integrating NoSQL into their big data applications, and how a data platform can help address those issues.
In talking with customers, we see three levels of NoSQL adoption.
Here is an example of the technology stack for one of our customers. You see databases, in-memory analytics, caching, message queues, web proxies, a distributed configuration service and more.
TechValidate survey of our installed base: you see a high percentage of our customers with enterprise search (both Solr and ES), lots of customers using Redis (usually as a cache), and message queues or pub/sub message brokers like RabbitMQ and Kafka.
While deployment of Spark is low today, we know there is very high activity around trials, testing and planned projects for Spark as an analytics platform with our key/value store and other data sources.
One of the big challenges with integrating databases with other components of the data tier, like caching, is the data distribution challenge, or, as Mac Devine puts it, having to integrate, replicate and synchronize data across components.
And it isn't just the challenge of data integration across the components; there are other types of complexity as well: making the integrated data tier highly available, enabling it to achieve high scale, and getting the data closer to the end users.
There is also addressing the different NoSQL data models, or, as Matt described it, the "multi-model" challenge.
Enterprises also face issues in trying to find developer talent that can do the integration well, in ways that keep operational costs in check and make it simple to operate (scale up/down, address data consistency, and so on).
Here are some of the specific challenges our customers have shared with us around integrating with Redis for caching and Spark for in-memory analytics.