Hadoop and NoSQL joining forces, by Dale Kim of MapR
1. © 2014 MapR Technologies
Hadoop and NoSQL Joining Forces
2. Topics
Big Data, Hadoop, and NoSQL
The In-Hadoop Advantage
NoSQL-on-Hadoop in Action
Other In-Hadoop Examples
Integrating with SQL
3. Big Data is Overwhelming Traditional Systems
Operational systems: production requirements
• Mission-critical reliability
• Transaction guarantees
• Deep security
• Real-time performance
• Backup and recovery
Analytical systems: production requirements
• Interactive SQL
• Rich analytics
• Workload management
• Data governance
• Backup and recovery
[Diagram: enterprise data architecture linking outside sources, operational systems, analytical systems, and enterprise users]
4. Scaling on Traditional Technologies
• Data volume, velocity: scale up to bigger, faster machines
• Data variety: extensive data modeling and ETL
[Chart: data volume/velocity (low to high) vs. data variety (low to high)]
5. Scaling on Newer Technologies
• Data volume, velocity: scale out with commodity hardware
• Data variety: use the right tool for unstructured, multi-structured, semi-structured, and non-relational data
[Chart: data volume/velocity (low to high) vs. data variety (low to high), with NoSQL options marked]
6. Hadoop and NoSQL Relieve the Pressure from Enterprise Systems
[Diagram: Hadoop + NoSQL alongside operational systems, analytical systems, and enterprise users, handling:]
• Data staging
• Archive
• Data transformation
• Data exploration
• Streaming, interactions
Keys for Production Success
1 Reliability and DR
2 Interoperability
3 High performance
4 Supports operations and analytics
7. You Already Know:
• NoSQL is a class of databases that specialize in:
– Scale-out on commodity servers – no application-level sharding
– Flexible data models – no fixed schema required
• Hadoop is a distributed platform designed for:
– Storing/processing huge volumes of data cost-effectively
– Spreading work across many servers (“divide and conquer”)
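The "divide and conquer" idea above can be illustrated with a minimal, single-process sketch of the MapReduce pattern: each input split is mapped to partial counts independently, and a reduce step merges the partial results. This is an illustration under the assumption that splits are plain in-memory strings; it is not Hadoop's API, and the function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_split(split: str) -> Counter:
    """Map step: count the words in one input split, independently of the others."""
    return Counter(split.lower().split())


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(splits: list[str]) -> Counter:
    # On a real cluster each split would be mapped on a different server;
    # here the splits are simply processed one after another.
    partials = [map_split(s) for s in splits]
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    counts = word_count(["hadoop stores data", "nosql serves data", "hadoop and nosql"])
    print(counts["data"], counts["nosql"])  # 2 2
```

Because each map call touches only its own split, adding servers adds map capacity; only the small partial counts need to be combined.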
Before we continue, let’s take a quick look back.
8. What Would (Did) Google Do?
Google’s operational data store (BigTable) has enabled multiple revolutions within the company:
• The explosion in operational applications
• The transition from batch to real-time
Timeline
• 2003: GFS
• 2004: MapReduce
• 2004: Web index is batch (GFS/MapReduce)
• 2006: BigTable
• 2010: Web index is real-time (BigTable)
9. Operations vs. Analytics
Operations (Databases)
• Real-time
• Reads/writes/updates
• Current/recent data
• Updated regularly
• Fast inserts/updates
• Large volumes of data
Analytics (Hadoop)
• Batch
• Reports/Computations
• Historical data
• Generally non-volatile
• Fast retrievals
• Even larger volumes of data
But is the data different?
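To make the closing question concrete, here is a minimal sketch, assuming a simple in-memory table, in which the same records serve both workloads: point reads and updates (operational) and a full scan with aggregation (analytical). The `Table` class and its methods are hypothetical and are not a MapR or HBase API.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical key-value table holding one dict of columns per row key."""
    rows: dict = field(default_factory=dict)

    # Operational access: real-time reads/writes/updates by key.
    def put(self, key: str, **columns) -> None:
        self.rows.setdefault(key, {}).update(columns)

    def get(self, key: str) -> dict:
        return self.rows.get(key, {})

    # Analytical access: scan every row and aggregate one column.
    def scan_sum(self, column: str) -> float:
        return sum(row.get(column, 0) for row in self.rows.values())


if __name__ == "__main__":
    t = Table()
    t.put("cust1", region="east", spend=120.0)
    t.put("cust2", region="west", spend=80.0)
    t.put("cust1", spend=150.0)        # fast update of current data
    print(t.get("cust1")["spend"])     # 150.0 (operational point read)
    print(t.scan_sum("spend"))         # 230.0 (analytical scan)
```

The data itself is identical in both cases; only the access pattern differs, which is the argument for running both workloads against one copy.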
10. Handling Multiple Workloads
Analytics: Hadoop
• Data exploration (SQL)
• Churn analysis (predictive analytics)
Operational: Operational NoSQL DBMS
• Mobile application server
• Web application server
• Customer 360 dashboard
Batch import/export between the two systems
11. In-Hadoop Databases
• Single cluster
• High performance, low latency
• Large-scale analytics
• Enterprise-grade HA/DR
• Unified file and table administration
Real-time and operational workloads plus actionable analytics on one cluster, serving mobile and web application servers:
• Data exploration (SQL)
• Customer 360 dashboard
• Churn analysis (predictive analytics)
• Product/service optimization and personalization
• Real-time ad targeting
12. Separate Clusters Versus Single Cluster
Separate Hadoop and Database
• Delays analyzing live data
• Network traffic
– Heavy bandwidth usage
– Heavy cleanup upon error
• Complexity
– Higher maintenance, risk of error
– More HA/DR administration
– Risk to SLAs
• Unnecessarily duplicated resources
Consolidated Deployment
• Real-time analysis/computation
• Data locality
– Reduced bandwidth utilization
– Efficient divide-and-conquer analysis
• Architectural simplicity
– Lower risk of error
– Lower administrative overhead
• No unnecessary data/hardware duplication (except for HA/DR)
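The data-locality advantage of a consolidated deployment can be sketched as a scheduler that prefers to run each task on a node already holding a replica of its input block, so the block never crosses the network. This is a simplified illustration assuming a known block-to-replica map and a fixed number of task slots per node; real Hadoop schedulers are more involved, and all names here are hypothetical.

```python
def assign_tasks(block_replicas: dict[str, list[str]], nodes: list[str],
                 slots_per_node: int) -> dict[str, str]:
    """Map each input block to the node that will process it."""
    load = {node: 0 for node in nodes}
    assignment = {}
    for block, replicas in block_replicas.items():
        # Preferred: a node that already stores this block and has a free slot.
        local = [n for n in replicas if n in load and load[n] < slots_per_node]
        # Fallback: least-loaded node overall, usually meaning a network read.
        candidates = local if local else nodes
        node = min(candidates, key=lambda n: load[n])
        load[node] += 1
        assignment[block] = node
    return assignment


def remote_reads(assignment: dict[str, str], block_replicas: dict[str, list[str]]) -> int:
    """Count tasks scheduled away from their data (each costs a network transfer)."""
    return sum(1 for block, node in assignment.items() if node not in block_replicas[block])


if __name__ == "__main__":
    replicas = {"b1": ["n1", "n2"], "b2": ["n2", "n3"], "b3": ["n1", "n3"]}
    plan = assign_tasks(replicas, ["n1", "n2", "n3"], slots_per_node=1)
    print(plan, remote_reads(plan, replicas))  # {'b1': 'n1', 'b2': 'n2', 'b3': 'n3'} 0
```

With separate Hadoop and database clusters, every task would count as a remote read, because the database's data never lives on the Hadoop nodes.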
13. Databases on Direct Attached Storage (DAS)
Advantages
• Fast local file access
• Lower cost vs. SAN/NAS
14. Databases on Networked Storage (SAN/NAS)
Advantages
• Snapshot/backup
• Easy capacity expansion
• Disaster recovery
• Improved disk utilization
• Seamless maintenance
• Reliable
15. Databases on Hadoop (“In-Hadoop”)
Advantages
• Benefits of DAS
• Reduced complexity vs. SAN
• Lower operational cost
• Faster local file access
• Easy capacity expansion
• Dynamic storage utilization
16. Lambda Architecture (lambda-architecture.net)
• New data: a real-time data stream feeds both the batch layer and the speed layer
• Batch layer: all data is stored in Hadoop (HDFS); MapReduce batch jobs recompute and precompute the batch views
• Speed layer: Storm processes the stream and incrementally updates real-time views with partial aggregates
• Serving layer: batch views and real-time views are merged into a merged view (HBase)
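The query path of this architecture can be sketched as follows, assuming per-key event counts as the metric: the batch layer periodically recomputes a complete view from all data, the speed layer increments a view for events that arrived since the last batch run, and a query merges the two. Plain Python structures stand in for HDFS/MapReduce, Storm, and HBase; the class and method names are hypothetical.

```python
from collections import Counter


class LambdaCounts:
    """Toy Lambda Architecture computing event counts per key."""

    def __init__(self) -> None:
        self.master = []                # all data: append-only log (HDFS stand-in)
        self.batch_view = Counter()     # precomputed by the batch layer
        self.realtime_view = Counter()  # incremented by the speed layer

    def ingest(self, key: str) -> None:
        # New data goes to both layers.
        self.master.append(key)
        self.realtime_view[key] += 1    # speed layer: incremental update

    def run_batch(self) -> None:
        # Batch layer: recompute the view from scratch over all data, then drop
        # the real-time increments it now covers. Assumes single-threaded use,
        # i.e. nothing is ingested while the batch runs.
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, key: str) -> int:
        # Serving: merge the batch and real-time views at query time.
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    view = LambdaCounts()
    for page in ["home", "home", "cart"]:
        view.ingest(page)
    view.run_batch()              # batch view: home=2, cart=1
    view.ingest("home")           # arrives after the batch run
    print(view.query("home"))     # 3
```

Recomputing from the immutable log keeps the batch view correct even if the speed layer makes mistakes; the speed layer only has to cover the gap since the last batch run.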
17. Enterprise Data Hub Architecture
Load more data sources; enrich data in Hadoop; analyze; offload / enrich / reload
Data sources
• Relational, SaaS, mainframe
• Documents, emails
• Blogs, tweets, link data
• Log files, clickstreams
Data movement: parse, profile, ETL; load; replicate, CDC; high-speed streaming; cleanse, match
MapR Distribution including Hadoop
• MapR Control System (MCS), Hadoop User Experience (HUE)
• Batch processing: MR, YARN, Hive, Pig, etc.
• Interactive querying: Drill, Impala, Presto, etc.
• HBase, other data stores
• MapR Data Platform, MapR-DB tables
Downstream systems
• Data marts, data warehouse
• BI reports and applications
18. Customer data, network security event data
Anomaly detection on large volumes of security event data; analytics on customer data to enable incremental sales
19. Industry data analysis, SaaS-based reporting
Advertising Automation Cloud, Buyers Cloud
Sales performance management data combined with fast, responsive SaaS-delivered reports
20. Telecommunications Company
Customer profile data, customer behavior data
Analytics on customer behavior for better recommendations
21. MapR Overview
• Big data: Hadoop
• Best product: top ranked
• Business impact: production success
22. The Power of the Open Source Community
Apache Hadoop and OSS ecosystem, running on the MapR Data Platform (MapR-FS, MapR-DB)
Execution engines
• Batch: MapReduce v1 & v2, Spark, Cascading, Pig, Tez*
• SQL: Drill*, Shark, Impala, Hive
• Streaming: Storm*, Spark Streaming
• NoSQL & search: HBase, Accumulo*, Solr
• ML, graph: GraphX, MLLib, Mahout
• YARN (resource management)
Data governance and operations
• Provisioning & coordination: Savannah*, Juju, ZooKeeper
• Workflow & data governance: Oozie, Falcon*
• Data integration & access: Hue, HttpFS, Flume, Sqoop
• Security: Knox*, Sentry*
• Management
* Certification/support planned for 2014
23. MapR-DB: Powerful NoSQL Integrated with Hadoop
• High performance: over 1 million ops/sec with 10 nodes, in-memory processing
• Continuous low latency: no I/O storms, no compaction delays
• 24x7 applications: instant recovery, online schema modification, snapshots, mirroring
• Consistency: strong data consistency, row-level ACID transactions
• Simplified database administration: no processes to manage, automated splits, self-tuning
• High scalability: 1 trillion tables, trillions of rows, millions of columns
• Low TCO: files and tables on one platform, more work with fewer nodes
Themes: performance, reliability, easy administration
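"Automated splits" refers to the database partitioning a table by row-key range and splitting a range once it grows too large, so applications never shard data themselves (the "no application-level sharding" point from slide 7). The sketch below illustrates the general idea with a row-count threshold and a median-key split. MapR-DB's actual split policy is not described in this deck, so the threshold, the median rule, and all names are assumptions.

```python
import bisect


class RangePartitionedTable:
    """Toy table divided into key-range regions that split automatically."""

    def __init__(self, max_rows_per_region: int = 4) -> None:
        if max_rows_per_region < 1:
            raise ValueError("max_rows_per_region must be at least 1")
        self.max_rows = max_rows_per_region
        self.start_keys = [""]   # sorted start key of each region
        self.regions = [{}]      # region i holds keys in [start_keys[i], start_keys[i+1])

    def _region_index(self, key: str) -> int:
        # Route a key to the last region whose start key is <= key.
        return bisect.bisect_right(self.start_keys, key) - 1

    def put(self, key: str, value) -> None:
        i = self._region_index(key)
        self.regions[i][key] = value
        if len(self.regions[i]) > self.max_rows:
            self._split(i)

    def _split(self, i: int) -> None:
        # Split the oversized region at its median key; no application involvement.
        keys = sorted(self.regions[i])
        mid_key = keys[len(keys) // 2]
        upper = {k: self.regions[i][k] for k in keys if k >= mid_key}
        for k in upper:
            del self.regions[i][k]
        self.start_keys.insert(i + 1, mid_key)
        self.regions.insert(i + 1, upper)

    def get(self, key: str):
        return self.regions[self._region_index(key)].get(key)


if __name__ == "__main__":
    table = RangePartitionedTable(max_rows_per_region=4)
    for key in ["a", "b", "c", "d", "e"]:
        table.put(key, key.upper())
    print(table.start_keys)   # ['', 'c']
    print(table.get("d"))     # D
```

The client code only calls `put` and `get`; where a row lives is decided by the table, which is what lets regions be redistributed across nodes as data grows.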
24. MapR-DB (in MapR Enterprise Database Edition)
MapR-DB: NoSQL table-style store, Apache HBase API, in-Hadoop database
Storage stack comparison
• Other distros: HBase (JVM) on HDFS (JVM) on ext3/ext4 on disks
• MapR: tables/files on disks
Fast, scalable, reliable. HBase API, in-memory option, Hadoop integration.
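Because MapR-DB exposes the Apache HBase API, its logical data model follows HBase: rows addressed by a row key, columns grouped into column families declared up front, and multiple timestamped versions per cell. The sketch models that structure with nested dictionaries to show how a read returns the newest version. It is not the HBase client API (which is Java), and the class name, method names, and `max_versions` default are assumptions for illustration.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy model of the HBase logical data model:
    row key -> (column family, qualifier) -> {timestamp: value}."""

    def __init__(self, families: set[str], max_versions: int = 3) -> None:
        self.families = families
        self.max_versions = max_versions
        self.rows = defaultdict(dict)

    def put(self, row: str, family: str, qualifier: str, value: bytes, ts: int) -> None:
        if family not in self.families:
            # Column families are fixed per table; qualifiers are free-form.
            raise KeyError(f"unknown column family: {family}")
        cell = self.rows[row].setdefault((family, qualifier), {})
        cell[ts] = value
        # Keep only the newest max_versions versions of this cell.
        for old_ts in sorted(cell)[:-self.max_versions]:
            del cell[old_ts]

    def get(self, row: str, family: str, qualifier: str):
        """Return the newest version of a cell, or None if it does not exist."""
        cell = self.rows.get(row, {}).get((family, qualifier))
        if not cell:
            return None
        return cell[max(cell)]


if __name__ == "__main__":
    t = WideColumnTable({"info"}, max_versions=2)
    for ts, email in enumerate([b"a@example.com", b"b@example.com", b"c@example.com"], start=1):
        t.put("user1", "info", "email", email, ts)
    print(t.get("user1", "info", "email"))  # b'c@example.com'
```

Rows need not share the same qualifiers, which is how one table can hold "millions of columns" without a fixed schema.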
25. Consistent, Low Read Latency
[Chart: read latency over time, MapR-DB vs. others]
26. Other In-Hadoop Database Technologies
• Databases in Hadoop
– Apache HBase
– Apache Accumulo
– Splice Machine
– MarkLogic
• Data Warehouses on Hadoop
– HP Vertica
– Pivotal HAWQ
27. What Other Trends?
• SQL query engines
– Apache Drill
– Impala
– Presto
– Etc.
• In-memory processing
– GridGain
– Apache Spark
– HAMRTech
28. SQL Query Engines for Hadoop and NoSQL Together
Impala
29. Apache Drill
• Pioneering data agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics and operational applications
Vibrant community
• 40+ contributors
• 150+ years of experience building databases and distributed systems
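Drill's "data agility" rests on querying self-describing data such as JSON without declaring a schema first. The sketch imitates that idea in miniature, roughly `SELECT name, city ... WHERE id > 1`: it filters and projects JSON records whose fields vary from record to record, treating missing fields as NULL. It is not Drill's engine or API; the function name and the sample records are hypothetical.

```python
import json
from typing import Callable, Optional


def query(json_lines: list[str], select: list[str],
          where: Optional[Callable[[dict], bool]] = None) -> list[dict]:
    """Project `select` fields from JSON records, discovering structure at read time.

    Missing fields become None (like SQL NULL) instead of raising an error,
    so records with different shapes can be queried together.
    """
    results = []
    for line in json_lines:
        record = json.loads(line)  # structure is discovered per record
        if where is None or where(record):
            results.append({f: record.get(f) for f in select})
    return results


if __name__ == "__main__":
    records = [
        '{"id": 1, "name": "Ana", "city": "Austin"}',
        '{"id": 2, "name": "Bo"}',
        '{"id": 3, "name": "Cy", "city": "Boston", "vip": true}',
    ]
    print(query(records, ["name", "city"], where=lambda r: r["id"] > 1))
    # [{'name': 'Bo', 'city': None}, {'name': 'Cy', 'city': 'Boston'}]
```

No table definition or ETL step is needed before the first query; a new field such as `vip` simply becomes queryable when it appears.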
30. Q&A
Engage with us!
@mapr maprtech
dalekim@mapr.com
MapR
maprtech
mapr-technologies