© 2014 MapR Technologies
Hadoop and NoSQL Joining Forces
Topics 
Big Data, Hadoop, and NoSQL 
The In-Hadoop Advantage 
NoSQL-on-Hadoop in Action 
Other In-Hadoop Examples 
Integrating with SQL
Big Data is Overwhelming Traditional Systems 
• Mission-critical reliability 
• Transaction guarantees 
• Deep security 
• Real-time performance 
• Backup and recovery 
• Interactive SQL 
• Rich analytics 
• Workload management 
• Data governance 
• Backup and recovery 
ENTERPRISE 
USERS 
Enterprise 
Data 
Architecture 
OPERATIONAL 
SYSTEMS 
ANALYTICAL 
SYSTEMS 
PRODUCTION 
REQUIREMENTS 
PRODUCTION 
REQUIREMENTS 
OUTSIDE SOURCES
Scaling on Traditional Technologies
Data volume, velocity (low to high): scale up to bigger, faster machines
Data variety (low to high): extensive data modeling and ETL
Scaling on Newer Technologies
NoSQL
Data volume, velocity (low to high): scale out with commodity hardware
Data variety (low to high): use the right tool for unstructured, multi-structured, semi-structured, non-relational data
Hadoop and NoSQL Relieve the Pressure from Enterprise Systems
OPERATIONAL SYSTEMS
ANALYTICAL SYSTEMS
ENTERPRISE USERS
+ NoSQL
• Data staging
• Archive
• Data transformation
• Data exploration
• Streaming, interactions
Keys for Production Success
1 Reliability and DR
2 Interoperability
3 High performance
4 Supports operations and analytics
You Already Know:
• NoSQL is a class of databases that specialize in: 
– Scale-out on commodity servers – no application-level sharding 
– Flexible data models – no fixed schema required 
• Hadoop is a distributed platform designed for: 
– Storing/processing huge volumes of data cost-effectively 
– Spreading work across many servers (“divide and conquer”) 
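The "divide and conquer" idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hash-based `partition` scheme, the function names, and the sample records are hypothetical and are not taken from Hadoop or any particular NoSQL product. Each "node" is just a list in memory.

```python
from collections import Counter
from zlib import crc32


def partition(records, num_nodes):
    """Assign each record to a node by hashing its key (hypothetical scheme)."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in records:
        nodes[crc32(key.encode("utf-8")) % num_nodes].append((key, value))
    return nodes


def local_count(node_records):
    """Work done independently on one node: total value per key."""
    totals = Counter()
    for key, value in node_records:
        totals[key] += value
    return totals


def merge(partials):
    """Combine the per-node partial results into one answer."""
    result = Counter()
    for partial in partials:
        result.update(partial)
    return result


# Illustrative sample data only.
records = [("alice", 3), ("bob", 1), ("alice", 2), ("carol", 5), ("bob", 4)]
nodes = partition(records, num_nodes=3)
totals = merge(local_count(n) for n in nodes)
print(dict(totals))
```

Because every record with the same key hashes to the same node, each node can compute its share without talking to the others, and only the small partial results need to be merged.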
Before we continue, let’s take a quick look back:
What Would (Did) Google Do?
Google’s operational data store (BigTable) has enabled multiple revolutions within the company:
(1) The transition from batch to real-time
    2004: Web index is batch (GFS/MapReduce)
    2010: Web index is real-time (BigTable)
(2) The explosion in operational applications
Timeline: 2003 GFS; 2004 MapReduce; 2006 BigTable
Operations Vs. Analytics 
Operations (Databases) 
• Real-time 
• Reads/writes/updates 
• Current/recent data 
• Updated regularly 
• Fast inserts/updates 
• Large volumes of data 
Analytics (Hadoop) 
• Batch 
• Reports/Computations 
• Historical data 
• Generally non-volatile 
• Fast retrievals 
• Even larger volumes of data 
But is the data different?
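One way to read the question above: the same records can serve both access patterns. The sketch below is a hypothetical illustration, with an in-memory dict standing in for a table and made-up order data; it contrasts a keyed update (operational) with a full scan that computes a report (analytical).

```python
from statistics import mean

# Hypothetical in-memory "table": row key -> record.
orders = {
    "o1": {"customer": "alice", "amount": 30.0},
    "o2": {"customer": "bob", "amount": 12.5},
    "o3": {"customer": "alice", "amount": 7.5},
}


def update_amount(table, order_id, amount):
    """Operational access: touch one current row by key."""
    table[order_id]["amount"] = amount


def average_by_customer(table):
    """Analytical access: scan every row and compute a report."""
    amounts = {}
    for row in table.values():
        amounts.setdefault(row["customer"], []).append(row["amount"])
    return {customer: mean(values) for customer, values in amounts.items()}


update_amount(orders, "o2", 20.0)
report = average_by_customer(orders)
print(report)
```

The report reflects the update immediately because both functions read the same store, with no export step in between.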
Mobile 
application server 
Web 
application server 
Handling Multiple Workloads 
Analytics Operational 
Hadoop 
Data exploration 
(SQL) 
Operational NoSQL 
DBMS 
Batch import/export 
Customer 360 
dashboard 
Churn analysis 
(predictive analytics)
Mobile 
application server 
Product/service 
optimization and 
personalization 
Data exploration 
(SQL) 
Customer 360 
dashboard 
Churn analysis 
(predictive analytics) 
• Single cluster 
• High performance, low latency 
• Large-scale analytics 
• Enterprise-grade HA/DR 
• Unified file and table administration 
Real-time ad 
targeting 
Real-Time and Operational 
Actionable 
Analytics 
Web 
application server 
In-Hadoop Databases
Separate Clusters Versus Single Cluster 
Separate Hadoop and Database 
• Delays analyzing live data 
• Network traffic 
– Heavy bandwidth usage 
– Heavy cleanup upon error 
• Complexity 
– Higher maintenance, risk of error 
– More HA/DR administration 
– Risk to SLAs 
• Unnecessarily duplicated 
resources 
Consolidated Deployment 
• Real-time analysis/computation 
• Data locality 
– Reduced bandwidth utilization 
– Efficient divide-and-conquer analysis 
• Architectural simplicity 
– Lower risk of error 
– Lower administrative overhead 
• No unnecessary data/hardware 
duplication (except for HA/DR)
Databases on Direct Attached Storage (DAS) 
Advantages 
• Fast local file access 
• Lower cost vs. SAN/NAS 
Databases on Networked Storage (SAN/NAS) 
Advantages 
• Snapshot/backup 
• Easy capacity expansion 
• Disaster recovery 
• Improved disk utilization 
• Seamless maintenance 
• Reliable 
Databases on Hadoop (“In-Hadoop”) 
Advantages 
• Benefits of DAS 
• Reduced complexity vs. 
SAN 
• Lower operational cost 
• Faster local file access 
• Easy capacity expansion 
• Dynamic storage utilization 
Hadoop
Lambda Architecture (lambda-architecture.net)
NEW DATA STREAM
BATCH LAYER: ALL DATA (HDFS), PRECOMPUTE VIEWS (MAPREDUCE), BATCH RECOMPUTE (HADOOP)
SPEED LAYER: PROCESS STREAM, INCREMENT VIEWS (STORM), REAL-TIME INCREMENT, REAL-TIME DATA
SERVING LAYER: BATCH VIEWS and REAL-TIME VIEWS (partial aggregates), MERGE into MERGED VIEW (HBASE)
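The layers of the Lambda Architecture can be sketched as three small functions. This is a minimal, hypothetical illustration using page-view counts held in memory. In the diagram the batch layer runs on Hadoop, the speed layer on Storm, and the merged view lives in HBase; none of those systems is used here, and all function names and sample events are assumptions.

```python
from collections import Counter


def batch_view(all_data):
    """Batch layer: recompute the view from the complete data set."""
    return Counter(event["page"] for event in all_data)


def increment_realtime_view(view, event):
    """Speed layer: update the real-time view one event at a time."""
    view[event["page"]] += 1


def merged_view(batch, realtime):
    """Serving layer: combine the partial aggregates at query time."""
    return batch + realtime


# Events already covered by the last batch recompute (illustrative data).
all_data = [{"page": "/home"}, {"page": "/pricing"}, {"page": "/home"}]
batch = batch_view(all_data)

# Events that arrived after the batch run started.
realtime = Counter()
for event in [{"page": "/home"}, {"page": "/docs"}]:
    increment_realtime_view(realtime, event)

view = merged_view(batch, realtime)
print(dict(view))
```

Counts merge cleanly because addition is associative. Aggregates that cannot be combined from partial results, such as exact distinct counts, would need a different merge step.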
Enterprise Data Hub Architecture 
Load more data 
sources 
Enrich data in Hadoop Analyze 
Offload / Enrich / 
Reload 
RELATIONAL, 
SAAS, 
MAINFRAME 
DOCUMENTS, 
EMAILS 
BLOGS, 
TWEETS, 
LINK DATA 
LOG FILES, 
CLICKSTREAMS 
MapR Control System (MCS) 
Hadoop User Experience (HUE) 
Batch Processing 
MR, YARN, Hive, Pig, etc. 
Interactive Querying 
Drill, Impala, Presto, etc. 
HBase other data stores 
MapR Data Platform 
MapR-DB Tables 
MAPR DISTRIBUTION INCLUDING HADOOP 
BI REPORTS AND 
APPLICATIONS 
High 
speed 
streaming 
DATA MARTS DATA WAREHOUSE 
PARSE, PROFILE, ETL 
LOAD 
REPLICATE, CDC 
STREAMING 
CLEANSE, MATCH 
LOAD
Customer data, network 
security event data 
Anomaly detection on 
large volumes of security 
event data, analytics on 
customer data to enable 
incremental sales 
Industry data analysis, 
SaaS-based reporting 
Advertising 
Automation 
Cloud 
Buyers 
Cloud 
Sales performance 
management data 
combined with fast 
responsiveness SaaS-delivered 
reports
Customer profile data, 
customer behavior data 
Analytics on customer 
behavior for better 
recommendations 
Telecommunications Company
MapR Overview 
BIG 
DATA 
BEST 
PRODUCT 
BUSINESS 
IMPACT 
Hadoop 
Top Ranked 
Production 
Success
The Power of the Open Source Community 
Provisioning 
& 
coordination 
Savannah* 
Workflow 
& Data 
Governance 
Data 
Integration 
& Access 
Hue 
HttpFS 
Flume Knox* Falcon* 
Management 
APACHE HADOOP AND OSS ECOSYSTEM 
Streaming 
Storm* 
NoSQL & 
Search 
Solr 
MapR Data Platform 
Security 
SQL 
Drill* 
Shark 
Impala 
YARN 
Batch 
Spark 
Cascading 
Pig 
Spark 
Streaming 
HBase 
Juju 
ML, Graph 
GraphX 
MLlib 
Mahout 
MapReduce 
v1 & v2 
EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS 
Tez* 
Accumulo* 
Hive 
Sqoop Sentry* Oozie ZooKeeper 
MapR-DB MapR-FS 
* Certification/support planned for 2014
MapR-DB: Powerful NoSQL Integrated with Hadoop 
Benefit: Features 
Performance 
• High Performance: Over 1 million ops/sec with 10 nodes, in-memory processing 
• Continuous Low Latency: No I/O storms, no compaction delays 
Reliability 
• 24x7 Applications: Instant recovery, online schema modification, snapshots, mirroring 
• Consistency: Strong data consistency, row-level ACID transactions 
Easy Administration 
• Simplified Database Administration: No processes to manage, automated splits, self-tuning 
• High Scalability: 1 trillion tables, trillions of rows, millions of columns 
• Low TCO: Files and tables on one platform, more work with fewer nodes
MapR-DB (in MapR Enterprise Database Edition) 
MapR-DB 
• NoSQL Table-Style Store 
• Apache HBase API 
• In-Hadoop Database 
Other Distros (top to bottom): HBase, JVM, HDFS, JVM, ext3/ext4, Disks 
MapR (top to bottom): Tables/Files, Disks 
Fast, scalable, reliable. 
HBase API, in-memory option, Hadoop integration.
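For readers new to table-style NoSQL stores, the toy model below shows the general shape of the data model that HBase-style stores expose: rows kept in row-key order, columns named "family:qualifier", and range scans over keys. It is a sketch only. `ToyTable` and its methods are hypothetical, it is not the Apache HBase client API, and it says nothing about how MapR-DB is implemented.

```python
from bisect import bisect_left, insort


class ToyTable:
    """Toy wide-column table; not the HBase client API."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        """Write one cell; the row is created on first use (no fixed schema)."""
        if row_key not in self._rows:
            insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Return a copy of all cells in one row, or an empty dict."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start, stop):
        """Yield rows with start <= row key < stop, in key order."""
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            yield key, dict(self._rows[key])
            i += 1


# Illustrative data only.
table = ToyTable()
table.put("user#002", "profile:name", "Bob")
table.put("user#001", "profile:name", "Alice")
table.put("user#001", "activity:last_login", "2014-06-14")
table.put("visit#001", "info:page", "/home")

users = list(table.scan("user#", "user$"))
print([key for key, _ in users])
```

Keeping rows sorted by key is what makes prefix scans such as "all rows starting with `user#`" cheap, which is why row-key design matters in stores of this kind.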
Consistent, Low Read Latency 
[Chart: MapR-DB read latency vs. others’ read latency]
Other In-Hadoop Database Technologies 
• Databases in Hadoop 
– Apache HBase 
– Apache Accumulo 
– Splice Machine 
– MarkLogic 
• Data Warehouses on Hadoop 
– HP Vertica 
– Pivotal HAWQ
What Other Trends? 
• SQL query engines 
– Apache Drill 
– Impala 
– Presto 
– Etc. 
• In-memory processing 
– GridGain 
– Apache Spark 
– HAMRTech
SQL Query Engines for Hadoop and NoSQL Together 
Impala
• Pioneering Data Agility for Hadoop 
• Apache open source project 
• Scale-out execution engine for low-latency queries 
• Unified SQL-based API for analytics & operational applications 
APACHE DRILL 
Vibrant Community 
40+ contributors 
150+ years of experience building 
databases and distributed systems
Q & A 
Engage with us! 
@mapr maprtech 
dalekim@mapr.com 
MapR 
maprtech 
mapr-technologies

 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Hadoop and NoSQL joining forces by Dale Kim of MapR

  • 1. Hadoop and NoSQL Joining Forces. © 2014 MapR Technologies
  • 2. Topics: Big Data, Hadoop, and NoSQL; The In-Hadoop Advantage; NoSQL-on-Hadoop in Action; Other In-Hadoop Examples; Integrating with SQL
  • 3. Big Data is Overwhelming Traditional Systems. Enterprise Data Architecture (diagram): ENTERPRISE USERS and OUTSIDE SOURCES connect to OPERATIONAL SYSTEMS and ANALYTICAL SYSTEMS, each with its own production requirements.
    – Operational systems: mission-critical reliability; transaction guarantees; deep security; real-time performance; backup and recovery
    – Analytical systems: interactive SQL; rich analytics; workload management; data governance; backup and recovery
  • 4. Scaling on Traditional Technologies. Data volume, velocity (low to high): scale up to bigger, faster machines. Data variety (low to high): extensive data modeling and ETL.
  • 5. Scaling on Newer Technologies. Data volume, velocity (low to high): scale out with commodity hardware. Data variety (low to high): use the right tool (NoSQL) for unstructured, multi-structured, semi-structured, non-relational data.
  • 6. Hadoop and NoSQL Relieve the Pressure from Enterprise Systems. Diagram labels: OPERATIONAL SYSTEMS; ANALYTICAL SYSTEMS; ENTERPRISE USERS; + NoSQL. Offloaded work: • Data staging • Archive • Data transformation • Data exploration • Streaming, interactions. Keys for Production Success: (1) Reliability and DR; (2) Interoperability; (3) High performance; (4) Supports operations and analytics.
  • 7. You Already Know: • NoSQL is a class of databases that specialize in: – Scale-out on commodity servers, with no application-level sharding – Flexible data models, with no fixed schema required • Hadoop is a distributed platform designed for: – Storing/processing huge volumes of data cost-effectively – Spreading work across many servers (“divide and conquer”). Before we continue, let’s take a quick look back:
  • 8. What Would (Did) Google Do? Google’s operational data store (BigTable) has enabled multiple revolutions within the company: (1) the transition from batch to real-time; (2) the explosion in operational applications. Timeline: 2003 GFS; 2004 MapReduce; 2004 Web index is batch (GFS/MapReduce); 2006 BigTable; 2010 Web index is real-time (BigTable).
  • 9. Operations vs. Analytics. But is the data different?
    – Operations (Databases): real-time; reads/writes/updates; current/recent data; updated regularly; fast inserts/updates; large volumes of data
    – Analytics (Hadoop): batch; reports/computations; historical data; generally non-volatile; fast retrievals; even larger volumes of data
  • 10. Handling Multiple Workloads. Diagram labels: Mobile application server; Web application server; Operational (NoSQL DBMS); Analytics (Hadoop); Batch import/export; Data exploration (SQL); Customer 360 dashboard; Churn analysis (predictive analytics).
  • 11. In-Hadoop Databases. Real-Time and Operational: Mobile application server; Web application server; Product/service optimization and personalization; Real-time ad targeting. Actionable Analytics: Data exploration (SQL); Customer 360 dashboard; Churn analysis (predictive analytics). • Single cluster • High performance, low latency • Large-scale analytics • Enterprise-grade HA/DR • Unified file and table administration
  • 12. Separate Clusters Versus Single Cluster.
    – Separate Hadoop and Database: • Delays analyzing live data • Network traffic – Heavy bandwidth usage – Heavy cleanup upon error • Complexity – Higher maintenance, risk of error – More HA/DR administration – Risk to SLAs • Unnecessarily duplicated resources
    – Consolidated Deployment: • Real-time analysis/computation • Data locality – Reduced bandwidth utilization – Efficient divide-and-conquer analysis • Architectural simplicity – Lower risk of error – Lower administrative overhead • No unnecessary data/hardware duplication (except for HA/DR)
  • 13. Databases on Direct Attached Storage (DAS). Advantages: • Fast local file access • Lower cost vs. SAN/NAS
  • 14. Databases on Networked Storage (SAN/NAS). Advantages: • Snapshot/backup • Easy capacity expansion • Disaster recovery • Improved disk utilization • Seamless maintenance • Reliable
  • 15. Databases on Hadoop (“In-Hadoop”). Advantages: • Benefits of DAS • Reduced complexity vs. SAN • Lower operational cost • Faster local file access • Easy capacity expansion • Dynamic storage utilization
  • 16. Lambda Architecture (lambda-architecture.net). A NEW DATA STREAM feeds two paths.
    – BATCH LAYER (Hadoop, batch recompute): ALL DATA (HDFS); PRECOMPUTE VIEWS (MapReduce); BATCH VIEWS
    – SPEED LAYER (Storm, real-time increment): REAL-TIME DATA; PROCESS STREAM; INCREMENT VIEWS; REAL-TIME VIEWS, held as partial aggregates
    – SERVING LAYER: MERGE of batch views and real-time views into a MERGED VIEW (HBase)
  • 17. Enterprise Data Hub Architecture. Steps: Load more data sources; Enrich data in Hadoop; Analyze; Offload / Enrich / Reload.
    – Sources: RELATIONAL, SAAS, MAINFRAME; DOCUMENTS, EMAILS; BLOGS, TWEETS, LINK DATA; LOG FILES, CLICKSTREAMS
    – Data movement: PARSE, PROFILE, ETL; CLEANSE, MATCH; LOAD; REPLICATE, CDC; STREAMING; High speed streaming
    – MAPR DISTRIBUTION INCLUDING HADOOP: MapR Control System (MCS); Hadoop User Experience (HUE); Batch Processing (MR, YARN, Hive, Pig, etc.); Interactive Querying (Drill, Impala, Presto, etc.); HBase and other data stores; MapR Data Platform; MapR-DB Tables
    – Destinations: BI REPORTS AND APPLICATIONS; DATA MARTS; DATA WAREHOUSE
  • 18. Customer data, network security event data: anomaly detection on large volumes of security event data, analytics on customer data to enable incremental sales
  • 19. Industry data analysis, SaaS-based reporting. Advertising Automation Cloud; Buyers Cloud. Sales performance management data combined with fast responsiveness; SaaS-delivered reports
  • 20. Telecommunications Company. Customer profile data, customer behavior data: analytics on customer behavior for better recommendations
  • 21. MapR Overview. Labels: BIG DATA; BEST PRODUCT; BUSINESS IMPACT; Hadoop; Top Ranked; Production Success
  • 22. The Power of the Open Source Community. * Certification/support planned for 2014
    – Layers: APACHE HADOOP AND OSS ECOSYSTEM; EXECUTION ENGINES; DATA GOVERNANCE AND OPERATIONS; Management; MapR Data Platform (MapR-DB, MapR-FS)
    – Categories: Provisioning & coordination; Workflow & Data Governance; Data Integration & Access; Security; Streaming; NoSQL & Search; SQL; Batch; ML, Graph
    – Projects shown: Savannah*, Juju, ZooKeeper, Falcon*, Oozie, Hue, HttpFS, Flume, Sqoop, Knox*, Sentry*, Storm*, Spark Streaming, Solr, HBase, Accumulo*, Drill*, Shark, Impala, Hive, YARN, Spark, Cascading, Pig, MapReduce v1 & v2, Tez*, GraphX, MLLib, Mahout
  • 23. MapR-DB: Powerful NoSQL Integrated with Hadoop. Benefits and features, grouped on the slide under Performance, Reliability, and Easy Administration:
    – High Performance: over 1 million ops/sec with 10 nodes, in-memory processing
    – Continuous Low Latency: no I/O storms, no compaction delays
    – 24x7 Applications: instant recovery, online schema modification, snapshots, mirroring
    – Consistency: strong data consistency, row-level ACID transactions
    – Simplified Database Administration: no processes to manage, automated splits, self-tuning
    – High Scalability: 1 trillion tables, trillions of rows, millions of columns
    – Low TCO: files and tables on one platform, more work with fewer nodes
  • 24. MapR-DB (in MapR Enterprise Database Edition). MapR-DB: NoSQL table-style store; Apache HBase API; in-Hadoop database. Stack comparison: other distros run HBase (JVM) on HDFS (JVM) on ext3/ext4 on disks; MapR runs tables/files directly on disks. Fast, scalable, reliable. HBase API, in-memory option, Hadoop integration.
  • 25. Consistent, Low Read Latency. [Chart: MapR-DB read latency compared with others’ read latency]
  • 26. Other In-Hadoop Database Technologies • Databases in Hadoop – Apache HBase – Apache Accumulo – Splice Machine – MarkLogic • Data Warehouses on Hadoop – HP Vertica – Pivotal HAWQ
  • 27. What Other Trends? • SQL query engines – Apache Drill – Impala – Presto – Etc. • In-memory processing – GridGain – Apache Spark – HAMRTech
  • 28. SQL Query Engines for Hadoop and NoSQL Together. Logo shown: Impala
  • 29. APACHE DRILL • Pioneering Data Agility for Hadoop • Apache open source project • Scale-out execution engine for low-latency queries • Unified SQL-based API for analytics and operational applications. Vibrant Community: 40+ contributors; 150+ years of experience building databases and distributed systems
  • 30. Q&A. Engage with us! @mapr; maprtech; dalekim@mapr.com; MapR; maprtech; mapr-technologies