© Copyright 2014 EMC Corporation. All rights reserved.
EMC Hadoop Storage Strategy
Ed Walsh - @vEddieW
Jim Ruddy - @Darth_Ruddy
Dan Baskette - @dbbaskette
CHANGES IN ANALYTICS
• DATA VOLUME
• DATA VELOCITY
• DATA TYPES
• APPS
DATA LAKE TECHNOLOGY
HADOOP DISTRIBUTED FILE SYSTEM
• Highly scalable & portable
- Apache Open Source Specification
• Structured and unstructured data
• Analytics API interface standard
• Storage hardware flexibility
• Performance optimized for large file access
HDFS TRADE-OFFS
• Optimized for streaming (sequential) access; poor for random seeks
• Write-once file system (no in-place updates)
• Hardware failure results in reduced performance
• Specialized file system, not designed for general use
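The "performance optimized for large file access" point can be made concrete with a little arithmetic: the NameNode keeps every file and block entry in memory, so the same 1 GiB stored as many small files costs far more metadata than one large file. This is a minimal sketch, assuming a 128 MiB block size (the Hadoop 2.x default for `dfs.blocksize`) and ignoring replicas; `namespace_objects` is a hypothetical helper, not a Hadoop API.

```python
import math

# Assumption: a 128 MiB block size, the Hadoop 2.x default for dfs.blocksize.
BLOCK_SIZE = 128 * 1024 * 1024
MiB = 1024 ** 2
GiB = 1024 ** 3


def namespace_objects(file_sizes, block_size=BLOCK_SIZE):
    """Count the file and block entries the NameNode tracks for these files.

    Each file is one namespace object, plus one object per block it spans
    (an empty file has no blocks). This simplified count ignores replicas.
    """
    files = len(file_sizes)
    blocks = sum(math.ceil(size / block_size) for size in file_sizes)
    return files + blocks


# The same 1 GiB of data, stored two ways.
one_large_file = namespace_objects([1 * GiB])           # 1 file + 8 blocks
many_small_files = namespace_objects([1 * MiB] * 1024)  # 1024 files + 1024 blocks
print(one_large_file, many_small_files)
```

Storing the data as one file needs 9 metadata entries; storing it as 1,024 small files needs 2,048, which is why HDFS favors large files.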
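HDFS Architecture
[Diagram: the Client asks the NameNode "Where do I read or write data?" and the NameNode answers "Just these nodes"; the Client then exchanges HDFS data directly with the DataNodes, which report their status to the NameNode. Also shown: Secondary NameNode (now called checkpoint or backup node).]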
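The read path in the HDFS architecture above (the client asks the NameNode where the data lives, then reads directly from DataNodes) can be modelled in a few lines. This is a toy sketch only: `DataNode`, `NameNode`, `get_block_locations`, and `read_file` are hypothetical names, not the Hadoop client API, and real HDFS adds checksums, leases, and replica ordering by network distance.

```python
from dataclasses import dataclass, field


# Toy model only: these classes are hypothetical, not the Hadoop client API.
@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block id -> bytes stored here


@dataclass
class NameNode:
    # Metadata only: path -> ordered list of (block id, DataNodes holding a replica).
    namespace: dict = field(default_factory=dict)

    def get_block_locations(self, path):
        """Answer the client's question: where do I read this file's data?"""
        return self.namespace[path]


def read_file(path, namenode, datanodes):
    """Fetch block locations from the NameNode, then read bytes from DataNodes."""
    data = b""
    for block_id, holders in namenode.get_block_locations(path):
        for holder in holders:  # try replicas in the order given
            node = datanodes[holder]
            if block_id in node.blocks:
                data += node.blocks[block_id]
                break
        else:  # no replica answered: the read fails
            raise OSError(f"no live replica for block {block_id}")
    return data


datanodes = {name: DataNode(name) for name in ("dn1", "dn2", "dn3")}
datanodes["dn1"].blocks["b0"] = b"hello "
datanodes["dn2"].blocks["b0"] = b"hello "  # second replica of b0
datanodes["dn3"].blocks["b1"] = b"world"
namenode = NameNode({"/logs/a.txt": [("b0", ["dn1", "dn2"]), ("b1", ["dn3"])]})

print(read_file("/logs/a.txt", namenode, datanodes))
del datanodes["dn1"].blocks["b0"]  # simulate losing dn1's copy
print(read_file("/logs/a.txt", namenode, datanodes))  # served from dn2 instead
```

The NameNode never touches file bytes; it only hands out locations, which is why it is a metadata bottleneck and a single point of failure rather than a data bottleneck.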
DATA LAKE TECHNOLOGY
HADOOP TIER
• DataNodes, each storing data in HDFS
PROCESSING TIER – MAPREDUCE, HIVE, ETC.
DEEP SCALE SQL ANALYTICS – PIVOTAL HAWQ
IN MEMORY TIER
• SQL, objects, JSON
DATABASES

• Operational data is the focus (it is in memory, mostly)
• Continue to work with RDBs
• All data, history in HDFS
• HDFS data files directly accessible inside Hadoop
• Analytic results routed to memory tier
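The division of labour on this slide (all data and history in HDFS, operational data kept in memory, analytic results routed back to the memory tier) can be sketched as a toy model. Everything here is an assumption for illustration: `DataLakeSketch`, `hot_window`, and the sensor-style events are hypothetical, a list stands in for HDFS and a bounded deque for the in-memory tier; it is not how any of the named products work internally.

```python
from collections import deque


class DataLakeSketch:
    """Toy tiering model; all names here are hypothetical.

    A list stands in for HDFS (complete, append-only history), a bounded
    deque for the in-memory operational tier, and a dict for analytic
    results published back to memory.
    """

    def __init__(self, hot_window=3):
        self.history = []                    # "all data, history in HDFS"
        self.hot = deque(maxlen=hot_window)  # recent operational data in memory
        self.results = {}                    # analytic results in the memory tier

    def ingest(self, event):
        self.history.append(event)  # every event is kept in the history tier
        self.hot.append(event)      # only the newest events stay hot

    def run_analytics(self):
        """Batch job over the full history; results are routed to memory."""
        totals = {}
        for key, value in self.history:
            totals[key] = totals.get(key, 0) + value
        self.results = totals
        return totals


lake = DataLakeSketch(hot_window=2)
for event in [("pump1", 5), ("pump2", 3), ("pump1", 4)]:
    lake.ingest(event)

print(list(lake.hot))        # only the two newest events remain in memory
print(lake.run_analytics())  # totals computed over the full history
```

The point of the split is that the memory tier stays small and fast, while analytics still see every record ever ingested.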
DATA LAKE STORAGE FEATURES
NO SILOS
• Multi-protocol access
• Simultaneous access for unstructured data
• Separation of storage from access protocol
OPTIMIZED COST
• Choice of storage hardware
• Multi-vendor, no lock-in
LIMITLESS SCALE
• Expand capacity as needed
• Massive scale-out
• Highly available
INTEGRATED HDFS WITH HADOOP DISTRO
STRENGTHS
• Tightly coupled with Hadoop software
• Low cost
• Storage hardware choice
• Integrated software support
• Data locality
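"Data locality" means moving computation to the node that already stores the data instead of shipping the data across the network. A minimal greedy sketch of locality-aware task placement, assuming hypothetical inputs (`schedule`, block-to-node maps, and per-node slot counts are illustrative, not a Hadoop scheduler API); real Hadoop schedulers also weigh rack-level locality, while this sketch only distinguishes local from remote.

```python
def schedule(tasks, block_locations, free_slots):
    """Greedily place tasks, preferring nodes that already hold the input block.

    tasks: list of (task id, block id)
    block_locations: block id -> nodes holding a replica
    free_slots: node -> free task slots (decremented in place)
    Returns task id -> (node, "local" or "remote").
    """
    placement = {}
    for task_id, block_id in tasks:
        local = [n for n in block_locations.get(block_id, []) if free_slots.get(n, 0) > 0]
        if local:
            node, kind = local[0], "local"    # compute moves to the data
        else:
            others = [n for n, slots in free_slots.items() if slots > 0]
            if not others:
                raise RuntimeError("no free slots left")
            node, kind = others[0], "remote"  # data must cross the network
        free_slots[node] -= 1
        placement[task_id] = (node, kind)
    return placement


locations = {"b0": ["dn2", "dn1"], "b1": ["dn1"], "b2": ["dn3"]}
slots = {"dn1": 1, "dn2": 1, "dn3": 0, "dn4": 1}
plan = schedule([("t0", "b0"), ("t1", "b1"), ("t2", "b2")], locations, slots)
print(plan)
```

Here t0 and t1 run where their blocks live; t2's only replica sits on a busy node, so it falls back to a remote read on dn4.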
HDFS STORAGE ARRAY INTERFACE
STRENGTHS
• No Ingest necessary
• NameNode Fault Tolerance
• Eliminate 3x mirroring
• Multi-protocol access
• Simultaneous Multi-Hadoop distribution support
• Smart-Dedupe for Hadoop
• SEC 17a-4 Compliance
• Kerberos Authentication
• Application Multi-tenancy
EMC Hadoop Starter Kit: https://community.emc.com/docs/DOC-26892
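The "Eliminate 3x mirroring" point is a capacity argument: stock HDFS protects data by keeping three full copies of each block (`dfs.replication` defaults to 3), while an array protects data with its own parity scheme. A minimal arithmetic sketch, assuming an illustrative 1,000 TB of raw capacity and an 8+2 parity layout; actual protection levels vary by array and configuration.

```python
def usable_capacity(raw_tb, data_units, total_units):
    """Usable TB when each `total_units` of raw space stores `data_units` of data."""
    return raw_tb * data_units / total_units


RAW_TB = 1000  # illustrative raw capacity, not a figure from the slides

# HDFS default protection: three full copies of each block (dfs.replication = 3).
replicated = usable_capacity(RAW_TB, 1, 3)
# Assumed array-side protection: an 8+2 parity layout (8 data, 2 parity units).
parity = usable_capacity(RAW_TB, 8, 10)

print(f"3x replication: {replicated:.1f} TB usable")
print(f"8+2 parity:     {parity:.1f} TB usable")
```

Under these assumptions, replication leaves about a third of raw capacity usable, versus 80% with the parity layout.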
HDFS BY STORAGE VIRTUALIZATION SOFTWARE
STRENGTHS
• Multi-protocol access
- Object, HDFS, Block (iSCSI), more coming
- Write file, read object & vice versa
• NameNode Fault Tolerance
• Eliminate 3x mirroring
• Compute & data locality
• Application multi-tenancy
• Heterogeneous Storage:
- Pool server storage
- Enterprise arrays: EMC, NetApp, Hitachi
EMC Hadoop Starter Kit: https://community.emc.com/docs/DOC-34442
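"Write file, read object & vice versa" works because one copy of the data sits behind several protocol front ends. A toy sketch of that idea, assuming a hypothetical `UnifiedStore` class and a simple `/bucket/key` to `(bucket, key)` mapping; it is not any product's actual API.

```python
class UnifiedStore:
    """Toy multi-protocol store; names and path/key mapping are hypothetical.

    One dictionary holds the only copy of each object. The file view
    addresses it as /bucket/key and the object view as (bucket, key),
    so both views see the same bytes without any copying.
    """

    def __init__(self):
        self._data = {}  # "bucket/key" -> bytes

    # File (HDFS-style) view
    def write_file(self, path, payload):
        self._data[path.lstrip("/")] = bytes(payload)

    def read_file(self, path):
        return self._data[path.lstrip("/")]

    # Object (S3-style) view
    def put_object(self, bucket, key, payload):
        self._data[f"{bucket}/{key}"] = bytes(payload)

    def get_object(self, bucket, key):
        return self._data[f"{bucket}/{key}"]


store = UnifiedStore()
store.write_file("/logs/2014/day1.txt", b"written as a file")
print(store.get_object("logs", "2014/day1.txt"))  # read back as an object
store.put_object("media", "clip.txt", b"written as an object")
print(store.read_file("/media/clip.txt"))         # read back as a file
```

Because neither view owns a private copy, there is no ingest step between the object world and the Hadoop world.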
ANALYTICS APPLIANCES
STRENGTHS
• Rapid deployment
• Predictable performance & scale
• Optimum resource utilization
• Integrated, simplified management
• Simplified support & maintenance
• Optimized cost
• Highest Reliability, Availability, and Stability
Traditional Analytics Architecture
[Diagram components: RMT; Historian; IMAS; Alarm; LIMS; Oracle; Pre-aggregated Tables; BI (SSRS, Panopticon, Web); Analytics Server (SAS); Analytics Server (R); BI (Cognos)]
Modern Analytics Architecture
EMC Data Lake Architecture
[Diagram components: RMT; Historian; IMAS; Alarm; LIMS; BI Server (SSRS, Panopticon, Web); Analytics Server (SAS); Analytics Server (R); Historian; Alpine/Chorus (Pivotal); "Real Time" Feed; BI Server (Tableau or other); Reporting DB; GemFire XD; HAWQ; HDFS]
MODERN ANALYTICS USING DATA LAKE
DEMO
EMC DATA LAKE CAPABILITIES
• Documents (XLS, PPT, DOC)
• SQL Databases
• Rich Media (PDF, JPG, Video, Streaming)
• Sensor Data (GPS coordinates, temperature measurements)
• Unstructured Context (Web Server Logs,
Scale Effortlessly | Store Efficiently | Access Globally
Ed Walsh - @vEddieW
Jim Ruddy - @Darth_Ruddy
Dan Baskette - @dbbaskette
The Emerging Data Platform Ecosystem
Business Data Lake
Data Sources: Clickstream, Sensors, Telemetrics, Weblogs, Network Data, CRM, ERP Data, Collab
Ingestion Tier: Real-time, Batch, Micro batch
Insights Tier: SQL, MapReduce, NoSQL, Spark, R
Action Tier: Real-time Insights, Batch Insights, Interactive Insights
Operations, Data Services, Processing, and Data Management Tiers: MDM, RDM, Audit and Policy mgmt, Data mgmt services, Systems monitoring and management, Relational Database, MPP Database, In-memory processing, Workflow Management, Hadoop, App Server, Web Services, HDFS, Software-defined Storage, Enterprise SAN/NAS
Infrastructure Tier: Public Cloud, Hybrid Cloud, Private Cloud
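The ingestion tier lists three modes: real-time, batch, and micro batch. Micro batching sits between the other two: a continuous stream is cut into small groups that are processed as mini batch jobs. A minimal sketch, assuming count-based batches for simplicity (stream processors such as Spark Streaming typically cut batches by time interval instead); `micro_batches` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a (possibly unbounded) stream into fixed-size micro batches.

    The last batch may be shorter if the stream ends part-way through one.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Example: seven clickstream events ingested three at a time.
events = [f"click-{i}" for i in range(7)]
for batch in micro_batches(events, 3):
    print(batch)
```

Because the generator pulls lazily from the iterator, the same function works on an unbounded feed as well as on a finite list.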
Business Data Lake
EMC Federation Solutions
Data Sources: Clickstream, Sensors, Telemetrics, Weblogs, Network Data, CRM, ERP Data, Collab
Ingestion Tier: Real-time, Batch, Micro batch
Insights Tier: SQL, MapReduce, NoSQL, Spark, R
Action Tier: Real-time Insights, Batch Insights, Interactive Insights
Operations, Data Services, Processing, and Data Management Tiers: MDM, RDM, Audit and Policy mgmt, Data mgmt services, Systems monitoring and management, Relational Database, MPP Database, In-memory processing, Workflow Management, Hadoop, App Server, Web Services, HDFS, Software-defined Storage, Enterprise SAN/NAS
Infrastructure Tier: Public Cloud, Hybrid Cloud, Private Cloud; VMware vCloud Suite, vCloud Hybrid Services

More Related Content

What's hot

Running Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an Open Source Hybrid Cloud Data ArchitectureRunning Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
DataWorks Summit
 

What's hot (20)

Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
 
The Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data CentricThe Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data Centric
 
Emc vi pr controller customer presentation
Emc vi pr controller customer presentationEmc vi pr controller customer presentation
Emc vi pr controller customer presentation
 
DataCore At VMworld 2016
DataCore At VMworld 2016DataCore At VMworld 2016
DataCore At VMworld 2016
 
Running Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an Open Source Hybrid Cloud Data ArchitectureRunning Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an Open Source Hybrid Cloud Data Architecture
 
In-Place analytics with Unified Data Access
In-Place analytics with Unified Data AccessIn-Place analytics with Unified Data Access
In-Place analytics with Unified Data Access
 
Emc vi pr hdfs data service technical overview
Emc vi pr hdfs data service technical overviewEmc vi pr hdfs data service technical overview
Emc vi pr hdfs data service technical overview
 
Can $0.08 Change your View of Storage?
Can $0.08 Change your View of Storage?Can $0.08 Change your View of Storage?
Can $0.08 Change your View of Storage?
 
Trends in Data Protection with DCIG
Trends in Data Protection with DCIGTrends in Data Protection with DCIG
Trends in Data Protection with DCIG
 
Next Generation Data Protection Architecture
Next Generation Data Protection Architecture Next Generation Data Protection Architecture
Next Generation Data Protection Architecture
 
HP Helion OpenStack Community Edition Deployment
HP Helion OpenStack Community Edition DeploymentHP Helion OpenStack Community Edition Deployment
HP Helion OpenStack Community Edition Deployment
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
 
Hadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudHadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in Cloud
 
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshopDeep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
 
DataCore Software - The one and only Storage Hypervisor
DataCore Software - The one and only Storage HypervisorDataCore Software - The one and only Storage Hypervisor
DataCore Software - The one and only Storage Hypervisor
 
2/18 Technical Overview
2/18 Technical Overview2/18 Technical Overview
2/18 Technical Overview
 
Your Self-Driving Car - How Did it Get So Smart?
Your Self-Driving Car - How Did it Get So Smart?Your Self-Driving Car - How Did it Get So Smart?
Your Self-Driving Car - How Did it Get So Smart?
 
HPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big Data
 
HPC DAY 2017 | The network part in accelerating Machine-Learning and Big-Data
HPC DAY 2017 | The network part in accelerating Machine-Learning and Big-DataHPC DAY 2017 | The network part in accelerating Machine-Learning and Big-Data
HPC DAY 2017 | The network part in accelerating Machine-Learning and Big-Data
 
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPC
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPCHPC DAY 2017 | FlyElephant Solutions for Data Science and HPC
HPC DAY 2017 | FlyElephant Solutions for Data Science and HPC
 

Viewers also liked

Blackdeverapresentacaooficial38slides1 130428220014-phpapp01
Blackdeverapresentacaooficial38slides1 130428220014-phpapp01Blackdeverapresentacaooficial38slides1 130428220014-phpapp01
Blackdeverapresentacaooficial38slides1 130428220014-phpapp01
Cristiano Santos
 
FINAL APPROVED Digital transformation of the health sector - summary record o...
FINAL APPROVED Digital transformation of the health sector - summary record o...FINAL APPROVED Digital transformation of the health sector - summary record o...
FINAL APPROVED Digital transformation of the health sector - summary record o...
SochaBlue
 

Viewers also liked (20)

Fundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvediFundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvedi
 
Multilateral newsletter february 2015
Multilateral newsletter february 2015Multilateral newsletter february 2015
Multilateral newsletter february 2015
 
Mickey
MickeyMickey
Mickey
 
Blackdeverapresentacaooficial38slides1 130428220014-phpapp01
Blackdeverapresentacaooficial38slides1 130428220014-phpapp01Blackdeverapresentacaooficial38slides1 130428220014-phpapp01
Blackdeverapresentacaooficial38slides1 130428220014-phpapp01
 
Radna biografija tanja 2
Radna biografija  tanja 2Radna biografija  tanja 2
Radna biografija tanja 2
 
La motivación
La motivaciónLa motivación
La motivación
 
Economy Matters - Nov-Dec 2014
Economy Matters - Nov-Dec 2014Economy Matters - Nov-Dec 2014
Economy Matters - Nov-Dec 2014
 
Catching the wave - Taking advantage of carbon awareness after COP21
Catching the wave - Taking advantage of carbon awareness after COP21Catching the wave - Taking advantage of carbon awareness after COP21
Catching the wave - Taking advantage of carbon awareness after COP21
 
O&M Specimen model – alignments with PROV, BCO
O&M Specimen model – alignments with PROV, BCOO&M Specimen model – alignments with PROV, BCO
O&M Specimen model – alignments with PROV, BCO
 
AFCR and its contribution to a sustainable fuel cycle
AFCR and its contribution to a sustainable fuel cycleAFCR and its contribution to a sustainable fuel cycle
AFCR and its contribution to a sustainable fuel cycle
 
Global Watch
Global Watch Global Watch
Global Watch
 
Будущее, которое наступило. Каким быть образованию в XXI веке?
Будущее, которое наступило. Каким быть образованию в XXI веке?Будущее, которое наступило. Каким быть образованию в XXI веке?
Будущее, которое наступило. Каким быть образованию в XXI веке?
 
Nuxeo Roadmap June 2012
Nuxeo Roadmap June 2012Nuxeo Roadmap June 2012
Nuxeo Roadmap June 2012
 
Huisarts Academy Kies een Leermodule
Huisarts Academy Kies een LeermoduleHuisarts Academy Kies een Leermodule
Huisarts Academy Kies een Leermodule
 
A Glimpse into India's Future
A Glimpse into India's FutureA Glimpse into India's Future
A Glimpse into India's Future
 
FINAL APPROVED Digital transformation of the health sector - summary record o...
FINAL APPROVED Digital transformation of the health sector - summary record o...FINAL APPROVED Digital transformation of the health sector - summary record o...
FINAL APPROVED Digital transformation of the health sector - summary record o...
 
Asrun Kynning 2.Nov
Asrun Kynning 2.NovAsrun Kynning 2.Nov
Asrun Kynning 2.Nov
 
Partnership Summit Theme Document January 2015
Partnership Summit Theme Document January 2015Partnership Summit Theme Document January 2015
Partnership Summit Theme Document January 2015
 
Lezione Esempio Select
Lezione Esempio SelectLezione Esempio Select
Lezione Esempio Select
 
2011 Ford Edge Jacksonville, FL Catalog
2011 Ford Edge Jacksonville, FL Catalog2011 Ford Edge Jacksonville, FL Catalog
2011 Ford Edge Jacksonville, FL Catalog
 

Similar to EMC HADOOP Storage Strategy

Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
EMC
 

Similar to EMC HADOOP Storage Strategy (20)

SQL On Hadoop
SQL On HadoopSQL On Hadoop
SQL On Hadoop
 
Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0
 
Realtime Analytics in Hadoop
Realtime Analytics in HadoopRealtime Analytics in Hadoop
Realtime Analytics in Hadoop
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
 
EMC Unified Analytics Platform. Gintaras Pelenis
EMC Unified Analytics Platform. Gintaras PelenisEMC Unified Analytics Platform. Gintaras Pelenis
EMC Unified Analytics Platform. Gintaras Pelenis
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise Hadoop
 
IBM Platform Computing Elastic Storage
IBM Platform Computing  Elastic StorageIBM Platform Computing  Elastic Storage
IBM Platform Computing Elastic Storage
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
EMC Big Data | Hadoop Starter Kit | EMC Forum 2014
EMC Big Data | Hadoop Starter Kit | EMC Forum 2014EMC Big Data | Hadoop Starter Kit | EMC Forum 2014
EMC Big Data | Hadoop Starter Kit | EMC Forum 2014
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...
 
Rain stor isilon_emc_real_Examine the Real Cost of Storing & Analyzing Your M...
Rain stor isilon_emc_real_Examine the Real Cost of Storing & Analyzing Your M...Rain stor isilon_emc_real_Examine the Real Cost of Storing & Analyzing Your M...
Rain stor isilon_emc_real_Examine the Real Cost of Storing & Analyzing Your M...
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 
EMC Atmos for service providers
EMC Atmos for service providersEMC Atmos for service providers
EMC Atmos for service providers
 
EMC EC Overview
EMC EC OverviewEMC EC Overview
EMC EC Overview
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 

Recently uploaded

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 

Recently uploaded (20)

Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 

EMC HADOOP Storage Strategy

  • 1. 1© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. EMC Hadoop Storage Strategy Ed Walsh - @vEddieW Jim Ruddy - @Darth_Ruddy Dan Baskette - @dbbaskette
  • 2. 2© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. CHANGES IN ANALYTICS DATA VOLUME DATA VELOCITY DATA TYPES APPS
  • 3. 3© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. DATA LAKE TECHNOLOGY HADOOP DISTRIBUTED FILE SYSTEM • Highly saleable & portable - Apache Open Source Specification • Structured and unstructured data • Analytics API interface standard • Storage hardware flexibility • Performance optimized for large file access HDFS TRADE-OFFS • Optimized for streaming writes; poor for random seeks • Write once file system • Hardware failure results in reduced performance • Specialized file system, not designed for general use HDFS Architecture Client NameNode Secondary NameNode (Now called checkpoint or backup node) Where do I read or write data? Just these nodes DataNode DataNode DataNode Data Status HDFS Data
  • 4. 4© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. DATA LAKE TECHNOLOGY HADOOP TIER DataNode HDFS DataNode HDFS DataNode HDFS DataNode HDFS DataNode HDFS DataNode HDFS DataNode HDFS PROCESSING TIER – ME, HIVE, ETC. DEEP SCALE SQL ANALYTICS – PIVOTAL HAWQ IN MEMORY TIER SQL OBJECTS JSON DATABASES Operational data is the focus (it is in memory, mostly) Continue to work with RDBs All data, history in HDFS HDFS data files directly accessible inside Hadoop Analytic results routed to memory tier
  • 5. 5© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. DATA LAKE STORAGE FEATURES NO SILOS Multi-protocol access Simultaneous access for unstructured data Separation of storage from access protocol OPTMIZED COST Choice of storage hardware Multi-vendor, no lock-in LIMITLESS SCALE Expand capacity as needed Massive scale-out Highly available
  • 6. 6© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. INTEGRATED HDFS WITH HADOOP DISTRO STRENGTHS • Tightly coupled with Hadoop software • Low cost • Storage hardware choice • Integrated software support • Data locality
  • 7. 7© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. HDFS STORAGE ARRAY INTERFACE STRENGTHS • No Ingest necessary • NameNode Fault Tolerance • Eliminate 3x mirroring • Multi-protocol access • Simultaneous Multi-Hadoop distribution support • Smart-Dedupe for Hadoop • SEC 17a-4 Compliance • Kerberos Authentication • Application Multi-tenancy EMC Hadoop Starter Kit: https://community.emc.com/docs/DOC-26892
  • 8. 8© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. HDFS BY STORAGE VIRTUALIZATION SOFTWARE STRENGTHS • Multi-protocol access - Object, HDFS, Block (iSCSI), more coming - Write file, read object & vice versa • NameNode Fault Tolerance • Eliminate 3x mirroring • Compute & data locality • Application multi-tenancy • Heterogeneous Storage: - Pool server storage - Enterprise arrays • EMC, Netapp, Hitachi EMC Hadoop Starter Kit: https://community.emc.com/docs/DOC-34442
  • 9. 9© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. ANALYTICS APPLIANCES STRENGTHS • Rapid deployment • Predictable performance & scale • Optimum resource utilization • Integrated, simplified management • Simplified support & maintenance • Optimized cost • Highest Reliability, Availability, and Stability
  • 10. 10© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Traditional Analytics Architecture RMT Historian IMAS Alarm LIMS Oracle BI (SSRS, Panopticon, Web) Analytics Server (SAS) Analytics Server (R) Pre- aggregated Tables BI (Cognos)
  • 11. 11© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Modern Analytics Architecture EMC Data Lake Architecture RMT Historian IMAS Alarm LIMS BI Server (SSRS, Panopticon, Web) Analytics Server (SAS) Analytics Server (R) Historian Alpine/Chor us (Pivotal) “Real Time” Feed BI Server (Tableau or other) Reporting DB GemFire XD HAWQ HDFS
  • 12. 12© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. MODERN ANALYTICS USING DATA LAKE DEMO
  • 13. 13© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. EMC DATA LAKE CAPABILITIEs Documents (XLS, PPT, DOC) SQL Databases Rich Media (PDF, JPG, Video, Streaming) Sensor Data (GPS coordinates, temperature measurements) Unstructured Context (Web Server Logs, Scale Effortlessly | Store Efficiently | Access Globally
  • 14. Ed Walsh - @vEddieW Jim Ruddy - @Darth_Ruddy Dan Baskette - @dbbaskette
  • 15. 16© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. The Emerging Data Platform Ecosystem Business Data Lake Ingestion Tier Real-time Batch Micro batch Data Sources Clickstream Sensors Telemetrics Weblogs Network Data CRM ERP Data Collab} Insights Tier SQL MapReduce NoSQL Spark R Action Tier Real-time Insights Batch Insights Interactive Insights Operations Tier Data Services Tier Processing Tier MDM RDM Audit and Policy mgmt Data mgmt services Systems monitoring and management Relational Database MPP Database In-memory processing Workflow Management Hadoop App Server Web Services Data Management Tier HDFS Software- defined Storage Enterprise SAN/NAS Public Cloud Hybrid Cloud Private Cloud Infrastructure Tier
  • 16. Business Data Lake: EMC Federation Solutions. Data Sources: Clickstream, Sensors, Telemetrics, Weblogs, Network Data, CRM, ERP, Data Collab. Ingestion Tier: Real-time, Batch, Micro batch. Insights Tier: SQL, MapReduce, NoSQL, Spark, R. Action Tier: Real-time Insights, Batch Insights, Interactive Insights. Operations Tier, Data Services Tier, and Processing Tier: MDM, RDM, Audit and Policy mgmt, Data mgmt services, Systems monitoring and management, Relational Database, MPP Database, In-memory processing, Workflow Management, Hadoop, App Server, Web Services. Data Management Tier: HDFS, Software-defined Storage, Enterprise SAN/NAS. Infrastructure Tier: Public Cloud, Hybrid Cloud, Private Cloud; VMware vCloud Suite, vCloud Hybrid Service.

Editor's Notes

  1. Situation: Over the past few years we have seen a major transformation in next-gen analytics. Big Data is a major focus of your business and application teams. At EMC World 2013 we announced Isilon HDFS support and the launch of Pivotal. At EMC World 2014: ViPR HDFS, Pivotal momentum, and industry investment in Cloudera and MongoDB. Today, one-third of Amazon's sales comes from personalization and recommendation systems. It's not just big companies like Amazon and eBay, but even your local grocery store: it used to be the weekly sales circular; now you enter the store and get targeted marketing messages with discounts for you. Stores are collecting data on every shopper using loyalty programs and Wi-Fi. Every industry: healthcare, insurance, financial. Problem: confusion about what I need to do (almost 40% EBC). Stake: to determine the best IT infrastructure for these services, you need to understand the key enablers. <next>
  2. The first evolution is data volume. EMC Digital Universe (7th update): digital data stored will double every two years for the next decade, and data growth from emerging markets is exploding. Currently 60% of data is generated in mature markets (US, Japan, Germany); by 2016, 60% of data will be generated in emerging economies such as Brazil, China, and Mexico. The second evolution is the impact of the Internet of Things, or the Industrial Internet. Data collection is accelerating: 14 billion internet-connected devices today (2% of all data), and 32 billion in 2020 (10% of all data). GE wind turbine example: 20,000 sensors, 400 updates per second. The third is analysis of unstructured data, including images, video, and audio; we are no longer just analyzing neat tables of data organized in columns and rows. The NYC exploding manhole cover example: messy data, with records dating from 1880; 51,000 manholes; enough cable to wrap the earth 2.5 times; a 44% reduction in disasters. The fourth is new tools optimized to analyze large, complete data sets that are often dense with frequently collected data from sensors and devices and include unstructured data such as images, video, and audio. These tools are inexpensive, leverage open source, and are easy to deploy (even for the local grocer). The combination of collecting and storing more data cost-effectively with these new tools is creating a perfect storm for Big Data. <click> My current storage architecture can't meet all these requirements. What should a storage architect do?
  3. The first step is that you need a content repository, or Data Lake. Most of the new analytics tools, such as Hadoop, rely on HDFS and its API. Several great attributes of HDFS: it scales easily from terabytes to petabytes; it is optimized for big-block I/O (64 MB block size); it supports structured and unstructured data; and it is open source, low cost, and hardware independent. Let's look at this simple HDFS block diagram: highly distributed processing. But what about those wind turbines streaming 400 data points a second? Customers are combining HDFS with in-memory database technologies such as GemFire XD and Impala. <click>
  4. In-memory data grids (IMDGs) provide the fast ingest and query performance. IMDG technologies such as GemFire XD write copies of their data to HDFS for persistence and deeper analytics. Together, IMDG and HDFS support storage and analysis of large data sets, streaming ingest, and analysis of structured and unstructured data. Tools like Pivotal HAWQ let you access data in both the IMDG and the HDFS Data Lake. What are the storage requirements for a Data Lake?
  5. Cost optimized: we recommend HDFS and IMDG to manage storage costs at scale with a hot-edge, cold-core architecture. Minimize $/GB, because data will double every two years. No silos: the content repository/Data Lake must be accessible by all protocols (write with one protocol, read with any), ready for the next big thing. Scalability: from terabytes to hundreds of petabytes, with non-disruptive capacity growth and no-downtime migrations. Piece of cake? How many storage solutions can do this today? No one storage platform provides all of this. EMC believes in building blocks and options. There are four common Data Lake storage options today, and each has its pros and cons.
  6. The first option is Hadoop HDFS on server storage. Most customers start here, then experience issues with scale and poor capacity utilization. Disadvantages: low efficiency; hardware support at scale; limited to the Hadoop distro; a Hadoop silo.
  7. Pros: access data already stored; leverage existing investment; enterprise reliability, security, and availability. EMC Hadoop Starter Kit. <<talk about EMC Elastic Cloud Storage (ECS)>> Common concerns: limited high-performance options; storage hardware lock-in; HDFS compatibility with Hadoop distros.
  8. ViPR architecture; Hadoop Starter Kit – ViPR edition. Lots to like: leverage existing investment; centralized management and provisioning; leverage the reliability, security, and availability of the storage hardware; flexibility of data services. Common concerns: it is new, with HDFS data services GA in February 2014; HDFS compatibility with Hadoop distros (HCFS).
  9. Mature: Greenplum DCA and VCE Vblock for Big Data. Large enterprise and service provider customers. Fast to deploy, predictable performance. Common concerns: hardware vendor lock-in; inflexible modules; slower innovation. These four options all have strengths and weaknesses. The most mature for grown-up HDFS is our Isilon solution, with many happy customers. The most compelling is our storage software virtualization solution, ViPR, but it is new and building traction. With the 2.0 release it is gaining many of the features customers need now; things like additional protocol support are road-mapped over the next 12 months. Do you want to see this in action? I'd like to introduce Jim Ruddy, Lead EMC OCTO Big Data Architect, to demo a Data Lake in action with the Pivotal analytics suite from a recent customer deployment. Jim, what are you going to show us?
  10. Demo – Retail Use Case
      1) Data enters through adapters. These adapters can receive data from multiple sources like Twitter, POS, manufacturing devices, or other sources on “the Internet of Things.”
      2) The adapter is written in Spring XD, which can run as a single node or scale to multiple nodes.
      3) The first analysis of the data is done at this level. Where does the data go? Does it need instant analysis, or does it need to be compared to a history of transactions?
      4) There are two ways data can be written at this point. It can be written directly from the adapter to GemFire XD for in-memory analytics, or a tap can be used, where data is written to GemFire XD and HDFS at the same time. The adapter can also decide whether some data goes to GemFire XD and some to HDFS, and how to make that determination. This is the first level of analytics.
      5) Once data is in GemFire XD, it is stored in in-memory tables, or you can persist very large tables to local disk store files or to HDFS. How long and where the data is kept is variable and can be tuned per table. The Pivotal Extension Framework (PXF) allows HAWQ to query data in memory. As the data is persisted to HDFS, HAWQ can query it there as well.
      6) GemFire XD is built as a cluster. There is one locator server and one or more data servers that host data. These servers keep the tables in memory, have local storage to persist data, can write and read data to and from HDFS, and run YARN/MapReduce jobs.
      7) Once data is persisted to HDFS, YARN (with MapReduce version 2) can run batch jobs against the data.
      8) Every node in Pivotal HD that is a YARN NodeManager is also a HAWQ segment. This is how HAWQ accesses data in HDFS.
      9) Once data is persisted to HDFS, HAWQ and Hadoop can do historical analysis.
  11. Awesome, Jim. As you can see, next-gen analytics is very powerful for your business. It is the top priority for many of our customers' application teams. EMC is uniquely qualified as the industry leader in data storage, with a 30+ year history of innovation helping our customers and the industry through these evolutions. We have also learned a great deal from our experience with Pivotal. We believe the key is to architect your content repository using a combination of storage technologies optimized for both $/GB and performance, to support the new analytics tools. These tools require access via a variety of protocols, including legacy file, SQL, and new storage protocols such as object and HDFS. In closing, EMC provides highly scalable, cost-efficient storage solutions that are part of our building-block approach. We have proven solutions to help you deploy a Data Lake that scales effortlessly and cost-effectively across geographies. Thank you. We have time for some questions… Jim, Dan, please join me.
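Note 2 above projects that stored data doubles every two years and cites a GE wind turbine producing 400 updates per second. A quick back-of-envelope sketch in Python makes those rates concrete. It assumes an exact two-year doubling period and reads the 400 updates per second as the rate for one turbine (both are our assumptions), and the function names are illustrative only.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Multiplier on stored data after `years`, assuming a fixed doubling period."""
    return 2.0 ** (years / doubling_period_years)


def readings_per_day(updates_per_second: int) -> int:
    """Readings produced in 24 hours at a constant update rate."""
    seconds_per_day = 24 * 60 * 60  # 86,400 seconds
    return updates_per_second * seconds_per_day


if __name__ == "__main__":
    # Doubling every two years for a decade is five doublings: 2**5 = 32x.
    print(f"Growth over 10 years: {growth_factor(10):.0f}x")
    # Assumption: 400 updates/second is per turbine, not per sensor.
    print(f"Readings per turbine per day: {readings_per_day(400):,}")
```

Under these assumptions, 100 TB stored today becomes roughly 3.2 PB in ten years, and a single turbine produces about 34.6 million readings a day.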
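Note 3 credits HDFS with big-block I/O (64 MB blocks) and highly distributed processing. The minimal Python sketch below shows how a file maps onto 64 MB blocks and how each block's replicas might be spread across DataNodes. Three replicas matches HDFS's default `dfs.replication`; the round-robin placement is a deliberate simplification (real HDFS lets the NameNode apply a rack-aware policy), and the function names are hypothetical.

```python
from typing import Dict, List

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size cited in the note
REPLICATION = 3                # HDFS default for dfs.replication


def block_count(file_size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Blocks needed for a file; the final block may be only partly filled."""
    if file_size_bytes < 0:
        raise ValueError("file size cannot be negative")
    return (file_size_bytes + block_size - 1) // block_size  # integer ceiling


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Toy placement: each block's replicas go to distinct DataNodes, round-robin.

    Not HDFS's real (rack-aware) policy; it only shows that every block
    lives on several nodes, which is what enables parallel, local processing.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    n = len(datanodes)
    return {
        block: [datanodes[(block + r) % n] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = block_count(1024 ** 3)  # a 1 GiB file
    layout = place_replicas(blocks, ["dn1", "dn2", "dn3", "dn4"])
    print(blocks)     # 16
    print(layout[0])  # ['dn1', 'dn2', 'dn3']
```

With three replicas, that 1 GiB file occupies 48 block replicas (3 GiB of raw disk) spread over the four nodes, so any of several nodes can serve each block to a local task.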
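Note 4 describes in-memory data grids such as GemFire XD writing copies of their data to HDFS for persistence and deeper analytics. A toy Python model of that pattern follows. It is not the GemFire XD API: the class and method names are hypothetical, a list stands in for append-only HDFS files, and writes are persisted in small batches (write-behind) rather than one at a time.

```python
from typing import Any, Dict, List, Tuple


class InMemoryTable:
    """Toy IMDG table: serves reads from memory, persists writes to an append-only log."""

    def __init__(self, flush_every: int = 3) -> None:
        self.rows: Dict[str, Any] = {}             # hot copy: latest value per key
        self._pending: List[Tuple[str, Any]] = []  # writes not yet persisted
        self.hdfs_log: List[Tuple[str, Any]] = []  # stand-in for HDFS files
        self.flush_every = flush_every

    def put(self, key: str, value: Any) -> None:
        self.rows[key] = value
        self._pending.append((key, value))
        if len(self._pending) >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        # HDFS files are write-once, so batches are appended, never updated in place.
        self.hdfs_log.extend(self._pending)
        self._pending.clear()

    def get(self, key: str) -> Any:
        # Fast path: the current value straight from memory.
        return self.rows.get(key)

    def history(self, key: str) -> List[Any]:
        # Deeper analytics: every persisted value for a key, oldest first.
        return [value for k, value in self.hdfs_log if k == key]


if __name__ == "__main__":
    table = InMemoryTable(flush_every=2)
    table.put("turbine-7", 41.0)
    table.put("turbine-7", 43.5)  # second write triggers a flush
    print(table.get("turbine-7"))      # 43.5
    print(table.history("turbine-7"))  # [41.0, 43.5]
```

The memory tier keeps only current state for quick queries, while the persisted log keeps the full history that batch jobs or a SQL engine can scan later.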
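The hot-edge, cold-core recommendation in note 5 can be reasoned about as a blended $/GB figure. The Python sketch below assumes a simple age-based rule (recent data on the in-memory edge, older data in the HDFS core); the seven-day window and the per-GB prices are placeholders for illustration, not EMC figures, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TierPrices:
    hot_per_gb: float   # placeholder cost of edge (memory/flash) capacity
    cold_per_gb: float  # placeholder cost of high-capacity HDFS core storage


def tier_for(age_days: float, hot_window_days: float = 7.0) -> str:
    """Hypothetical placement rule: recent data is hot, everything else is cold."""
    return "hot" if age_days < hot_window_days else "cold"


def blended_cost_per_gb(hot_fraction: float, prices: TierPrices) -> float:
    """Average $/GB when `hot_fraction` of the data sits on the hot edge."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return hot_fraction * prices.hot_per_gb + (1.0 - hot_fraction) * prices.cold_per_gb


if __name__ == "__main__":
    prices = TierPrices(hot_per_gb=10.0, cold_per_gb=1.0)  # illustrative units only
    print(tier_for(2), tier_for(30))                       # hot cold
    print(f"{blended_cost_per_gb(0.10, prices):.2f}")      # 1.90
```

The lever is the hot fraction: with these placeholder prices, keeping 10% of the data hot costs 1.9 units per GB instead of 10, which matters more each time the total doubles.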
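The low efficiency of HDFS on server storage (note 6) comes largely from protecting every block with full copies on local disks. A quick Python comparison follows, assuming HDFS's default three replicas against an illustrative 8+2 erasure-coded layout; the 8+2 scheme is our example, not a figure from the presentation.

```python
def usable_tb_replicated(raw_tb: float, replicas: int = 3) -> float:
    """Usable capacity when every block is stored `replicas` times."""
    return raw_tb / replicas


def usable_tb_erasure_coded(raw_tb: float, data_units: int, parity_units: int) -> float:
    """Usable capacity when each stripe holds data_units of data plus parity_units of parity."""
    return raw_tb * data_units / (data_units + parity_units)


if __name__ == "__main__":
    raw = 1000.0  # 1 PB of raw disk, expressed in TB
    print(f"3x replication: {usable_tb_replicated(raw):.0f} TB usable")
    print(f"8+2 erasure coding: {usable_tb_erasure_coded(raw, 8, 2):.0f} TB usable")
```

That is roughly 333 TB versus 800 TB usable from the same petabyte of disk, and the absolute gap grows with the data lake.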
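Note 7's first advantage, accessing data already stored, relies on one namespace that is reachable over more than one protocol, so Hadoop jobs read files in place instead of copying them into a separate cluster. The Python toy below models only that idea; the class name, the host `storage.example.com`, and port 8020 (a common NameNode RPC port) are illustrative assumptions, not an EMC interface.

```python
from typing import Dict
from urllib.parse import urlparse


class SharedNamespace:
    """One set of files, exposed through a POSIX-style path and an hdfs:// URI."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def nfs_write(self, path: str, data: bytes) -> None:
        # e.g. an application dropping a file onto a file-share export
        self._files[path] = data

    def hdfs_read(self, uri: str) -> bytes:
        # A Hadoop job addresses the same file by URI; only the path part matters here.
        parsed = urlparse(uri)
        if parsed.scheme != "hdfs":
            raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
        return self._files[parsed.path]


if __name__ == "__main__":
    ns = SharedNamespace()
    ns.nfs_write("/data/sales/2014-05.csv", b"store,amount\n42,19.99\n")
    # No copy step: the analytics side reads what the file side wrote.
    print(ns.hdfs_read("hdfs://storage.example.com:8020/data/sales/2014-05.csv"))
```

The design point is that the file is written once and never duplicated into a Hadoop-only silo.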
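The HCFS concern in note 8 refers to Hadoop Compatible File Systems. Hadoop picks a file-system implementation from a path's URI scheme (configurable through `fs.<scheme>.impl`), so any store that implements the FileSystem contract can sit underneath Hadoop, and compatibility depends on how faithfully it does so. The Python analogy below mirrors that scheme-based dispatch; it is not Hadoop's Java API, and the `objfs` scheme and class names are hypothetical.

```python
from typing import Dict, Type
from urllib.parse import urlparse


class FileSystem:
    """Minimal contract every pluggable file system must honour."""

    def read(self, path: str) -> str:
        raise NotImplementedError


class HdfsFileSystem(FileSystem):
    def read(self, path: str) -> str:
        return f"hdfs block read: {path}"


class ObjectStoreFileSystem(FileSystem):
    """Hypothetical HCFS-style adapter over an object store."""

    def read(self, path: str) -> str:
        return f"object GET: {path}"


# Analogous to Hadoop's fs.<scheme>.impl settings.
REGISTRY: Dict[str, Type[FileSystem]] = {
    "hdfs": HdfsFileSystem,
    "objfs": ObjectStoreFileSystem,
}


def open_path(uri: str) -> str:
    """Pick the implementation from the URI scheme, then delegate the read."""
    parsed = urlparse(uri)
    try:
        fs_class = REGISTRY[parsed.scheme]
    except KeyError:
        raise ValueError(f"no FileSystem registered for scheme {parsed.scheme!r}") from None
    return fs_class().read(parsed.path)


if __name__ == "__main__":
    print(open_path("hdfs://namenode:8020/logs/day1"))    # hdfs block read: /logs/day1
    print(open_path("objfs://bucket.example/logs/day1"))  # object GET: /logs/day1
```

Application code only names a URI; whether it runs unchanged on a new back end depends on how completely that back end implements the shared contract, which is the compatibility question the note raises.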
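Steps 3 and 4 of the demo in note 10 describe the adapter's first-level analytics: deciding whether each event goes to GemFire XD, to HDFS, or to both through a tap. Below is a minimal Python sketch of that decision, assuming a caller-supplied rule for what needs immediate analysis. The `Event`, `Sinks`, and `route` names are hypothetical, lists stand in for the two sinks, and this is not the Spring XD API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    source: str    # e.g. "pos", "twitter", "sensor"
    payload: dict


@dataclass
class Sinks:
    gemfire: List[Event] = field(default_factory=list)  # stand-in for in-memory tables
    hdfs: List[Event] = field(default_factory=list)     # stand-in for HDFS files


def route(event: Event, sinks: Sinks,
          needs_realtime: Callable[[Event], bool],
          tap_to_hdfs: bool = True) -> None:
    """First-level analytics: send the event to memory, HDFS, or both."""
    if needs_realtime(event):
        sinks.gemfire.append(event)   # immediate analysis in the in-memory tier
        if tap_to_hdfs:
            sinks.hdfs.append(event)  # the tap: same event also persisted for history
    else:
        sinks.hdfs.append(event)      # history-only data skips the in-memory tier


def is_pos(event: Event) -> bool:
    # Hypothetical rule: point-of-sale transactions need immediate analysis.
    return event.source == "pos"


if __name__ == "__main__":
    sinks = Sinks()
    route(Event("pos", {"store": 42, "amount": 19.99}), sinks, is_pos)
    route(Event("twitter", {"text": "great deals today"}), sinks, is_pos)
    print(len(sinks.gemfire), len(sinks.hdfs))  # 1 2
```

Passing the rule in as a function keeps the routing policy separate from the plumbing, mirroring the note's point that the adapter decides both where data goes and how to make that determination.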