©MapR Technologies. All rights reserved.
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
Rob Rosen
Sr. Director, Americas Systems Engineering
MapR Technologies
MapR Overview
• Enterprise-grade platform for Hadoop
• Deployed at thousands of companies
– Including 12 of the Fortune 100
• MapR is the preferred analytics platform
– Hundreds of billions of events daily
– 90% of the world’s Internet population monthly
– $1 trillion in retail purchases annually
Arrival of Big Data Impacts Data Warehouse
Data Warehouse challenges:
• Volume: prohibitively expensive storage costs
• Variety: inability to process unstructured formats
• Velocity: faster data arrival and processing needs
Top Concern for Big Data
Multiple data sources
Multiple technologies
Multiple copies of data
“Too many different types, sources, and formats of critical data”
The Hadoop Advantage
• Fueling an industry revolution by providing virtually unlimited capacity to store and process Big Data
• Expanding analytics across data types
• Compelling economics
– 20 to 100x more cost-effective than alternatives
Pioneered at
Important Drivers for Hadoop
• Co-locating data with compute drives efficiencies and better analytics
• With Hadoop you don’t need to know what questions to ask beforehand
• Simple algorithms on Big Data often outperform complex models on less data
• Powerful ability to analyze unstructured data
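Not needing to know the questions beforehand is usually described as schema-on-read: raw records are stored unchanged, and a structure is applied only when a query runs. A minimal Python sketch, assuming newline-delimited JSON records; the sample events, field names and the `query` helper are hypothetical and not part of any MapR or Hadoop API.

```python
import json

# Hypothetical raw events, stored exactly as they arrived; no schema is enforced at write time.
RAW_EVENTS = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/cart", "ms": 340, "device": "mobile"}',
    '{"user": "a", "page": "/cart", "ms": 200}',
]

def query(raw_lines, fields, predicate=lambda rec: True):
    """Apply a schema (the chosen fields) at read time, then filter the records."""
    for line in raw_lines:
        rec = json.loads(line)
        if predicate(rec):
            # A field missing from older records becomes None instead of failing the load.
            yield tuple(rec.get(f) for f in fields)

# A question nobody planned for when the data was written: which devices visited /cart?
cart_devices = list(query(RAW_EVENTS, ["user", "device"], lambda r: r["page"] == "/cart"))
```

The trade-off is that parsing and validation cost moves from load time to every query, which is acceptable when storage is cheap and questions change often.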
Hadoop is the Technology of Choice for Big Data
How Do You Lower and Control Data Warehouse Costs?
Source data (social media and web logs; machine, device, and scientific data; documents and emails; transactions, OLTP, OLAP) flows through batch ETL into the traditional targets: the enterprise data warehouse, datamarts, and the ODS.
• Raw or infrequently used data is consuming capacity
• Batch windows are hitting their limits, putting SLAs at risk
• Databases and data warehouses are exceeding their capacity too quickly
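The batch-window pressure can be made concrete with back-of-the-envelope arithmetic: if nightly ETL runtime grows with data volume, the window runs out after a predictable number of growth periods. A minimal Python sketch, assuming runtime scales linearly with volume and volume compounds at a fixed monthly rate; the function name and all numbers are hypothetical, not figures from this deck.

```python
import math

def months_until_window_exceeded(runtime_hours, window_hours, monthly_growth):
    """Months until a batch job that scales linearly with data volume outgrows its window.

    Assumes runtime is proportional to volume and volume compounds at
    `monthly_growth` per month (e.g. 0.05 for 5%).
    """
    if runtime_hours >= window_hours:
        return 0
    # Smallest integer m with runtime * (1 + g) ** m >= window.
    return math.ceil(math.log(window_hours / runtime_hours) / math.log(1 + monthly_growth))

# Hypothetical example: a 5-hour job, an 8-hour window, 5% monthly data growth.
print(months_until_window_exceeded(5.0, 8.0, 0.05))  # 10
```

Moving the transformation step off the warehouse does not change the growth rate, but it removes that runtime from the warehouse's own window.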
Lower Data Management Costs
Source data: social media and web logs; machine, device, and scientific data; documents and emails; transactions, OLTP, OLAP
Traditional targets: enterprise data warehouse, RDBMS, MDM
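The offload pattern in these slides keeps raw source data and the heavy transformation work on Hadoop, and sends only compact, query-ready results to the warehouse and other traditional targets. A minimal, single-process Python sketch of that division of labor; the log format, dates, and helper functions are hypothetical, and a real deployment would run the transform as a distributed job (for example MapReduce) rather than in one loop.

```python
from collections import Counter

# Hypothetical raw web-log lines: timestamp, user id, URL path.
RAW_LOG = [
    "2014-06-01T10:00:01 u1 /home",
    "2014-06-01T10:00:05 u2 /product/7",
    "2014-06-01T11:12:40 u1 /product/7",
    "malformed line",
    "2014-06-02T09:30:00 u3 /home",
]

def transform(lines):
    """The 'T' of ETL, run on the Hadoop side: parse, skip bad records, count hits per day and page."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # raw lines remain in long-term storage; only clean aggregates move on
        timestamp, _user, path = parts
        counts[(timestamp[:10], path)] += 1
    return counts

def rows_for_warehouse(counts):
    """Compact (day, path, hits) rows, sorted for a deterministic bulk load."""
    return sorted((day, path, hits) for (day, path), hits in counts.items())

rows = rows_for_warehouse(transform(RAW_LOG))
```

Five raw lines shrink to three summary rows here; at scale, that ratio is what frees warehouse capacity.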
Bottom-Line Impact
Sensor data and web logs → Hadoop (ETL + long-term storage) → RDBMS / DW (query + present)
Benefits:
• Both structured and unstructured data
• Expanded analytics with MapReduce, NoSQL, etc.

Solution                        Cost / Terabyte    Hadoop Advantage
Hadoop                          $333               –
Teradata Warehouse Appliance    $16,500            50x savings
Oracle Exadata                  $14,000            42x savings
IBM Netezza                     $10,000            30x savings
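The "Hadoop Advantage" column is each platform's cost per terabyte divided by Hadoop's $333, rounded. A short Python sketch reproducing it from the slide's figures. The `offload_savings` helper and the 100 TB volume are hypothetical, and treating cost as purely proportional to capacity is a simplifying assumption, not a full TCO model.

```python
# Cost per terabyte as listed on the "Bottom-Line Impact" slide (USD).
COST_PER_TB = {
    "Hadoop": 333,
    "Teradata Warehouse Appliance": 16_500,
    "Oracle Exadata": 14_000,
    "IBM Netezza": 10_000,
}

def savings_vs_baseline(costs, baseline="Hadoop"):
    """Each alternative's cost per TB divided by the baseline's, rounded to a whole multiple."""
    base = costs[baseline]
    return {name: round(cost / base) for name, cost in costs.items() if name != baseline}

def offload_savings(terabytes, target, costs=COST_PER_TB, baseline="Hadoop"):
    """Dollar difference from keeping `terabytes` on the baseline instead of on `target`."""
    return terabytes * (costs[target] - costs[baseline])

ratios = savings_vs_baseline(COST_PER_TB)
# {'Teradata Warehouse Appliance': 50, 'Oracle Exadata': 42, 'IBM Netezza': 30}
print(offload_savings(100, "Oracle Exadata"))  # 1366700
```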
What is the Best Way to Deploy Hadoop?
Transitory Data Store
• No long-term scale advantages
• Unprotected data
• ETL tool focus
vs.
Permanent Data Store: Enterprise Data Hub
• Highly available and fully protected data
• Works with existing tools
• Real-time ingestion and extraction
• Archive data from data warehouse
An Enterprise Data Hub
• Combine different data sources
• Minimize data movement
• One platform for analytics
Sources feeding the Enterprise Data Hub: Sales, SCM, CRM, Public, Web Logs, Production Data, Sensor Data, Click Streams, Location, Social Media, Billing
Key Elements of Enterprise Data Hub
Elements: 99.999% HA, Data Protection, Disaster Recovery, Scalability & Performance, Enterprise Integration, Multi-tenancy
Enterprise-grade platform for the long term:
• Reliability to support stringent SLAs
• Protection from data loss and user or application errors
• Support business continuity and meet recovery objectives
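The 99.999% ("five nines") availability target translates directly into a yearly downtime budget. A minimal Python sketch of the arithmetic, assuming a 365.25-day year; it illustrates the percentage only and says nothing about how MapR measures availability.

```python
def downtime_minutes_per_year(availability, days_per_year=365.25):
    """Maximum downtime per year implied by an availability fraction (0 < availability <= 1)."""
    minutes_per_year = days_per_year * 24 * 60
    return (1 - availability) * minutes_per_year

for label, a in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(a):.2f} min/year")
```

Five nines leaves roughly 5.26 minutes of downtime per year, which is why the following slides stress automated failover and rolling upgrades rather than manual intervention.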
High Availability and Dependability
Reliable Compute
• Automated stateful failover
• Automated re-replication
• Self-healing from HW and SW failures
• Load balancing
• Rolling upgrades
• No lost jobs or data
• Five nines (99.999%) of uptime
Dependable Storage
• Business continuity with snapshots and mirrors
• Recover to a point in time
• End-to-end checksumming
• Strong consistency
• Data safe
• Mirror across sites to meet Recovery Time Objectives
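End-to-end checksumming means a checksum is computed when data is written and verified again when it is read, so silent corruption is detected instead of being returned to the application. A minimal Python sketch using the standard library's `zlib.crc32`; the tiny block size, the in-memory block layout and `ChecksumError` are illustrative assumptions, not MapR's on-disk format.

```python
import zlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow (illustrative)

class ChecksumError(Exception):
    """Raised when a block's stored checksum does not match its contents."""

def write_blocks(data: bytes) -> list:
    """Split data into fixed-size blocks and store a CRC32 alongside each one."""
    blocks = []
    for i in range(0, len(data), BLOCK_SIZE):
        chunk = data[i:i + BLOCK_SIZE]
        blocks.append({"data": chunk, "crc": zlib.crc32(chunk)})
    return blocks

def read_blocks(blocks: list) -> bytes:
    """Re-verify every block on read and refuse to return corrupted bytes."""
    out = bytearray()
    for n, block in enumerate(blocks):
        if zlib.crc32(block["data"]) != block["crc"]:
            # A real system would fall back to another replica here.
            raise ChecksumError(f"checksum mismatch in block {n}")
        out += block["data"]
    return bytes(out)

stored = write_blocks(b"sensor-reading-42")
clean = read_blocks(stored)   # b"sensor-reading-42"

stored[1]["data"] = b"XXXX"   # simulate silent on-disk corruption
try:
    read_blocks(stored)
except ChecksumError as err:
    print("detected:", err)   # detected: checksum mismatch in block 1
```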
Enterprise Data Hub Supports a Range of Applications
Workloads: Batch, Interactive, Real-time
• 99.999% HA: self-healing; instant recovery
• Data Protection: snapshots for point-in-time recovery from user or application errors
• Disaster Recovery: mirroring across clusters and the WAN
• Scalability & Performance: unlimited files and tables; record-setting performance
• Enterprise Integration: direct data ingestion and access; fully compliant ODBC access and SQL-92 support
• Multi-tenancy: secure access for multiple users and groups
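Snapshots protect against user or application errors by preserving a volume's state at a point in time, so a mistakenly deleted or overwritten file can be recovered. A minimal Python sketch of that workflow; `VolumeWithSnapshots` and its methods are hypothetical and only model the behavior, not MapR's snapshot implementation or commands.

```python
import copy

class VolumeWithSnapshots:
    """Toy file table that supports named point-in-time snapshots.

    Production file systems typically use block-level copy-on-write; this sketch
    simply keeps a frozen copy of the table per snapshot to show the recovery flow.
    """

    def __init__(self):
        self.files = {}
        self.snapshots = {}

    def write(self, path, content):
        self.files[path] = content

    def delete(self, path):
        self.files.pop(path, None)

    def snapshot(self, name):
        # Deep-copy so later writes cannot alter what the snapshot recorded.
        self.snapshots[name] = copy.deepcopy(self.files)

    def restore(self, name, path):
        """Recover one file as it existed when snapshot `name` was taken."""
        self.files[path] = copy.deepcopy(self.snapshots[name][path])

vol = VolumeWithSnapshots()
vol.write("/etl/daily.csv", "id,amount\n1,100\n")
vol.snapshot("before-etl")
vol.delete("/etl/daily.csv")                  # an application error removes the file
vol.restore("before-etl", "/etl/daily.csv")   # point-in-time recovery
```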
Summary: Business Impact
• Saved millions in TCO
• 10x faster, 100x cheaper
• Maintained the same SLAs
• Implemented the change without impacting users
Q & A
Engage with us!
@mapr
mapr-technologies
maprtech
MapR
maprtech
rrosen@maprtech.com

More Related Content

What's hot

What's hot (20)

Flink history, roadmap and vision
Flink history, roadmap and visionFlink history, roadmap and vision
Flink history, roadmap and vision
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache Arrow
 
Lakehouse Analytics with Dremio
Lakehouse Analytics with DremioLakehouse Analytics with Dremio
Lakehouse Analytics with Dremio
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Spark - Alexis Seigneurin (Français)
Spark - Alexis Seigneurin (Français)Spark - Alexis Seigneurin (Français)
Spark - Alexis Seigneurin (Français)
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
 
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreOracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
 
My first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfMy first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdf
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Ozone: An Object Store in HDFS
Ozone: An Object Store in HDFSOzone: An Object Store in HDFS
Ozone: An Object Store in HDFS
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse Platforms
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Gfs vs hdfs
Gfs vs hdfsGfs vs hdfs
Gfs vs hdfs
 
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeHadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
 

Viewers also liked

Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
DataWorks Summit
 
OSS BSS BEST BOOK
OSS BSS BEST BOOKOSS BSS BEST BOOK
Rolex Science: The Fake Signs (3)
Rolex Science: The Fake Signs (3)Rolex Science: The Fake Signs (3)
Rolex Science: The Fake Signs (3)
Dindin Watoto
 
Google blogger 的架設與操作教學
Google blogger 的架設與操作教學Google blogger 的架設與操作教學
Google blogger 的架設與操作教學
Mike Lee
 
Entrepreneurial Operating System (EOS): Model and Process
Entrepreneurial Operating System (EOS): Model and ProcessEntrepreneurial Operating System (EOS): Model and Process
Entrepreneurial Operating System (EOS): Model and Process
Traction Masters
 
Technical architect kpi
Technical architect kpiTechnical architect kpi
Technical architect kpi
tomjonhss
 
Optimizing MapReduce Job performance
Optimizing MapReduce Job performanceOptimizing MapReduce Job performance
Optimizing MapReduce Job performance
DataWorks Summit
 

Viewers also liked (20)

Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing Architectures
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data Warehouse
 
Large scale ETL with Hadoop
Large scale ETL with HadoopLarge scale ETL with Hadoop
Large scale ETL with Hadoop
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
 
OSS BSS BEST BOOK
OSS BSS BEST BOOKOSS BSS BEST BOOK
OSS BSS BEST BOOK
 
IPSAS Implementation
IPSAS ImplementationIPSAS Implementation
IPSAS Implementation
 
Rolex Science: The Fake Signs (3)
Rolex Science: The Fake Signs (3)Rolex Science: The Fake Signs (3)
Rolex Science: The Fake Signs (3)
 
Google blogger 的架設與操作教學
Google blogger 的架設與操作教學Google blogger 的架設與操作教學
Google blogger 的架設與操作教學
 
Entrepreneurial Operating System (EOS): Model and Process
Entrepreneurial Operating System (EOS): Model and ProcessEntrepreneurial Operating System (EOS): Model and Process
Entrepreneurial Operating System (EOS): Model and Process
 
Best Practices for Software Product Development
Best Practices for Software Product DevelopmentBest Practices for Software Product Development
Best Practices for Software Product Development
 
Marketing Automation with Direct Mail
Marketing Automation with Direct MailMarketing Automation with Direct Mail
Marketing Automation with Direct Mail
 
Technical architect kpi
Technical architect kpiTechnical architect kpi
Technical architect kpi
 
ETL tool evaluation criteria
ETL tool evaluation criteriaETL tool evaluation criteria
ETL tool evaluation criteria
 
Katangian ng wika
Katangian ng wikaKatangian ng wika
Katangian ng wika
 
Optimizing MapReduce Job performance
Optimizing MapReduce Job performanceOptimizing MapReduce Job performance
Optimizing MapReduce Job performance
 
Grolsch growing globally beer case study
Grolsch growing globally beer case studyGrolsch growing globally beer case study
Grolsch growing globally beer case study
 
Master Data Management methodology
Master Data Management methodologyMaster Data Management methodology
Master Data Management methodology
 

Similar to How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million

Similar to How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million (20)

Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big Data
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big Data
 
Data Warehouse Evolution Roadshow
Data Warehouse Evolution RoadshowData Warehouse Evolution Roadshow
Data Warehouse Evolution Roadshow
 
Expect More from Hadoop
Expect More from Hadoop Expect More from Hadoop
Expect More from Hadoop
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
Big data Question bank.pdf
Big data Question bank.pdfBig data Question bank.pdf
Big data Question bank.pdf
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data Warehouse
 
Integrating Hadoop into your enterprise IT environment
Integrating Hadoop into your enterprise IT environmentIntegrating Hadoop into your enterprise IT environment
Integrating Hadoop into your enterprise IT environment
 
Getting started with Hadoop on the Cloud with Bluemix
Getting started with Hadoop on the Cloud with BluemixGetting started with Hadoop on the Cloud with Bluemix
Getting started with Hadoop on the Cloud with Bluemix
 
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven World
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Management
 
Exploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisExploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis Kapsalis
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
AWS re:Invent 2016: Fireside chat with Groupon, Intuit, and LifeLock on solvi...
AWS re:Invent 2016: Fireside chat with Groupon, Intuit, and LifeLock on solvi...AWS re:Invent 2016: Fireside chat with Groupon, Intuit, and LifeLock on solvi...
AWS re:Invent 2016: Fireside chat with Groupon, Intuit, and LifeLock on solvi...
 

More from DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 

How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million

  • 1. 1©MapR Technologies. All rights reserved. How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million Rob Rosen Sr. Director, Americas Systems Engineering MapR Technologies
  • 2. 2©MapR Technologies. All rights reserved. MapR Overview  Enterprise-grade platform for Hadoop  Deployed at thousands of companies – Including 12 of the Fortune 100  MapR is the preferred analytics platform – Hundreds of billions of events daily – 90% of the world’s Internet population monthly – $1 trillion in retail purchases annually
  • 3. 3©MapR Technologies. All rights reserved. Arrival of Big Data Impacts Data Warehouse Data Warehouse Volume Variety Velocity Prohibitively expensive storage costs Inability to process unstructured formats Faster arrival and processing needs
  • 4. 4©MapR Technologies. All rights reserved. Top Concern for Big Data Multiple data sources Multiple technologies Multiple copies of data “Too many different types, sources, and formats of critical data”
  • 5. 5©MapR Technologies. All rights reserved. The Hadoop Advantage  Fueling an industry revolution by providing infinite capability to store and process Big Data  Expanding analytics across data types  Compelling economics – 20 to 100X more cost effective than alternatives Pioneered at
  • 6. 6©MapR Technologies. All rights reserved. Important Drivers for Hadoop  Data on compute drives efficiencies and better analytics  With Hadoop you don’t need to know what questions to ask beforehand  Simple algorithms on Big Data outperform complex models  Powerful ability to analyze unstructured data
  • 7. 7©MapR Technologies. All rights reserved. Hadoop is the Technology of Choice for Big Data
  • 8. 8©MapR Technologies. All rights reserved. Source Data Social Media, Web Logs Machine Device, Scientific Documents and Emails Batch ETL Transactions, OLTP, OLAP Enterprise Data Warehouse Raw data or infrequently used data consuming capacity Batch windows hitting their limits putting SLAs at risk Databases and data warehouses are exceeding their capacity too quickly How Do You Lower and Control Data Warehouse Costs? Datamarts ODS Traditional Targets
  • 9. 9©MapR Technologies. All rights reserved. Source Data Traditional Targets Social Media, Web Logs Machine Device, Scientific Documents and Emails Transactions, OLTP, OLAP Enterprise Data Warehouse Lower Data Management Costs RDBMS MDM
  • 10. 10©MapR Technologies. All rights reserved. Bottom-Line Impact Sensor Data Web Logs Hadoop RDBMS Benefits:  Both structured and unstructured data  Expanded analytics with MapReduce, NoSQL, etc. DW Query + PresentETL + Long Term StorageETL + Long Term Storage Solution Cost / Terabyte Hadoop Advantage Hadoop $333 Teradata Warehouse Appliance $16,500 50x savings Oracle Exadata $14,000 42x savings IBM Netezza $10,000 30x savings
  • 11. 11©MapR Technologies. All rights reserved. What is the Best Way to Deploy Hadoop? vs. • Highly available and fully protected data • Works with existing tools • Real-time ingestion and extraction • Archive data from data warehouse Transitory Data Store • No long-term scale advantages • Unprotected data • ETL Tool focus Permanent Data Store Enterprise Data Hub
  • 12. 12©MapR Technologies. All rights reserved. An Enterprise Data Hub  Combine different data sources  Minimize data movement  One platform for analytics Sales SCM CRM Public Web Logs Production Data Sensor DataClick Streams Location Social Media Billing Enterprise Data Hub
  • 13. Key Elements of an Enterprise Data Hub: 99.999% HA, data protection, disaster recovery, scalability and performance, enterprise integration, and multi-tenancy. An enterprise-grade platform for the long term: reliability to support stringent SLAs; protection from data loss and from user or application errors; support for business continuity and recovery objectives.
  • 14. High Availability and Dependability. Reliable compute: automated stateful failover; automated re-replication; self-healing from HW and SW failures; load balancing; rolling upgrades; no lost jobs or data; five nines (99.999%) of uptime. Dependable storage: business continuity with snapshots and mirrors; recovery to a point in time; end-to-end checksumming; strong consistency; data safety; mirroring across sites to meet Recovery Time Objectives.
  • 15. Enterprise Data Hub Supports a Range of Applications (batch, interactive, real-time): 99.999% HA (self-healing, instant recovery); data protection (snapshots for point-in-time recovery from user or application errors); disaster recovery (mirroring across clusters and the WAN); scalability and performance (unlimited files and tables, record-setting performance); enterprise integration (direct data ingestion and access, fully compliant ODBC access and SQL-92 support); multi-tenancy (secure access for multiple users and groups).
  • 16. Summary. Business Impact: saved millions in TCO; 10x faster, 100x cheaper; maintained the same SLAs; implemented the change without impacting users.
  • 17. Q & A. Engage with us! @mapr, mapr-technologies, maprtech, MapR, maprtech, rrosen@maprtech.com
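On the Bottom-Line Impact slide (slide 10), the Hadoop Advantage column is each appliance's cost per terabyte divided by Hadoop's $333 per terabyte. The Python sketch below recomputes those ratios using only the slide's own figures. The names `HADOOP_COST_PER_TB`, `ALTERNATIVES`, and `savings_factor` are illustrative and do not come from the talk.

```python
# Cost-per-terabyte figures exactly as listed on the "Bottom-Line Impact" slide.
HADOOP_COST_PER_TB = 333

ALTERNATIVES = {
    "Teradata Warehouse Appliance": 16_500,
    "Oracle Exadata": 14_000,
    "IBM Netezza": 10_000,
}


def savings_factor(alt_cost_per_tb: float,
                   baseline_cost_per_tb: float = HADOOP_COST_PER_TB) -> float:
    """Return how many times more the alternative costs per TB than the baseline."""
    if baseline_cost_per_tb <= 0:
        raise ValueError("baseline cost must be positive")
    return alt_cost_per_tb / baseline_cost_per_tb


if __name__ == "__main__":
    for name, cost in ALTERNATIVES.items():
        # One decimal place shows how the slide rounded each ratio.
        print(f"{name}: {savings_factor(cost):.1f}x")
```

The Teradata ratio works out to about 49.5x, which the slide rounds to 50x. The Exadata and Netezza ratios match the slide's 42x and 30x to the nearest whole number.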

Editor's Notes

  1. MapR combines the best of open source technology with our own deep innovations to provide the most advanced distribution for Apache Hadoop. MapR’s team has a deep bench of enterprise software experience, with proven success across storage, networking, virtualization, analytics, and open source technologies. Our CEO has driven multiple companies to successful outcomes in the analytics, storage, and virtualization spaces. Our CTO and co-founder, M.C. Srivas, was most recently at Google on the BigTable team. He understands the challenges of MapReduce at huge scale. Srivas was also the chief software architect at Spinnaker Networks, which came out of stealth with the fastest NAS storage on the market and was quickly acquired by NetApp. The team includes enterprise storage experience at Cisco, VMware, IBM, and EMC. Our VP of Engineering was the senior vice president at Informatica, where he built and managed a large, 250-person R&D team that spanned four geographies, with annual revenues of $300M. We also have experience from Business Intelligence and analytics companies, as well as open source committers in Hadoop, ZooKeeper, and Mahout, including PMC members. MapR is proven technology, installed at leading Hadoop deployments across industries and OEMed by EMC and Cisco.
  2. You need a platform that serves the broadest set of use cases.
  3. MapReduce is a paradigm shift: it moves the processing to the data. Apache Hadoop is a software framework that supports data-intensive distributed applications. Hadoop was inspired by Google's published MapReduce paper. Apache Hadoop provides a new platform to analyze and process Big Data. With data growth exploding and new unstructured data sources expanding, a new approach is required to handle the volume, variety, and velocity of this growing data. Hadoop clustering exploits commodity servers and increasingly inexpensive compute, network, and storage. Google is the poster child for the power of MapReduce. It was the 19th search engine to enter the market; 18 companies were more successful, yet within 2 years Google was the dominant player. That’s the power of the MapReduce framework.
     ---------------------------
     Long version: A poster child for this is Google. We now take Google’s dominance for granted, but when Google launched its beta in 1998 it was late: it was at least the 19th search engine on the market. Yahoo was dominant, and there were Infoseek, Excite, Lycos, Ask Jeeves, and AltaVista (which had the technical cred). It wasn’t until Google published its papers (the Google File System paper in 2003 and the MapReduce paper in 2004) that we got a glimpse of its back-end architecture. Google reached dominance because it recognized the paradigm shift early and was able to index more data, get better results, and do it much more efficiently and cost-effectively than its competitors. It went from 19th to first in a few short years because of MapReduce. Doug Cutting, who later joined Yahoo, read those same papers and developed a Java implementation of MapReduce that became the basis for the open source Hadoop project, named after his son’s stuffed elephant. Now when we say Hadoop, we’re talking about a robust ecosystem. There are now multiple commercial versions of Hadoop.
     There’s a complete stack that includes job management, development tools, schedulers, machine learning libraries, and more. MapR’s co-founder and CTO was at Google, where he was in charge of the BigTable group, and he understands MapReduce at scale. Our charter was to fix the underlying flaws of the Hadoop implementation, to make it appropriate for a broader set of applications, and to make it work for most organizations.
  4. Let’s start with this chart, to reinforce that you’re in the right room and picked the right session. Not only is Hadoop the fastest-growing Big Data technology, it is one of the fastest-growing technologies, period. Hadoop adoption is happening across industries and across a wide range of application areas. What’s driving this adoption?
  5. Databases and data warehouses are growing and exceeding capacity too quickly; inactive data is consuming storage and degrading performance; low-density, low-priority data is disproportionately consuming storage and processing capacity; batch windows are hitting their limits, putting SLAs at risk; extracts put too much load on source systems, adding to expense; and not all the required data is in the data warehouse.
  6. With MapR, Hadoop is lights-out data center ready. MapR provides five nines (99.999%) of availability, including support for rolling upgrades, self-healing, and automated stateful failover. MapR is the only distribution that provides these capabilities. MapR also provides dependable data storage with full data protection and business continuity features. MapR provides point-in-time recovery to protect against application and user errors. End-to-end checksumming means data corruption is automatically detected and corrected by MapR’s self-healing capabilities. Mirroring across sites is fully supported. All these features support lights-out data center operations: every two weeks, an administrator can take a MapR report and a shopping cart full of drives and replace the failed drives.
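Note 6 and slides 13 to 15 quote five nines (99.999%) of availability. Converting an availability target into permitted downtime makes that figure concrete. The sketch below assumes a 365-day year, and its helper `allowed_downtime_minutes` is hypothetical. It shows the arithmetic only and is not a measurement of any MapR cluster.

```python
# Assumes a 365-day year (leap years ignored): 525,600 minutes.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Maximum downtime permitted in a period at the given availability (0 < a <= 1)."""
    if not 0 < availability <= 1:
        raise ValueError("availability must be in (0, 1]")
    return (1 - availability) * period_minutes


if __name__ == "__main__":
    for availability in (0.999, 0.9999, 0.99999):
        # :g trims float noise, so 0.99999 * 100 prints as 99.999
        print(f"{availability * 100:g}% uptime: "
              f"{allowed_downtime_minutes(availability):.2f} min/year of downtime")
```

At five nines, the budget is roughly 5.3 minutes of downtime per year. That compares with about 52.6 minutes at four nines and about 8.8 hours at three nines.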
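Note 3 calls MapReduce a paradigm shift that moves the processing to the data. The programming model itself fits in a few lines of Python, shown here as the classic word count. A map phase emits (word, 1) pairs, the framework groups those pairs by key, and a reduce phase sums each group. This is a single-process illustration with hypothetical function names, not Hadoop's Java API. In a real cluster, the framework runs map tasks in parallel and, where possible, on the nodes that store their input.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a final (key, total) result."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # Hadoop would schedule one map task per split, ideally on the node holding it;
    # here every phase simply runs sequentially in one process.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data moves less"]))
```

Each `map_phase` call sees only its own split, and each `reduce_phase` call sees only one key's values. Because of that isolation, a framework can run many copies of each across a cluster, which is what lets the model scale.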