Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, infrastructure that does not scale, and the growing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to rise while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their successful migration of data and workloads to the cloud.
1. Modernizing to a cloud data architecture
Guido Oswald, Solutions Architect, Databricks
Matt Graves, VP of Enterprise Data & Analytics, GCI Communication Corp
2. Agenda
• Top reasons to modernize from Hadoop to Databricks
• Success stories, technical and business benefits
• Fast migrations with low costs & low risk
• Fireside Chat: Matt Graves
5. Digital transformation is accelerating
• E-Commerce
• Wearables, medical IoT
• Streaming
• Mobile payments, food service, grocery deliveries…
The data surge is placing tremendous pressure on traditional data and analytics infrastructure
Cloud adoption is accelerating by $100B from 2021 to 2023
Source: Gartner cited by Battery Ventures - Open Cloud report
6. Today, most enterprises struggle with data
Siloed stacks increase data architecture complexity
• Data Warehousing: Structured data → Extract, Transform, Load → Data warehouse → Data marts → Analytics and BI
• Data Engineering: Structured, semi-structured and unstructured data → Data Lake → Data prep
• Streaming: Streaming data sources → Streaming Data Engine → Real-time Database
• Data Science & Machine Learning: Structured, semi-structured and unstructured data → Data Lake → Machine Learning, Data Science
Disconnected systems and proprietary data formats make integration difficult
• Data Warehousing: Amazon Redshift, Teradata, Azure Synapse, Google BigQuery, Snowflake, IBM Db2, SAP, Oracle Autonomous Data Warehouse
• Data Engineering: Hadoop, Apache Airflow, Amazon EMR, Apache Spark, Google Dataproc, Cloudera
• Streaming: Apache Kafka, Apache Spark, Apache Flink, Amazon Kinesis, Azure Stream Analytics, TIBCO Spotfire, Google Dataflow, Confluent
• Data Science & Machine Learning: Jupyter, Amazon SageMaker, Azure ML Studio, MATLAB, Domino Data Lab, SAS, TensorFlow, PyTorch
Siloed data teams decrease productivity
• Data Scientists, Data Engineers, Data Analysts
7. Is your architecture enabling growth?
Legacy on-premises data and analytics architectures are not keeping up
• Hadoop costs are rising when costs need to be cut
• Innovation hinges on ML and predictive insights
• Business agility requires real-time data
8. Hadoop is costly, complex and ineffective
• DEVOPS INTENSIVE: The Hadoop ecosystem is complex, hard to manage, and prone to failures → Low Productivity
• RIGID AND INELASTIC: 24/7 HDFS clusters need to be built for peak usage and are costly to upgrade → Cost Prohibitive
• LACKS AI CAPABILITIES: No out-of-box support for ML/AI, and separate data and AI environments → Slow Innovation
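To make the "built for peak usage" point concrete, here is a minimal Python sketch that compares an always-on cluster sized for the busiest hour with an elastic cluster that runs only the nodes each hour needs. The hourly demand profile, the per-node price, and the function names are all hypothetical and chosen only for illustration; they are not figures from the talk, and the model ignores any price difference between on-premises and cloud nodes.

```python
from typing import Sequence


def fixed_cluster_cost(demand: Sequence[int], price_per_node_hour: float) -> float:
    """Cost of an always-on cluster sized for the busiest hour."""
    peak_nodes = max(demand)
    return peak_nodes * len(demand) * price_per_node_hour


def elastic_cluster_cost(demand: Sequence[int], price_per_node_hour: float) -> float:
    """Cost of a cluster that runs only the nodes each hour needs."""
    return sum(demand) * price_per_node_hour


# Hypothetical 24-hour demand profile (nodes needed per hour):
# quiet night, a short batch spike, moderate daytime load, quiet evening.
hourly_demand = [2] * 6 + [20] * 2 + [8] * 10 + [4] * 6
price = 1.0  # hypothetical price per node-hour

fixed = fixed_cluster_cost(hourly_demand, price)
elastic = elastic_cluster_cost(hourly_demand, price)
savings = 1 - elastic / fixed

print(f"fixed={fixed:.0f} elastic={elastic:.0f} savings={savings:.1%}")
```

In this model the gap depends entirely on how spiky the workload is: a perfectly flat demand profile produces the same cost for both options.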
9. Enterprises need a modern data and analytics architecture
CRITICAL REQUIREMENTS
• Cost-effective scale and performance in the cloud
• Easy to manage and highly reliable for diverse data
• Predictive and real-time insights to drive innovation
10. Modernization delivers business value
Forrester TEI study finds 417% ROI for companies switching to Databricks
• 47% cost savings from retiring legacy infrastructure
• 5% increase in revenue
• 25% increase in data team productivity
Source: Forrester TEI: The total economic impact of the Databricks Unified Analytics Platform
11. The Databricks Lakehouse Platform is one simple platform to unify all your data, analytics, and AI workloads
Original creators of popular data and machine learning open-source projects
Global company with over 5,000 customers and more than 450 partners
13. Lakehouse Platform: all your data, analytics and AI on one Lakehouse platform, from BI to AI
SIMPLE, OPEN, COLLABORATIVE
• Workloads: Data Engineering; BI & SQL Analytics; Real-time Data Applications; Data Science & Machine Learning
• Data Management & Governance
• Open Data Lake
• Data types: Structured, Semi-structured, Unstructured, Streaming
14. Technology mapping: deliver better outcomes
• Data Eng, ML (Spark) → Databricks jobs / Delta Lake (highly tuned Spark engine: faster, less compute, one-stop shop)
• ETL, SQL (Hive / Impala) → Databricks jobs / Delta Lake / Spark SQL (highly tuned Spark engine: faster, less compute, one-stop shop)
• Batch Process (MapReduce) → Databricks Spark jobs (orders of magnitude faster, but may need manual work)
• Real-time Event Processing (Storm / Spark) → Databricks Structured Streaming (Spark Structured Streaming + Delta Lake: streaming + batch ingest)
• Scalable apps on columnar store (HBase) → Databricks Spark integrates with HBase on cloud (alternatively: use cloud data stores well integrated with Databricks)
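One way to put the technology mapping above to work during an assessment is as a simple lookup over a workload inventory. The sketch below is only an illustration: the dictionary restates the slide's mapping, while the `plan_migration` helper, the inventory entries, and the "unmapped" bucket are hypothetical names introduced here.

```python
# Hadoop component -> Databricks target, restated from the slide.
# Spark-based streaming would map to Structured Streaming rather than
# plain jobs; this simple lookup does not distinguish the two.
HADOOP_TO_DATABRICKS = {
    "spark": "Databricks jobs / Delta Lake",
    "hive": "Databricks jobs / Delta Lake / Spark SQL",
    "impala": "Databricks jobs / Delta Lake / Spark SQL",
    "mapreduce": "Databricks Spark jobs (may need manual work)",
    "storm": "Databricks Structured Streaming",
    "hbase": "HBase on cloud via Databricks Spark, or a cloud data store",
}


def plan_migration(inventory):
    """Group workload names by target; unknown components go to 'unmapped'."""
    plan = {}
    for workload, component in inventory:
        target = HADOOP_TO_DATABRICKS.get(component.lower(), "unmapped")
        plan.setdefault(target, []).append(workload)
    return plan


# Hypothetical workload inventory: (workload name, Hadoop component).
inventory = [
    ("nightly_sales_etl", "Hive"),
    ("clickstream_alerts", "Storm"),
    ("legacy_log_rollup", "MapReduce"),
    ("search_index", "Solr"),
]

plan = plan_migration(inventory)
for target, workloads in plan.items():
    print(f"{target}: {', '.join(workloads)}")
```

Anything that lands in the "unmapped" bucket is a component the slide does not cover and would need a separate decision.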
16. 55-66% reduction in costs and 2-3x reduction in timelines by using automation tools
• Manual migration (17-20 weeks): Assessment & Design → Data Migration → Workloads Migration, Validation → Cutover → Operations
• Using automation (8 weeks): Accelerated Assessment & Design → Accelerated Data & Workloads Migration, Validation → Cutover → Operations
• Same tool used for pre-migration assessment
* Typical implementation scenario: ~4 PB of data and 3,000 jobs with mixed workloads considered
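As a quick arithmetic check of the timeline claim, the short Python sketch below uses only the week counts shown on the slide (17 to 20 weeks manual, 8 weeks with automation). The variable names are illustrative, and the sketch assumes the week counts cover the same scope of work in both cases.

```python
manual_weeks = (17, 20)  # from the slide: manual migration
automated_weeks = 8      # from the slide: using automation

# Speedup factor and fraction of calendar time saved at each end of the range.
speedups = [weeks / automated_weeks for weeks in manual_weeks]
time_saved = [1 - automated_weeks / weeks for weeks in manual_weeks]

for weeks, speedup, saved in zip(manual_weeks, speedups, time_saved):
    print(f"{weeks} weeks -> {automated_weeks} weeks: "
          f"{speedup:.3f}x faster, {saved:.1%} less time")
```

With the slide's figures this works out to a speedup of 2.125x to 2.5x, which sits in the lower half of the stated 2-3x range.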
17. Our partner ecosystem accelerates migrations
• ISV Partners and Migration Tools
• Security
• Governance
• Consulting & SI Partners
• Cloud
• Databricks Migration SWAT team + CS Packaged Services for Migration
18. Modernization with Databricks - recap
• Why: costs, productivity, innovation → business impact
• Your competitors and market leaders are doing it NOW
• Databricks experts and automation strategy can help you migrate faster, with much lower cost and risk