SlideShare una empresa de Scribd logo
1 de 25
©2021 Databricks Inc. — All rights reserved
Modernize your Data
Warehouse
Amit Kara, Director, Technical Product Marketing
Soham Bhatt, SME Lead, DW Migration
A migration journey to the Databricks Lakehouse
Platform
©2021 Databricks Inc. — All rights reserved
Agenda
• Why lakehouse for data warehousing
• How does Databricks help with Data Warehousing
• Key differentiators when using the Databricks Lakehouse Platform
• Demo: Data warehousing on Databricks
• How to modernize your data warehouse to a Lakehouse
• Key takeaways for migrating to the Lakehouse
©2021 Databricks Inc. — All rights reserved
What’s the problem
we’re solving?
©2021 Databricks Inc. — All rights reserved
Legacy Data Warehouses aren’t keeping up
Data Warehouses can’t
keep up with data
volume and variety
Innovation hinges on
integrating ML/AI and
predictive insights
Business agility requires
reliable, real-time data
Not cost effective,
especially with scale
Data is vendor locked-in
and duplicated
©2021 Databricks Inc. — All rights reserved
The problem with legacy CDW: a fragmented
approach to modernizing your architecture
Structured
Cloud
Data
Warehouse
Unstructured
Semi-Structured
DATA LAKE
BI Reports, Dashboards & SQL ELT/ETL
ADLS AWS S3 GCP
Data Science Model Training
Model Scoring Model Deployment
Limited support
for streaming
Limited support for
unstructured data
(audio/images/video)
Complex & many
stages.
Data is duplicated
Lock-in / proprietary
format
Compute cost for
all data access
Disparate tooling decreases data team
productivity
©2021 Databricks Inc. — All rights reserved
Why Data Warehousing on
Databricks?
©2021 Databricks Inc. — All rights reserved
Your tools of choice
Use your favorite tools like Fivetran, dbt, PowerBI , Tableau or
Databricks to ingest, transform and query all your data in-place.
Serverless compute
Lower costs and eliminate the need to manage, configure or scale
cloud infrastructure with serverless and get the best
price/performance.
Unified governance
Simplify architecture, establish one single copy for all your data, and
one unified governance layer across all data teams using standard SQL.
Why Data Warehousing
on Databricks
Unity Catalog
Delta Lake
All structured and unstructured data
Cloud Data Lake
Data
Warehousing
Data
Engineering
Data Science
and ML
Data
Streaming
Break down silos
Empower data scientists and analysts to access the most complete
and freshest data faster, and uncover new insights together.
©2021 Databricks Inc. — All rights reserved
Connect your data, analytics and AI
tools to the Databricks Lakehouse
Discover validated data and AI
solutions for new use cases
Setup in a few clicks with pre-built
integrations
Integrated out-of-the-box with Partner Connect
Business
Intelligence
ML
Tools
Data
Preparation
Data
Connectors
Solution
Accelerators
Data
Apps
Partners
Discover, connect, and process data, analytics, and AI tools to your lakehouse
©2021 Databricks Inc. — All rights reserved
Databricks thrives within your modern data
stack
Unity Catalog
Delta Lake
All structured and unstructured data
Cloud Data Lake
Data
Warehousing
Data
Engineering
Data Science
and ML
Data
Streaming
BI and Dashboards Data Science
Data Pipelines
Data Governance
Machine Learning
10
Data Ingestion
©2021 Databricks Inc. — All rights reserved
First-class SQL development experience
Query data lake data using
familiar ANSI SQL, and
collaboratively find and share
new insights faster with the
built-in SQL query editor, alerts,
visualizations, and interactive
dashboards.
Collaboratively query, explore, and transform data in-place
©2021 Databricks Inc. — All rights reserved
Elastic, instant compute decoupled from storage
• Quickly setup optimized compute
resources with SQL endpoints
(powered by vectorized engine Photon)
• High concurrency built-in with
automatic load balancing
• Intelligent workload management and
faster reads from cloud storage
• Instant startup and greater availability
• Available in Databricks Serverless
(preview) !
No resource management needed with Serverless
©2021 Databricks Inc. — All rights reserved
Built from the ground up for best price/performance
Source: Performance Benchmark with Barcelona Supercomputing Center
Query and analyze your most complete and freshest data with
up to 12x better price/performance than traditional cloud data warehouses.
Lightning fast analytics
©2021 Databricks Inc. — All rights reserved 15
● Centralized metadata and user
management
● Centralized data access controls
● Data lineage Private Preview
● Data access auditing
● Data search and discovery Coming Soon
● Secure data sharing with Delta Sharing
● Standard SQL
Fine-grained governance on the Lakehouse
Unity Catalog
©2021 Databricks Inc. — All rights reserved
Key considerations for Modern Analytics & DW
❏ Empower Business Units for Self-service and Advanced Analytics
❏ Simple, Collaborative, Agile Cross-Functional teams
❏ Machine Learning and Artificial Intelligence - CIO level initiatives
❏ Platform that support for all data types - structured and
unstructured
❏ Cloud - choose Best of the Breed - Open Tech Stack vs Proprietary
©2021 Databricks Inc. — All rights reserved
Demo
©2021 Databricks Inc. — All rights reserved
Modern Data Warehousing on Databricks
Data Science and
Machine Learning
Databricks Machine Learning
Batch Ingestion
Stream Ingestion
Curated Data
Raw
Ingestion
and History
BRONZE
Filtered,
Cleaned,
Augmented
SILVER
Business
Aggregates &
Data Models
GOLD
Enterprise
Reporting and BI
DBSQL
Endpoints
Databricks SQL
Databricks Notebooks, Delta Live Tables
Select the Ingestion, ETL, Presentation Layer and Governance Ecosystem on the Databricks Platform
ETL Partners
Data Governance powered by Databricks Unity Catalog
EDC
©2022 Databricks Inc. — All rights reserved
Building your
Lakehouse
Comprehensive investment
into your success
20
Supported by 24/7/365 global,
production operations at scale
Your
success
Solution
Accelerators
In-person and
Virtual Training
Co-located
Professional
Services
©2021 Databricks Inc. — All rights reserved
Migration Methodology
21
Phase 1
Discovery
Migration
specific
discovery and
consultation
Phase 2
Assessment
Assessment,
Design, Tooling,
Accelerators,
Sizing, Partners
Phase 3
Strategy
Technology
mapping,
migration
workshop,
migration
planning
Databricks Migration Team with/without Partner
Phase 4
Production Pilot
Reference
implementation
of a production
use case, Overall
migration
implementation
plan
Phase 5
Execution
Migration
execution and
support
Databricks PS Driven
Partner Driven
©2021 Databricks Inc. — All rights reserved
Migration Approach
22
Architecture/
Infrastructure
● Establish
deployment
Architecture
● Implement
Security and
Governance
framework
Data Migration
● Map Data
Structures and
Layout
● Complete One
time load
● Implement
incremental load
approach
ETL and Pipelines
● Migrate Data
transformation
and pipeline
code,
orchestration
and jobs
● Speedup your
migration using
Automation tools
● Validate:
Compare your
results with On
Prem data and
expected results
BI and Analytics
● Re-point reports
and analytics for
Business
Analysts and
Business
Outcomes
● Semantic
Layer/OLAP
cube repointing
● Connect to
reporting and
analytics
applications
Data Science/ML
● Establish
connectivity to
ML Tools
● Onboard Data
Science teams
©2021 Databricks Inc. — All rights reserved
Strategies for Data Migration
One-time loads, catch-up loads , Real-time vs Batch Ingestion
1. Extract from Databases via JDBC ODBC connectors via spark.read.jdbc.. (Parallel ingestion)
1. Extract to Cloud Storage and use Databricks Autoloader for streaming ingest
1. ISV Partners for Real-Time CDC Ingestion ( Arcion, Fivetran, Qlik, Rivery, Streamsets..)
©2021 Databricks Inc. — All rights reserved
Strategies for ETL/Code Migration
Use of Automated tools or frameworks can reduce your timelines by over 50%!
Migration of Stored Procedures and/or ETL Mappings
• For Databricks Notebooks based ETL:
• Delta Live Tables or Databricks Notebook-based ETL
• Metadata-driven Ingestion Frameworks
• ETL tool Partners:
• Matillion, Prophecy, DBT, Informatica, Talend, Infoworks.. many more
• Auto code converters accelerate migrations!
©2022 Databricks Inc. — All rights reserved
Repoint Cubes and Reports to Databricks
• As easy as repointing your reports to DBSQL jdbc/odbc drivers
(Photon and our newest cloudfetch ODBC drivers )
• Key Integrations
• PowerBI Premium ( semantic layers, composite models, upto 400 GB caching)
• Tableau Hyper Extracts
• Looker
• OLAP cube partners like Microstrategy
• Atscale: Universal Semantic layer
( aggs built in Databricks)
Unleash Self-service Analytics with a Semantic Lakehouse
25
©2022 Databricks Inc. — All rights reserved
Key Takeaways..
Migration is a team sport
● Data Warehousing on Lakehouse is simple
● Migrations can be accelerated using automation tools
● Extensive Partner Ecosystem around Databricks Modern Data Stack
● Huge set of joint offerings to accelerate migrations with SI/Consulting
Partners
©2021 Databricks Inc. — All rights reserved
Next Steps
1. Learn more about the Inner Workings of the Lakehouse
1. Schedule a Data Warehouse migration workshop
1. Schedule a Databricks SQL Hands-on workshop
Customize your EDW/ETL Migration Success Plan with an Expert-led Migration
Assessment Workshop
©2021 Databricks Inc. — All rights reserved

Más contenido relacionado

La actualidad más candente

Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Cathrine Wilhelmsen
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Databricks
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks FundamentalsDalibor Wijas
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMatillion
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsDATAVERSITY
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationMatthew W. Bowers
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for DinnerKent Graziano
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 

La actualidad más candente (20)

Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 

Similar a DW Migration Webinar-March 2022.pptx

Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxCalvinSim10
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...HostedbyConfluent
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Kent Graziano
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfIlham31574
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesDATAVERSITY
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeTorsten Steinbach
 
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...HostedbyConfluent
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Denodo
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaMarketingArrowECS_CZ
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeDATAVERSITY
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeKent Graziano
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDATAVERSITY
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
VisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyVisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyLeonid Nekhymchuk
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Denodo
 
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?MarketingArrowECS_CZ
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartchCloudera, Inc.
 
The new big data
The new big dataThe new big data
The new big dataAdam Doyle
 
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022HostedbyConfluent
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 

Similar a DW Migration Webinar-March 2022.pptx (20)

Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdf
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
 
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management Platforma
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data Lake
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data Architecture
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
VisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyVisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case study
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
 
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
 
The new big data
The new big dataThe new big data
The new big data
 
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 

Más de Databricks

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityDatabricks
 

Más de Databricks (20)

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
 

Último

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Último (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

DW Migration Webinar-March 2022.pptx

  • 1. ©2021 Databricks Inc. — All rights reserved Modernize your Data Warehouse Amit Kara, Director, Technical Product Marketing Soham Bhatt, SME Lead, DW Migration A migration journey to the Databricks Lakehouse Platform
  • 2. ©2021 Databricks Inc. — All rights reserved Agenda • Why lakehouse for data warehousing • How does Databricks help with Data Warehousing • Key differentiators when using the Databricks Lakehouse Platform • Demo: Data warehousing on Databricks • How to modernize your data warehouse to a Lakehouse • Key takeaways for migrating to the Lakehouse
  • 3. ©2021 Databricks Inc. — All rights reserved What’s the problem we’re solving?
  • 4. ©2021 Databricks Inc. — All rights reserved Legacy Data Warehouses aren’t keeping up Data Warehouses can’t keep up with data volume and variety Innovation hinges on integrating ML/AI and predictive insights Business agility requires reliable, real-time data Not cost effective, especially with scale Data is vendor locked-in and duplicated
  • 5. ©2021 Databricks Inc. — All rights reserved The problem with legacy CDW: a fragmented approach to modernizing your architecture Structured Cloud Data Warehouse Unstructured Semi-Structured DATA LAKE BI Reports, Dashboards & SQL ELT/ETL ADLS AWS S3 GCP Data Science Model Training Model Scoring Model Deployment Limited support for streaming Limited support for unstructured data (audio/images/video) Complex & many stages. Data is duplicated Lock-in / proprietary format Compute cost for all data access Disparate tooling decreases data team productivity
  • 6. ©2021 Databricks Inc. — All rights reserved Why Data Warehousing on Databricks?
  • 7. ©2021 Databricks Inc. — All rights reserved Your tools of choice Use your favorite tools like Fivetran, dbt, PowerBI , Tableau or Databricks to ingest, transform and query all your data in-place. Serverless compute Lower costs and eliminate the need to manage, configure or scale cloud infrastructure with serverless and get the best price/performance. Unified governance Simplify architecture, establish one single copy for all your data, and one unified governance layer across all data teams using standard SQL. Why Data Warehousing on Databricks Unity Catalog Delta Lake All structured and unstructured data Cloud Data Lake Data Warehousing Data Engineering Data Science and ML Data Streaming Break down silos Empower data scientists and analysts to access the most complete and freshest data faster, and uncover new insights together.
  • 8. ©2021 Databricks Inc. — All rights reserved Connect your data, analytics and AI tools to the Databricks Lakehouse Discover validated data and AI solutions for new use cases Setup in a few clicks with pre-built integrations Integrated out-of-the-box with Partner Connect Business Intelligence ML Tools Data Preparation Data Connectors Solution Accelerators Data Apps Partners Discover, connect, and process data, analytics, and AI tools to your lakehouse
  • 9. ©2021 Databricks Inc. — All rights reserved Databricks thrives within your modern data stack Unity Catalog Delta Lake All structured and unstructured data Cloud Data Lake Data Warehousing Data Engineering Data Science and ML Data Streaming BI and Dashboards Data Science Data Pipelines Data Governance Machine Learning 10 Data Ingestion
  • 10. ©2021 Databricks Inc. — All rights reserved First-class SQL development experience Query data lake data using familiar ANSI SQL, and collaboratively find and share new insights faster with the built-in SQL query editor, alerts, visualizations, and interactive dashboards. Collaboratively query, explore, and transform data in-place
  • 11. ©2021 Databricks Inc. — All rights reserved Elastic, instant compute decoupled from storage • Quickly setup optimized compute resources with SQL endpoints (powered by vectorized engine Photon) • High concurrency built-in with automatic load balancing • Intelligent workload management and faster reads from cloud storage • Instant startup and greater availability • Available in Databricks Serverless (preview) ! No resource management needed with Serverless
  • 12. ©2021 Databricks Inc. — All rights reserved Built from the ground up for best price/performance Source: Performance Benchmark with Barcelona Supercomputing Center Query and analyze your most complete and freshest data with up to 12x better price/performance than traditional cloud data warehouses. Lightning fast analytics
  • 13. ©2021 Databricks Inc. — All rights reserved 15 ● Centralized metadata and user management ● Centralized data access controls ● Data lineage Private Preview ● Data access auditing ● Data search and discovery Coming Soon ● Secure data sharing with Delta Sharing ● Standard SQL Fine-grained governance on the Lakehouse Unity Catalog
  • 14. ©2021 Databricks Inc. — All rights reserved Key considerations for Modern Analytics & DW ❏ Empower Business Units for Self-service and Advanced Analytics ❏ Simple, Collaborative, Agile Cross-Functional teams ❏ Machine Learning and Artificial Intelligence - CIO level initiatives ❏ Platform that support for all data types - structured and unstructured ❏ Cloud - choose Best of the Breed - Open Tech Stack vs Proprietary
  • 15. ©2021 Databricks Inc. — All rights reserved Demo
  • 16. ©2021 Databricks Inc. — All rights reserved Modern Data Warehousing on Databricks Data Science and Machine Learning Databricks Machine Learning Batch Ingestion Stream Ingestion Curated Data Raw Ingestion and History BRONZE Filtered, Cleaned, Augmented SILVER Business Aggregates & Data Models GOLD Enterprise Reporting and BI DBSQL Endpoints Databricks SQL Databricks Notebooks, Delta Live Tables Select the Ingestion, ETL, Presentation Layer and Governance Ecosystem on the Databricks Platform ETL Partners Data Governance powered by Databricks Unity Catalog EDC
  • 17. ©2022 Databricks Inc. — All rights reserved Building your Lakehouse Comprehensive investment into your success 20 Supported by 24/7/365 global, production operations at scale Your success Solution Accelerators In-person and Virtual Training Co-located Professional Services
  • 18. ©2021 Databricks Inc. — All rights reserved Migration Methodology 21 Phase 1 Discovery Migration specific discovery and consultation Phase 2 Assessment Assessment, Design, Tooling, Accelerators, Sizing, Partners Phase 3 Strategy Technology mapping, migration workshop, migration planning Databricks Migration Team with/without Partner Phase 4 Production Pilot Reference implementation of a production use case, Overall migration implementation plan Phase 5 Execution Migration execution and support Databricks PS Driven Partner Driven
  • 19. ©2021 Databricks Inc. — All rights reserved Migration Approach 22 Architecture/ Infrastructure ● Establish deployment Architecture ● Implement Security and Governance framework Data Migration ● Map Data Structures and Layout ● Complete One time load ● Implement incremental load approach ETL and Pipelines ● Migrate Data transformation and pipeline code, orchestration and jobs ● Speedup your migration using Automation tools ● Validate: Compare your results with On Prem data and expected results BI and Analytics ● Re-point reports and analytics for Business Analysts and Business Outcomes ● Semantic Layer/OLAP cube repointing ● Connect to reporting and analytics applications Data Science/ML ● Establish connectivity to ML Tools ● Onboard Data Science teams
  • 20. ©2021 Databricks Inc. — All rights reserved Strategies for Data Migration One-time loads, catch-up loads , Real-time vs Batch Ingestion 1. Extract from Databases via JDBC ODBC connectors via spark.read.jdbc.. (Parallel ingestion) 1. Extract to Cloud Storage and use Databricks Autoloader for streaming ingest 1. ISV Partners for Real-Time CDC Ingestion ( Arcion, Fivetran, Qlik, Rivery, Streamsets..)
  • 21. ©2021 Databricks Inc. — All rights reserved Strategies for ETL/Code Migration Use of Automated tools or frameworks can reduce your timelines by over 50%! Migration of Stored Procedures and/or ETL Mappings • For Databricks Notebooks based ETL: • Delta Live Tables or Databricks Notebook-based ETL • Metadata-driven Ingestion Frameworks • ETL tool Partners: • Matillion, Prophecy, DBT, Informatica, Talend, Infoworks.. many more • Auto code converters accelerate migrations!
  • 22. ©2022 Databricks Inc. — All rights reserved Repoint Cubes and Reports to Databricks • As easy as repointing your reports to DBSQL jdbc/odbc drivers (Photon and our newest cloudfetch ODBC drivers ) • Key Integrations • PowerBI Premium ( semantic layers, composite models, upto 400 GB caching) • Tableau Hyper Extracts • Looker • OLAP cube partners like Microstrategy • Atscale: Universal Semantic layer ( aggs built in Databricks) Unleash Self-service Analytics with a Semantic Lakehouse 25
  • 23. ©2022 Databricks Inc. — All rights reserved Key Takeaways.. Migration is a team sport ● Data Warehousing on Lakehouse is simple ● Migrations can be accelerated using automation tools ● Extensive Partner Ecosystem around Databricks Modern Data Stack ● Huge set of joint offerings to accelerate migrations with SI/Consulting Partners
  • 24. ©2021 Databricks Inc. — All rights reserved Next Steps 1. Learn more about the Inner Workings of the Lakehouse 1. Schedule a Data Warehouse migration workshop 1. Schedule a Databricks SQL Hands-on workshop Customize your EDW/ETL Migration Success Plan with an Expert-led Migration Assessment Workshop
  • 25. ©2021 Databricks Inc. — All rights reserved