IBM Cloud Day 2021
Well Architected Data Lake
James Bennett, Offering Manager
Torsten Steinbach, Senior Technical Staff Member
Two Cloud Data Lake Sessions Today
• The Well Constructed Architecture of a Modern Data Lake
• Introductory Session
• What we provide, how you can consume it
• Light introduction to deeper architecture
• Deep Dive into Cloud Native Data Lakes with IBM Cloud
• Session led by Torsten
• Everything you need to know about building a Data Lake on IBM Cloud
• Includes our Covid-19 Data Lake Implementation
Cloud Data Lake for the Enterprise
Organizations need the ability to:
o Visualize data and build data-driven applications
o Increase data flexibility and accessibility
o Provide data governance to retain data authenticity
o Gain speed with data insights
o Collect, explore and analyze data
Data architects · Business and data analysts · Data scientists and application developers
Cloud Data Lake Evolutionary Context
The 1990s: Enterprise Data Warehouses
• Tightly integrated and optimized systems
The 2000s: Hadoop
• Introduced open data formats & easy scaling on commodity HW
Today: Cloud-Native: Serverless Analytics-aaS
• Elasticity
• Pay-per-query
• Data in object store
• Disaggregated architecture
• Increasingly real-time first
Case Study
Business Problem
Need the ability to effectively analyze data from remote locations to gain insights with cost-effective, secure, on-demand analytics and long-term data retention
Solution
o Nightly batch exports from operational production databases in factory locations are automatically uploaded to the data lake in the cloud (central COS bucket).
o LoB engineers subscribe to data in the data lake, which is then ETLed with SQL Query to tenant-specific zones (tenant-specific COS buckets).
o Future updates of data lake data in the central COS bucket are automatically ETLed right away to the tenant-specific COS buckets via Cloud Functions events.
o LoB engineers explore, experiment and do data preparation using SQL Query on tenant-specific buckets.
o LoB engineers use Watson Studio to run data science, visualize and present insights to executives.
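The tenant-specific ETL step in this case study can be sketched as a small statement builder. This is a minimal illustration, not the customer's implementation: the bucket URLs, the `plant_id` column and both function names are hypothetical, and the `FROM ... STORED AS ... INTO ... STORED AS` shape is modeled on IBM SQL Query's object storage syntax, which should be checked against the service documentation before use.

```python
from typing import Dict, List


def build_tenant_etl(source_url: str, target_url: str,
                     tenant_column: str, tenant_id: str) -> str:
    """Build one ETL statement that copies a tenant's rows to its own bucket.

    All names passed in are illustrative; the statement shape is an
    assumption modeled on IBM SQL Query's FROM/INTO clauses.
    """
    if not tenant_column.isidentifier():
        raise ValueError(f"unexpected column name: {tenant_column!r}")
    if "'" in tenant_id:
        raise ValueError("tenant id must not contain quotes")
    return " ".join([
        f"SELECT * FROM {source_url} STORED AS PARQUET",
        f"WHERE {tenant_column} = '{tenant_id}'",
        f"INTO {target_url} STORED AS PARQUET",
    ])


def statements_for_subscribers(source_url: str, tenant_column: str,
                               subscriptions: Dict[str, str]) -> List[str]:
    """Return one statement per subscribed tenant (tenant id -> target URL).

    A cloud function triggered by an upload event could submit these
    statements whenever the central bucket receives new data.
    """
    return [
        build_tenant_etl(source_url, target, tenant_column, tenant)
        for tenant, target in sorted(subscriptions.items())
    ]


# Hypothetical central bucket and tenant buckets.
example = statements_for_subscribers(
    "cos://us-geo/central-lake/exports/",
    "plant_id",
    {"A17": "cos://us-geo/tenant-a17/exports/",
     "B02": "cos://us-geo/tenant-b02/exports/"},
)
```

Keeping the statement builder separate from the trigger makes it easy to reuse the same logic for the initial subscription and for later event-driven updates.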
Case Study
Business Problem
Need the ability to effectively ingest and analyze data from multiple vendors in various data formats to gain competitive insights
Solution
o Ingest pricing data from 20+ external vendors and persist it in Cloud Object Storage.
o Data engineers prep the data by joining vendor data with the on-premises data warehouse.
o Data engineers then process result sets using Analytics Engine (Spark) and Db2 Warehouse on Cloud.
o LoB engineers explore, experiment and do data preparation using SQL Query on tenant-specific buckets.
o LoB engineers use Watson Studio (notebooks) to run data science, visualize and present actionable competitive insights to executives.
Use Cases
Replicate on-prem DB to cloud data lake for analytics
o Capture database change feed into Kafka in the cloud
o Land Kafka data to object storage
o Prepare replicated change feed for analytics
o Query for insights
o Present & visualize insights
Collect, historize & analyze IoT data
o Land IoT message data through Event Streams (Kafka)
o Prepare, cleanse, extract and enrich IoT data
o Query for insights
o Present & visualize insights
Move existing Hadoop workload to cloud
o Replace HDFS with cloud-native storage: object storage
o Run Hadoop processing in fully managed Hadoop service: Analytics Engine
o Interactive analytics through Watson Studio
AIOps: gain operational & business insights from solution logs
o Collect full solution telemetry (logs)
o Prepare, cleanse, extract and enrich data from logs
o Query for insights
o Present & visualize insights
SQL in place: reduce cost and decouple workload from DWHs
o Use data lake as landing and preparation storage before data gets ingested to DWH
o Archive data from DWH to data lake for affordable SQL-enabled archive
o Automate ETL and enable SQL federation across data lake and DWH
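For the database replication use case, "prepare replicated change feed for analytics" usually means collapsing a stream of inserts, updates and deletes into the latest state per key. A minimal sketch follows, assuming a hypothetical event layout with `seq`, `op`, `key` and `row` fields; real change feeds landed from Kafka will have their own format.

```python
from typing import Any, Dict, Iterable


def apply_change_feed(events: Iterable[Dict[str, Any]]) -> Dict[Any, Dict[str, Any]]:
    """Collapse a change feed into the latest row per primary key.

    Each event is assumed (hypothetically) to look like:
      {"seq": 3, "op": "update", "key": 42, "row": {...}}
    Events are applied in ascending `seq` order; a delete removes the key.
    """
    state: Dict[Any, Dict[str, Any]] = {}
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["op"] == "delete":
            state.pop(event["key"], None)
        elif event["op"] in ("insert", "update"):
            state[event["key"]] = event["row"]
        else:
            raise ValueError(f"unknown operation: {event['op']!r}")
    return state


# Events may land out of order in object storage, so they are sorted by seq.
feed = [
    {"seq": 2, "op": "update", "key": 1, "row": {"id": 1, "status": "shipped"}},
    {"seq": 1, "op": "insert", "key": 1, "row": {"id": 1, "status": "new"}},
    {"seq": 3, "op": "insert", "key": 2, "row": {"id": 2, "status": "new"}},
    {"seq": 4, "op": "delete", "key": 2, "row": None},
]
current = apply_change_feed(feed)
```

In a data lake the same logic is typically expressed as a SQL window query over the landed objects; the Python version only shows the semantics.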
[Diagram: Cloud Pak for Data as a Service, built on IBM Cloud, uses IBM Cloud Data Lake: COS (storage), SQL Query (analytics), Event Streams (streaming), Spark (transformation), Cloud Databases (databases)]
IBM Cloud enables a secure, fully integrated set of Cloud Data Services
Scalability
Start small and grow large without overprovisioning for anticipated scale.
Efficiency and Speed
Get applications to market quickly, without worrying about underlying infrastructure costs, maintenance, and provider security.
Flexibility
Pick and choose services to fit your needs, customize applications and expand across geos seamlessly.
Security
Common security integrations with Identity and Access Management, customer-managed encryption keys, and a common compliance roadmap.
Managed by IBM/Vendor
Managed by Client
The AI Ladder
Cloud Pak for Data (as a Service) – Deploy Anywhere
Roles: Administrator, Analyst, Architect, Developer, Executive, Line of Business Owner, Operations, Systems Integrator
Data Consumers: Data Science Tooling, Streaming Analytics, Analytical Dashboards, AI Applications, Data Prep Tools
Hybrid Data Sources: Object Stores, Data Lake, Databases, Unstructured & Streaming Data
Integrated Data Governance: Intelligent data catalog, Assess risk, Discover data, Self-serve find & 'deploy' data, Data privacy enforced, Business meaning
Key Business Outcome: DataOps
Enable safe self-service access to data across users with multiple skill levels, enabling them to use the power of AI securely at speed
• Extract greater value from your data assets through better data organization and intelligent data discovery
• Enable AI to help you derive better insights from your organized data
• Improve data risk strategies by assessing risks across your data estates
• Increase user productivity through safe self-service data access
• Unified end-user experience driven by seamlessly integrated services across the platform
Why IBM Cloud Data Lake?
• Industry-leading optimizations for SQL-native location & timeseries data and indexing of object storage data
• High velocity due to self-service data management, preparation & analytics, with an extremely low barrier of entry thanks to the serverless model
• Most secure data lake option in cloud due to unique BYOK and KYOK key services in IBM Cloud
• Enables cloud economics, resiliency and scale for Big Data
IBM Cloud Data Lake – Big Picture
[Architecture diagram labels: Telemetry Data, Streaming, ETL, Explore, Prep, Enrich, Optimize, Analyze, Analytics]
Cloud Data Lake
✓ Seamless elasticity
✓ Seamless scalability
✓ Highly cost effective
✓ Long-term retention
✓ Any data formats
Optional: DWH / Databases (fed via ETL)
✓ Response time SLAs
✓ Warm, high-quality data only
IBM Serverless Stack for Analytics
Serverless Storage: Object Storage. Only pay for the volume of data that you really store.
Serverless Runtimes: Cloud Functions. Only pay for the CPU that you really consume.
Serverless Analytics: SQL Query. Only pay for the amount of data that you really scan.
Properties of Serverless:
– No management of resources, hosts and processes
– Auto-scaling and auto-provisioning based on actual load
– Precise billing based on really consumed system resources (memory, storage, CPU, network, I/O)
– High availability is always implicit
IBM SQL Query – The Central Cloud Data Lake Service
Serverless SQL Query Service: Data Ingestion + Data Transformation + Data Management + Analytics
Cloud data: Object Storage, RDBMS
Users: Developers, Data Engineers, Data Analysts, Data Scientists
✓ Supports ad-hoc and unknown data structures
✓ ETL & ELT support
✓ 100% pay-as-you-go ($5/TB)
✓ 100% API enabled
✓ Automatic Big Data scale-out with Spark
✓ 100% self-service, no setup
✓ Built-in database catalog & data skipping
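The pay-as-you-go figure on this slide lends itself to a quick back-of-the-envelope estimate. The sketch below assumes the $5/TB rate applies to data scanned and that a terabyte is counted as 10^12 bytes; both are assumptions to verify against current pricing, and the function name is illustrative.

```python
BYTES_PER_TB = 10 ** 12  # assumption: decimal terabytes


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = 5.0) -> float:
    """Estimate the charge for one query from the bytes it scans.

    price_per_tb defaults to the $5/TB figure quoted on the slide.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Scanning a full 2 TB data set versus a 250 GB partition of it.
full_scan_cost = estimate_query_cost(2 * 10 ** 12)
partition_scan_cost = estimate_query_cost(250 * 10 ** 9)
```

Because the charge follows the volume scanned, layout choices such as partitioning, columnar formats and data skipping translate directly into lower cost per query.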
IBM SQL Query Architecture
Application, SQL Query, Cloud Data Services (Cloud Object Storage, Event Streams, Db2 on Cloud)
1. Submit SQL
2. Read data
3. Write data
4. Read results
SQL capabilities: Geospatial SQL, Timeseries SQL, Data Skipping, Hive Metastore
• Using IBM Analytics Engine service (Spark clusters aaS)
• Large farm of Spark clusters auto-provisioned & auto-managed in background
• Managing a hot pool of Spark applications (a.k.a. kernels, using Jupyter Kernel Gateway)
• SQL grammar sandbox
• Auto-scaling of each serverless SQL job inside large Spark clusters using dynamic resource allocation
• Intrinsically HA (dispatching across Spark environments in each availability zone)
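The four-step flow on this slide (submit SQL, read data, write data, read results) can be illustrated with a small stand-in. The class below is hypothetical and uses an in-memory SQLite database in place of both the query engine and cloud storage; it is not the SQL Query API. It only shows that results are written to storage by the job and then read from storage by the application.

```python
import sqlite3
from typing import Dict, List, Tuple


class FakeServerlessSql:
    """Hypothetical stand-in for a serverless SQL service (not a real API)."""

    def __init__(self, storage: sqlite3.Connection) -> None:
        self.storage = storage
        self.jobs: Dict[str, str] = {}  # job id -> result table name

    def submit_sql(self, select_statement: str) -> str:
        """Step 1: accept a SELECT statement and run it as a job."""
        job_id = f"job_{len(self.jobs) + 1}"
        result_table = f"result_{job_id}"
        # Steps 2 and 3: the engine reads the source data and writes the
        # result set back to storage instead of returning it directly.
        self.storage.execute(f"CREATE TABLE {result_table} AS {select_statement}")
        self.jobs[job_id] = result_table
        return job_id

    def get_result(self, job_id: str) -> List[Tuple]:
        """Step 4: the application reads the stored result."""
        table = self.jobs[job_id]
        return self.storage.execute(f"SELECT * FROM {table}").fetchall()


# Illustrative data standing in for objects in cloud storage.
storage = sqlite3.connect(":memory:")
storage.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
storage.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],
)

service = FakeServerlessSql(storage)
job = service.submit_sql(
    "SELECT sensor, AVG(value) AS avg_value FROM readings GROUP BY sensor"
)
rows = service.get_result(job)
```

Writing results to storage first means any later consumer can pick them up without rerunning the query, which fits the disaggregated architecture described earlier in the deck.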
IBM SQL Query – Access Patterns
Stages: Create, Explore, Integrate, Deploy
Access paths: SQL Query Console, Watson Studio Notebooks, Cloud Functions, Python SDK, REST API, JDBC, Object Store Console, Event Streams Console, Meta Data
IBM Cloud Data Lake – Meta Data
Cloud data: Object Storage, RDBMS, Event Streams
Engines: Serverless SQL (SQL Query), Spark
Meta data:
• Schema, partitioning, statistics: Hive Metastore, Kafka Schema Registry
• Data skipping indexes: Xskipper
• ACID: Iceberg, Delta Lake
• Governance policies & lineage: Watson Knowledge Catalog
Integrated Hive Metastore + Kafka Schema Registry + ACID (Iceberg)
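Data skipping indexes are easiest to understand through their simplest form, a min/max index per object. The sketch below is a generic illustration under that assumption; it does not use or reproduce the Xskipper API, and the object names and values are made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ObjectStats:
    """Min/max summary of one column in one stored object."""
    name: str
    min_value: float
    max_value: float


def build_index(objects: Dict[str, List[float]]) -> List[ObjectStats]:
    """Summarize each non-empty object by the min and max of its values."""
    return [
        ObjectStats(name, min(values), max(values))
        for name, values in objects.items()
        if values
    ]


def objects_to_scan(index: List[ObjectStats], low: float, high: float) -> List[str]:
    """Return objects whose [min, max] range overlaps the predicate [low, high].

    Objects outside the range cannot contain matching rows, so a query
    engine can skip reading them entirely.
    """
    return [s.name for s in index if s.max_value >= low and s.min_value <= high]


# Hypothetical temperature readings stored in three objects.
lake = {
    "part-0.parquet": [1.5, 4.0, 9.5],
    "part-1.parquet": [20.0, 25.5],
    "part-2.parquet": [8.0, 21.0],
}
index = build_index(lake)
candidates = objects_to_scan(index, 19.0, 22.0)
```

The index is small meta data that lives beside the data, which is why it appears on this slide next to the Hive Metastore and the schema registry.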
IBM Cloud Data Lake – 2021 Architecture
Diagram labels: Real-time queries, Batch queries, COS, Stream transformation & joins, Stream data landing, Schema management & enforcement, ETL & data preparation
IBM Cloud Data Lake Deep Dive

 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Último (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

IBM Cloud Data Lake Deep Dive

  • 1. IBM Cloud Day 2021 Well Architected Data Lake James Bennett, Offering Manager Torsten Steinbach, Senior Technical Staff Member
  • 2. Two Cloud Data Lake Sessions Today • The Well Constructed Architecture of a Modern Data Lake • Introductory Session • What we provide, how you can consume it • Light introduction to deeper architecture • Deep Dive into Cloud Native Data Lakes with IBM Cloud • Session led by Torsten • Everything you need to know about building a Data Lake on IBM Cloud • Includes our Covid-19 Data Lake Implementation
  • 3. Cloud Data Lake for the Enterprise. Organizations need the ability to: o Visualize data and build data-driven applications o Increase data flexibility and accessibility o Provide data governance to retain data authenticity o Gain speed with data insights o Collect, explore and analyze data. Data architects, business and data analysts, data scientists and application developers
  • 4. Cloud Data Lake Evolutionary Context. The 1990s: Enterprise Data Warehouses. Tightly integrated and optimized systems. The 2000s: Hadoop. Introduced open data formats & easy scaling on commodity HW. Today: Cloud-Native, Serverless Analytics-aaS • Elasticity • Pay-per-query • Data in object store • Disaggregated architecture • Increasingly real-time first
  • 5. Case Study. Business Problem: Need the ability to effectively analyze data from remote locations to gain insights with cost-effective, secure, on-demand analytics and long-term data retention. Solution: o Nightly batch exports from operational production databases in factory locations are automatically uploaded to the data lake in the cloud (central COS bucket). o LoB engineers subscribe to data in the data lake, which is then ETLed with SQL Query to tenant-specific zones (tenant-specific COS buckets). o Future updates of data lake data in the central COS bucket are automatically ETLed right away to the tenant-specific COS bucket via Cloud Functions events. o LoB engineers explore, experiment and do data preparation using SQL Query on tenant-specific buckets. o LoB engineers use Watson Studio to run data science, visualize and present insights to executives.
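The ETL step on slide 5 (central COS bucket to tenant-specific COS bucket) could be expressed as a single SQL statement. The following is a minimal sketch, assuming the `FROM <cos url> STORED AS <format> ... INTO <cos url> STORED AS <format>` statement shape of the SQL Query service; the bucket URLs and the `tenant_id` column are hypothetical and do not come from the slides, and the exact syntax should be checked against the service's SQL reference.

```python
def build_tenant_etl(source_url: str, target_url: str, tenant_id: str,
                     data_format: str = "PARQUET") -> str:
    """Build an ETL statement that copies one tenant's rows to its own zone.

    Assumptions (not from the slides): the central data has a `tenant_id`
    column, and both buckets hold data in the same format.
    """
    # Keep the tenant id strictly alphanumeric so it is safe to embed
    # in the statement text.
    if not tenant_id.isalnum():
        raise ValueError("tenant_id must be alphanumeric")
    return (
        f"SELECT * FROM {source_url} STORED AS {data_format} "
        f"WHERE tenant_id = '{tenant_id}' "
        f"INTO {target_url} STORED AS {data_format}"
    )


if __name__ == "__main__":
    # Hypothetical bucket URLs, for illustration only.
    print(build_tenant_etl("cos://us-geo/central-lake/exports/",
                           "cos://us-geo/tenant-a-zone/exports/",
                           "tenantA"))
```

A statement like this could be submitted once per tenant by the nightly job, or by the event-triggered function the slide mentions for later updates.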
  • 6. Case Study. Business Problem: Need the ability to effectively ingest and analyze data from multiple vendors in various data formats to gain competitive insights. Solution: o Ingest pricing data from 20+ external vendors and persist it in Cloud Object Storage. o Data engineers prep the data by joining vendor data with the on-premises data warehouse. o Data engineers then process result sets using Analytics Engine (Spark) and Db2 Warehouse on Cloud. o LoB engineers explore, experiment and do data preparation using SQL Query on tenant-specific buckets. o LoB engineers use Watson Studio (notebooks) to run data science, visualize and present actionable competitive insights to executives.
  • 7. Use Cases. Replicate on-prem DB to cloud data lake for analytics: o Capture database change feed into Kafka in cloud o Land Kafka data to object storage o Prepare replicated change feed for analytics o Query for insights o Present & visualize insights. Collect, historize & analyze IoT data: o Land IoT message data through Event Streams (Kafka) o Prepare, cleanse, extract and enrich IoT data o Query for insights o Present & visualize insights. Move existing Hadoop workload to cloud: o Replace HDFS with cloud-native storage: object storage o Run Hadoop processing in fully managed Hadoop service: Analytics Engine o Interactive analytics through Watson Studio. AIOps, gain operational & business insights from solution logs: o Collect full solution telemetry (logs) o Prepare, cleanse, extract and enrich data from logs o Query for insights o Present & visualize insights. SQL in Place, reduce cost and decouple workload from DWHs: o Use data lake as landing and preparation storage before data gets ingested to DWH o Archive data from DWH to data lake for an affordable SQL-enabled archive o Automate ETL and enable SQL federation across data lake and DWH
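The "prepare replicated change feed for analytics" step in the first use case on slide 7 is not detailed in the deck. One common interpretation is collapsing the landed change events into the latest row per key. The sketch below illustrates that idea only; the event shape (`key`, `op`, `ts`, `value`) is a hypothetical assumption, not the format used by any IBM service.

```python
from typing import Any, Dict, Iterable


def latest_state(events: Iterable[Dict[str, Any]]) -> Dict[Any, Any]:
    """Collapse a change feed into the current value per key.

    Assumed (hypothetical) event shape:
      {"key": ..., "op": "insert" | "update" | "delete", "ts": ..., "value": ...}
    """
    state: Dict[Any, Any] = {}
    # Replay events in timestamp order so later changes win.
    for event in sorted(events, key=lambda e: e["ts"]):
        if event["op"] == "delete":
            state.pop(event["key"], None)
        else:
            state[event["key"]] = event["value"]
    return state


if __name__ == "__main__":
    feed = [
        {"key": "k1", "op": "insert", "ts": 1, "value": 1},
        {"key": "k2", "op": "insert", "ts": 2, "value": 5},
        {"key": "k1", "op": "update", "ts": 3, "value": 2},
        {"key": "k2", "op": "delete", "ts": 4, "value": None},
    ]
    print(latest_state(feed))
```

In a data lake the same replay would typically be written as an SQL query over the landed objects rather than in application code; the Python form is only meant to make the logic explicit.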
  • 8. Cloud Pak for Data as a Service Built On IBM Cloud Uses IBM Cloud Data Lake COS Storage Analytics SQL Query Event Streams Streaming Transformation Spark Cloud Databases Databases
  • 9. IBM Cloud enables a secure, fully integrated set of Cloud Data Services. Scalability: Start small and grow large without overprovisioning for anticipated scale. Efficiency and Speed: Get applications to market quickly, without worrying about underlying infrastructure costs, maintenance, and provider security. Flexibility: Pick and choose services to fit their needs, customize applications and expand across geos seamlessly. Security: Common security integrations with Identity and Access Management, customer-managed encryption keys, and common compliance roadmap.
  • 10. Managed by IBM/Vendor Managed by Client The AI Ladder Cloud Pak for Data (as a Service) – Deploy Anywhere
  • 12. Key Business Outcome: DataOps. Enable safe self-service access to data across users with multiple skill levels, enabling them to use the power of AI securely at speed. Data Science Tooling, Streaming Analytics, Analytical Dashboards, AI Applications, Data Prep Tools. Object Stores, Data Lake, Databases, Unstructured & Streaming Data. Intelligent data catalog: Assess risk, Discover data, Self-serve find & 'deploy' data, Data privacy enforced, Business meaning. Data Consumers. Hybrid Data Sources. Integrated Data Governance • Extract greater value from your data assets through better data organization and intelligent data discovery • Enable AI to help you derive better insights from your organized data • Improve data risk strategies by assessing risks across your data estates • Increase user productivity through safe self-service data access • Unified end-user experience driven by seamlessly integrated services across the platform
  • 13. Cloud Pak for Data as a Service Built On IBM Cloud Uses IBM Cloud Data Lake COS Storage Analytics SQL Query Event Streams Streaming Transformation Spark Cloud Databases Databases
  • 14. Why IBM Cloud Data Lake? Industry-leading optimizations for SQL-native location & timeseries data and indexing of object storage data. High velocity due to self-service data management, preparation & analytics with an extremely low barrier of entry thanks to the serverless model. Most secure data lake option in cloud due to unique BYOK and KYOK key services in IBM Cloud. Enables cloud economics, resiliency and scale for big data.
  • 15. IBM Cloud Data Lake Architecture
  • 16. IBM Cloud Data Lake – Big Picture. Telemetry Data. Explore, ETL, Prep, Enrich, Streaming, Optimize, Analyze. Cloud Data Lake Analytics: ✓ Seamless Elasticity ✓ Seamless Scalability ✓ Highly Cost Effective ✓ Long Term Retention ✓ Any data formats. ETL. Optional: DWH Databases ✓ Response Time SLAs ✓ Warm High-quality Data only
  • 17. IBM Serverless Stack for Analytics. Serverless Storage (Object Storage): Only pay for volume of data that you really store. Serverless Runtimes (Cloud Functions): Only pay for CPU that you really consume. Serverless Analytics (SQL Query): Only pay for amount of data that you really scan. Properties of Serverless: o No management of resources, hosts and processes o Auto-scaling and auto-provisioning based on actual load o Precise billing based on really consumed system resources (memory, storage, CPU, network, I/O) o High availability is always implicit
  • 18. IBM SQL Query – The Central Cloud Data Lake Service. Cloud Data: Object Storage, RDBMS. Serverless SQL Query Service: Data Ingestion + Data Transformation + Data Management + Analytics. Developers, Data Engineers, Data Analysts, Data Scientists. ✓ Supports ad-hoc and unknown data structures ✓ ETL & ELT Support ✓ 100% Pay-as-you-go ($5/TB) ✓ 100% API enabled ✓ Automatic Big Data Scale-Out with Spark ✓ 100% Self service, No Setup ✓ Built-In Database Catalog & Data Skipping
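The pay-as-you-go figure on slide 18 translates into simple arithmetic. The sketch below assumes the quoted price of $5 per TB applies to the amount of data scanned (slide 17 says you pay for the data you scan) and that a terabyte means 10^12 bytes; actual billing rules, rounding and minimum charges are not stated in the deck.

```python
PRICE_PER_TB_USD = 5.0    # figure quoted on slide 18
BYTES_PER_TB = 10 ** 12   # assumption: decimal terabyte


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of a query that scans `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    # A query scanning 250 GB under the assumptions above.
    print(f"${estimate_query_cost(250 * 10 ** 9):.2f}")
```

Because cost scales with bytes scanned, techniques that reduce the scanned volume (columnar formats, partitioning, data skipping) lower the bill directly.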
  • 19. IBM SQL Query Architecture. Application: 1. Submit SQL 2. Read data 3. Write data 4. Read results. Cloud Data Services: SQL Query, Event Streams, Db2 on Cloud, Cloud Object Storage. Geospatial SQL, Data Skipping, Timeseries SQL, Hive Metastore. • Using IBM Analytics Engine service (Spark clusters aaS) • Large farm of Spark clusters auto-provisioned & auto-managed in background • Managing a hot pool of Spark applications (a.k.a. kernels, using Jupyter Kernel Gateway) • SQL grammar sandbox • Auto-scaling of each serverless SQL job inside large Spark clusters using dynamic resource allocation • Intrinsically HA (dispatching across Spark environments in each availability zone)
  • 20. IBM SQL Query – Access Patterns Create Query SQL Console Watson Studio Notebooks Cloud Functions Integrate Explore Deploy Python SDK REST API JDBC Object Store Console Event Streams Console
  • 21. IBM Cloud Data Lake – Meta Data. Cloud Data: Object Storage, RDBMS. Serverless SQL, Spark. Meta Data: Schema, Partitioning, Statistics; Data Skipping Indexes; ACID; Governance Policies & Lineage. Hive Metastore, Kafka Schema Registry, Xskipper, Iceberg, Delta Lake, Watson Knowledge Catalog
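Slide 21 lists data skipping indexes among the metadata kept for the data lake. The general idea can be illustrated with a min/max index: per-object value ranges are stored as metadata, and objects whose range cannot match a query predicate are never read. This is a generic sketch of the technique, not the Xskipper API; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ObjectStats:
    """Min/max metadata for one column in one stored object (hypothetical)."""
    key: str
    min_value: float
    max_value: float


def objects_to_scan(index: List[ObjectStats],
                    lower: float, upper: float) -> List[str]:
    """Return keys of objects that may contain values in [lower, upper].

    An object is skipped when its [min, max] range does not overlap
    the queried range, so it never has to be read from object storage.
    """
    return [s.key for s in index
            if s.max_value >= lower and s.min_value <= upper]


if __name__ == "__main__":
    index = [
        ObjectStats("part-0", 0, 10),
        ObjectStats("part-1", 10, 20),
        ObjectStats("part-2", 20, 30),
    ]
    # Only objects overlapping the range 15..25 need to be scanned.
    print(objects_to_scan(index, 15, 25))
```

With a pay-per-scanned-data model, every skipped object reduces both query latency and cost.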
  • 22. Event Streams SQL Query Object Storage Meta Data Integrated Hive Metastore + Kafka Schema Registry + ACID (Iceberg) Real-Time Queries IBM Cloud Data Lake – 2021 Architecture COS Batch Queries Stream Xform & Joins Stream data landing Schema management & enforcement ETL & Data Preparation