Serverless SQL
Torsten Steinbach
@torsstei
IBM
SQL on Object
Storage
DM Gartner
Hype Cycle
2018
Evolution of Form Factors
For Big Data Analytics
Enterprise Data
Warehouses
Tightly integrated and
optimized systems
Hadoop
Introduced open data formats & easy
scaling on commodity HW
Cloud-Native:
Serverless Analytics-aaS
• Seamless elasticity
• Pay-per-query consumption
• Analyze data as it sits in an object store
• Disaggregated architecture
• No more infrastructure headaches
The 1990s 2000s Today
Ingredient 3: Serverless Data Transformation
Ingredient 4: Serverless Analytics
Ingredient 5: Serverless Automation
Ingredient 2: Serverless Data Ingest
Sharing Economy for Analytics
Ingredient 1: Serverless Storage
Object Storage
IBM Cloud Object Storage
Objects
Objects
Objects
At Rest
On the Wire
Buckets
Encrypted
Pennies per GB
REST
Elastic
Durable
Flexible
Resiliency Choices
Storage Classes
User Managed
Encryption Keys
S3 Compatible
High Speed Data
Transfer
Aspera
SQL Queries
Data Ingest Options
High Customizability
Degree of Serverless-ness
IBM Event Streams
(Kafka aaS)
IBM Cloud Functions
Out-of-the-Box
IBM Streaming Analytics
(IBM Streams aaS)
via Cloud Object Storage API
SQL Query ETL
Cloudant Replication
Blockchain Synch
Cloud Data
Data
Transformation
Serverless SQL
Analytics
IBM SQL Query
Object
Storage
Db2
+
Developers
Data
Engineers
Data Analysts
✓ Perfect for Machine Generated Data
✓ Ad-hoc Data Exploration
✓ Operationalizing Data Pipelines
✓ Big Data Lakes
✓ Flexible Data Transformation
✓ Extremely affordable: $5 per TB scanned
✓ 100% API enabled
✓ Analytics on Object Storage
✓ Big Data Scale-Out. Running on Spark
✓ 100% Self service – No Setup
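The pay-per-scan price above lends itself to a quick back-of-the-envelope estimate. A minimal sketch, assuming the $5/TB rate quoted on the slide and a decimal terabyte (10^12 bytes); the function name and the TB definition are assumptions for illustration, not taken from IBM pricing documentation.

```python
PRICE_PER_TB_USD = 5.0      # rate quoted on the slide
BYTES_PER_TB = 10 ** 12     # assumption: decimal terabyte


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of a query that scans `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    # A query that scans 250 GB of objects.
    print(f"${estimate_query_cost(250 * 10 ** 9):.2f}")
```

Because the charge depends on bytes scanned, anything that reduces scanned bytes (columnar formats, partitioning, data skipping) reduces cost directly.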
2. Read data
4. Read
results
Application
3. Write results
IBM Cloud
Object Storage
Result Set
Data Set
Data Set
Data Set
1. Submit SQL
SQL
Archive / Export
IBM Cloud Streaming
IBM Streams
Event Streams
Land
Query
IBM Cloud Functions
IBM SQL Query
Architecture
IBM Cloud Databases
Db2 on Cloud
Geospatial SQL
Data Skipping
Timeseries SQL
Upload
Data Center 2
Analytics Engine Cluster
20 Kernels
Node 1
Node 3
Node 2
Node 3
…
20
Kernels
…
Data Center 3
Analytics Engine Cluster
20 Kernels
Node 1
Node 3
Node 2
Node 3
…
20
Kernels
…
SQL 1 SQL 1
Data Center 1
IBM Cloud SQL Query – Very High Level Architecture (MVP 1Q 2018)
Analytics Engine Cluster
20 Kernels
Cluster
Pool
Request Queue
Node 1
Node 3
Node 2
Node 3
…
Kernel
Pools
20
Kernels
…
SQL 1 SQL 2 SQL 3 SQL 4 SQL 5
Cloud Object Storage
SQL 6 …
JKG (Web Sockets)
IBM Cloud Query – Spark Cluster Architecture
SQL REST API
Create
Query
SQL Web Console
Watson
Studio
Notebooks
SQL Cloud Function
Integrate Explore
Deploy
IBM Cloud Query – Access Patterns
Node SDK
Python SDK
JDBC
Looker
Best of breed Spark SQL Reference
• Complete, intuitive and interactive SQL Reference
• Each sample SQL can immediately be executed as is
https://cloud.ibm.com/docs/services/sql-query/sqlref/sql_reference.html#sql-reference
Analytics using full Power of Spark SQL
IBM SQL Query – Timeseries SQL 1/2
§ Intuitive first-of-a-kind SQL extensions for timeseries operations
§ Industry leading differentiators, including:
• Timeseries transformation functions:
• Correlation, Fourier transformation,
z-normalization, Granger, interpolation,
and distances
• Temporal Joins: SQL support for
Left/Right/Full Inner and Outer joins
of multiple timeseries
Alignment & Joining:
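To make one of the listed transformations concrete, here is z-normalization in plain Python. This is a sketch of the general technique only; it does not use or reproduce the SQL Query timeseries functions, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(values):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series has no spread; map every point to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    print(z_normalize([1.0, 2.0, 3.0]))
```

Normalizing two series this way before computing a distance or correlation removes differences in offset and scale, so only their shapes are compared.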
§ Further Industry leading differentiators
• Numerical and categorical timeseries types
• Timeseries data skipping for fast queries
• Forecasting:
• ARIMA, BATS, Anomaly detection, etc.
• Subsequence Mining:
• Train & match models for event sequences
• Segmentation:
• Time-based, Record-based, Anchor-based, Burst, and silence
Segmentation:
IBM SQL Query – Timeseries SQL 2/2
IBM SQL Query – Spatial SQL
§ SQL/MM standard to store & analyze spatial data in RDBMS
§ Migration of PostGIS compliant SQL queries
§ Aggregation, computation and join via native SQL syntax
§ Industry leading differentiators
• Geodetic Full Earth support
• Increased developer productivity
• Avoid piece-wise planar projections
• High precision calculations anywhere on the earth
• Very large polygons (e.g. countries), polar caps, crossing the anti-meridian
• Spatial data skipping for fast queries
• Native and fine-granular geohash support
• Fast spatial aggregation
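For readers unfamiliar with geohashes: a geohash interleaves longitude and latitude bisection bits and encodes them in base 32, so nearby points share a prefix and can be grouped cheaply. The following is a sketch of the standard public geohash encoding, not of IBM's implementation; the function name is an assumption.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, 5))
```

Grouping rows by a geohash prefix of a chosen length is one simple way to aggregate points into grid cells; shorter prefixes give coarser cells.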
Example: Spatio-Temporal Processing of Sensor Data
IBM Cloud Object Storage
Sensor
Data
Query
Location
Analytics
Mobile
Cars
Devices
Land
Location
Filtering
Spatial
Aggregation
GPS
SQL/MM
Sensor
Metrics
t
t
t
Timeseries
Assembly
Timeseries
Join
Timeseries SQL
t
Serverless
Storage
Serverless
Runtimes
Serverless
Analytics
Object
Storage
Cloud
Functions
Query
A Completely Serverless Stack for Data & Analytics Solutions
Unstructured Data Prep
SQL Query
Cloud
Functions
Analyze
COS
COS
Extract Features
Automated/Scheduled SQL Execution
SQL Query
Cloud
Functions
Develop SQL Deploy as SQL Cloud Function
Set up Cloud
Function
Trigger/Schedule
Shield Data From Direct Access
SQL Query
Cloud
Functions
Deploy Cloud Function
with COS API Key
User Calls
Function to
Access Data
COS
Grant Execute on SQL
Cloud Function to User
Configure SQL Pipelines
SQL Query
Cloud
Functions
User creates function
sequence to automate flow
of consecutive SQLs
Sequence
SQL Query
Cloud
Functions
1.
2.
Use Cases of Cloud Functions Adding Value to SQL
Ingredient 3: Serverless Data Transformation ✓
Ingredient 4: Serverless Analytics ✓
Ingredient 5: Serverless Automation ✓
Ingredient 2: Serverless Data Ingest ✓
Ingredient 1: Serverless Storage ✓
Now, what is this all good for?
IBM Cloud Object Storage
Acquire
Query
Data Warehouses &
Databases
Db2 on Cloud
Process Analyze
Applications
Applications
Applications
IoT
Streaming
Devices
Devices
Devices
BI & AI
Land
Log Messages
Cleanse
Filter
Merge
Aggregate
Compress
Watson Studio
Looker
Cognos
WML
Explore
Analyze Analyze
Promote
Use for Data Pipelines to fuel BI & AI
Data-Driven Decisions
☛ Understanding system health, user behavior & workload status
Collecting & Analyzing Log Data
☛ Is NOT an afterthought but rather the foundation for decisions on
system and feature design.
Data Volume Growing Rapidly
☛ Growth rates and data volume at rest can jump dramatically. Very
high elasticity is required.
Competitive Advantage
☛ Is based on short runways for turning data into actions
Turn your Logs into Business – Log Data Is The Cloud-Native Currency
Logs
Your Cloud
Application/Solution
IBM Cloud Object Storage
Query
Transform
Compress
Aggregate
Repartition
Analyze
Anomaly Detection
User Segmentation
Customer Support
Resource Planning
• Build & run data pipelines and analytics of your log message data
• Flexible log data analytics with full power of SQL
• Seamless scalability & elasticity according to your log message volume
Use for analyzing application logs
IDUG Db2 Tech Conference
Charlotte, NC | June 2 – 6, 2019
Data Lake in IBM Cloud – How it works
IBM Cloud Data Lake
Data
Streaming
Upload
ETL
DB2
Feature
Extraction
Data
Prep
ICD
DB2
ICD
OLAP
Analytics WML
ETL
Federate
Aspera
Cloudant
Replication
Secure
Sync
IBM
Blockchain
Applications
Applications
Watson
Studio
Knowledge
Catalog
METASTORE
AI
ICP for Data
Analytics
Engine
IBM Cloud
Functions
Land Process Integrate
Key Protect
Index
Creation
Getting started: https://www.ibm.com/cloud/sql-query
SQL Query Intro Video: https://youtu.be/s-FznfHJpoU
SQL Query Starter Notebook in Watson Studio: https://ibm.biz/BdYNrN
SQL Reference: https://ibm.biz/Bd2jF7
SQL Query API doc: https://cloud.ibm.com/apidocs/sql-query
Big Data Layout Best Practices for COS: https://ibm.biz/Bd2jRg
Serverless Data & Analytics: https://ibm.biz/Bd2jF5
Further Resources
Backup
1. Identify friction points in users’ digital journey, e.g.:
• Clicks-2-purchase ratio
• Unexpected repeated page visits per user
• E.g. entering payment data should only happen once
• Last page visited per session
2. Identify click sequences for successful purchase
• Sequence matching using timeseries analysis
3. Identify customers/segments likely to churn or expand
• Look for typical page visits, actions or flows
• E.g. Terms & conditions, invite additional users etc.
4. Determining your most important content online
What Insights can I extract from a Clickstream?
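Two of the friction-point metrics above (last page visited per session, and repeated visits to a page that should be hit only once) can be sketched in a few lines. This is a plain-Python illustration over an in-memory list; the event tuple layout `(user_id, session_id, timestamp, page)` and the function names are assumptions, and in practice the same logic would be expressed in SQL over the clickstream objects.

```python
from collections import Counter


def last_page_per_session(events):
    """Return {session_id: page} for the latest event of each session."""
    latest = {}
    for _user, session_id, ts, page in events:
        if session_id not in latest or ts >= latest[session_id][0]:
            latest[session_id] = (ts, page)
    return {session_id: page for session_id, (_ts, page) in latest.items()}


def repeated_visits_per_user(events, page):
    """Return {user_id: visit_count} for users who hit `page` more than once."""
    counts = Counter(user for user, _sid, _ts, p in events if p == page)
    return {user: n for user, n in counts.items() if n > 1}


if __name__ == "__main__":
    clicks = [
        ("u1", "s1", 1, "/home"),
        ("u1", "s1", 2, "/payment"),
        ("u1", "s1", 3, "/payment"),  # payment data entered twice: friction
        ("u2", "s2", 1, "/home"),
        ("u2", "s2", 2, "/terms"),
    ]
    print(last_page_per_session(clicks))
    print(repeated_visits_per_user(clicks, "/payment"))
```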
Building IBM Cloud-Native Data Lake
Serverless SQL
Serverless Storage
Serverless Pipeline
Automation ✓
✓
✓
Orchestration
Processing
Persistency Data Ingest
✓
Data Catalog ✓
Serverless
Unstructured Data
Processing ✓
• Traditional analytics systems
• Fixed capacities of appliances
• Specialized teams of data engineers & DBAs who manage data model, access and ETL
• BI analysts who have access only to the curated data sets in EDW
• Innovative enterprises today
• Wide range of teams that require direct access to the same data set at all stages of the data pipeline: BI analysts, data scientists, quantitative marketers, dev/ops, developers
• Data engineers that support these teams need a much more scalable and cost-effective platform to ensure all teams have the access they need, when they need it
• Building analytics platforms in the cloud because of the scale and cost-efficiencies that come with serverless analytics over object stores
Serverless – The key to IT Sharing Economy ... also for Analytics
Proper data organization →
better performance and lower cost
29, 2019 / © 2019 IBM Corporation
The key factors are:
• Number of bytes shipped
• Number of REST requests
Best practices for structured data:
• Choose the right object size (sweet spot: 128 MB)
• Choose the right format
• Choose the right data layout
• Avoid gzip compressed formats
Applies to SQL Query but also
applies to other Big Data engines
To learn more: https://www.ibm.com/blogs/bluemix/2018/06/big-data-layout/
Which Format is Query-Friendly?
2. Use Hive style partitioning
GPMeterStream/dt=2017-08-17/part-00085.csv
GPMeterStream/dt=2017-08-17/part-00086.csv
GPMeterStream/dt=2017-08-17/part-00087.csv
GPMeterStream/dt=2017-08-17/part-00088.csv
GPMeterStream/dt=2017-08-17/part-00089.csv
GPMeterStream/dt=2017-08-18/part-00001.csv
GPMeterStream/dt=2017-08-18/part-00002.csv
GPMeterStream/dt=2017-08-18/part-00003.csv
Avoid reading unnecessary objects altogether
Technique has limitations
Best Practice: minimize bytes scanned
1. Use Parquet
• Column based
• Only read the columns you need
• Column wise compression
• Min/max metadata
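The effect of Hive-style partitioning can be illustrated with the object names listed above: when a query filters on `dt`, only objects under the matching `dt=` prefix need to be read. A minimal sketch of that pruning step; the function names are hypothetical and the real engine performs this internally.

```python
import re
from datetime import date

# Matches a Hive-style "dt=YYYY-MM-DD/" path segment.
_DT_SEGMENT = re.compile(r"(?:^|/)dt=(\d{4})-(\d{2})-(\d{2})/")


def partition_date(object_key):
    """Extract the dt= partition value from an object key, or None if absent."""
    match = _DT_SEGMENT.search(object_key)
    if match is None:
        return None
    return date(int(match[1]), int(match[2]), int(match[3]))


def prune_by_date(object_keys, wanted):
    """Keep only the objects whose dt= partition equals `wanted`."""
    return [key for key in object_keys if partition_date(key) == wanted]


if __name__ == "__main__":
    keys = [
        "GPMeterStream/dt=2017-08-17/part-00085.csv",
        "GPMeterStream/dt=2017-08-17/part-00086.csv",
        "GPMeterStream/dt=2017-08-18/part-00001.csv",
    ]
    print(prune_by_date(keys, date(2017, 8, 18)))
```

The limitation noted on the slide shows up here too: pruning only helps for predicates on the partition column, which is where Parquet min/max metadata and data skipping indexes complement it.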
Table Locators
cos://<endpoint>/<bucket>/[<prefix>] <format definition>
Endpoint – of your object storage bucket or a short alias
E.g. s3.us-south.objectstorage.appdomain.cloud or alias us-south
Bucket – name in object storage
Prefix – one or multiple objects (i.e. table partitions) with same prefix
Used in FROM clauses for input data and in target field for result set data
Examples:
cos://us-south/myBucket/myFolder/mySubFolder/myData.parquet
cos://us/otherBucket/myData
cos://us/otherBucket/myData/part
cos://eu/newBucket/
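The locator layout `cos://<endpoint>/<bucket>/[<prefix>]` can be split into its three parts with ordinary string handling. A small sketch for client-side validation; `TableLocator` and `parse_locator` are hypothetical names, and the parser deliberately does not resolve aliases such as `us-south` to full endpoints.

```python
from typing import NamedTuple


class TableLocator(NamedTuple):
    endpoint: str  # full endpoint host or a short alias
    bucket: str
    prefix: str    # may be empty; matches one or more objects


def parse_locator(url: str) -> TableLocator:
    """Split a cos:// table locator into endpoint, bucket and prefix."""
    scheme = "cos://"
    if not url.startswith(scheme):
        raise ValueError("table locator must start with cos://")
    parts = url[len(scheme):].split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("expected cos://<endpoint>/<bucket>/[<prefix>]")
    prefix = parts[2] if len(parts) == 3 else ""
    return TableLocator(parts[0], parts[1], prefix)


if __name__ == "__main__":
    print(parse_locator("cos://us-south/myBucket/myFolder/mySubFolder/myData.parquet"))
    print(parse_locator("cos://eu/newBucket/"))
```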
<Table Locator> [JOBPREFIX JOBID | NONE]
[STORED AS CSV | PARQUET | JSON]
• Specifies the data format of the input data
• Table schema is automatically inferred at SQL execution time
• STORED AS Clause is optional, the default is CSV
• Additional parameters for CSV:
• E.g.: FIELDS TERMINATED BY '\t' NOHEADER
• JOBPREFIX only for targets: defines unique prefix to append. Default is JOBID.
Table Format Definition
SELECT … INTO
<Table Locator> [STORED AS CSV | PARQUET | JSON]
[PARTITIONED [BY (<column list>)]
[INTO <num> BUCKETS]
[EVERY <num> ROWS]]
[SORT BY (<column list>)]
BY: Produces Hive Style Partitioning
INTO: Produces a fixed number of partitions (hash partitioned)
EVERY: Produces partitions of even size (e.g. for pagination)
SORT BY: Exact result order & clustering when combined with PARTITIONED
Table Partitioning Definition
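When SQL is generated by an application, the target clause can be assembled from the pieces of the grammar shown above. This sketch only concatenates keywords in the order the slide gives them; the function name is hypothetical, and whether the service accepts every combination of BY, INTO and EVERY is not stated on the slide, so treat combined use as an assumption.

```python
def into_clause(locator, stored_as="CSV", partitioned_by=None,
                into_buckets=None, every_rows=None, sort_by=None):
    """Build an 'INTO <locator> ...' clause following the slide's grammar."""
    if stored_as not in ("CSV", "PARQUET", "JSON"):
        raise ValueError("stored_as must be CSV, PARQUET or JSON")
    parts = [f"INTO {locator}", f"STORED AS {stored_as}"]
    partition_parts = []
    if partitioned_by:
        partition_parts.append(f"BY ({', '.join(partitioned_by)})")
    if into_buckets is not None:
        partition_parts.append(f"INTO {int(into_buckets)} BUCKETS")
    if every_rows is not None:
        partition_parts.append(f"EVERY {int(every_rows)} ROWS")
    if partition_parts:
        parts.append("PARTITIONED " + " ".join(partition_parts))
    if sort_by:
        parts.append(f"SORT BY ({', '.join(sort_by)})")
    return " ".join(parts)


if __name__ == "__main__":
    # Hive-style partitions by date, clustered by timestamp within each one.
    print(into_clause("cos://us/otherBucket/myData", "PARQUET",
                      partitioned_by=["dt"], sort_by=["ts"]))
```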
Submit a SQL query
POST https://api.sql-query.cloud.ibm.com/v2/sql_jobs
Runs the SQL in the background and returns a job_id
Detailed info for a SQL query (e.g. status, result location)
GET https://api.sql-query.cloud.ibm.com/v2/sql_jobs/{job_id}
Returns JSON with query execution details
List of recent SQL query executions
GET https://api.sql-query.cloud.ibm.com/v2/sql_jobs
Returns JSON array with last 30 SQL submissions and outcomes
IBM SQL Query REST API
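The submit-then-poll pattern behind these endpoints can be sketched with the Python standard library. Only the URLs and HTTP methods come from the slide; the `statement` body field, the `instance_crn` query parameter and the bearer-token header are assumptions about the request shape and should be checked against the API documentation. The helpers only build the requests; sending them requires `urllib.request.urlopen` and valid credentials.

```python
import json
from urllib.parse import quote
from urllib.request import Request

API_BASE = "https://api.sql-query.cloud.ibm.com/v2"  # from the slide


def _headers(token):
    # Assumption: IAM bearer token authentication.
    return {"Authorization": f"Bearer {token}", "Accept": "application/json"}


def submit_request(statement, instance_crn, token):
    """Build the POST that submits a SQL statement as a background job."""
    body = json.dumps({"statement": statement}).encode("utf-8")  # assumed field name
    headers = _headers(token)
    headers["Content-Type"] = "application/json"
    url = f"{API_BASE}/sql_jobs?instance_crn={quote(instance_crn, safe='')}"
    return Request(url, data=body, headers=headers, method="POST")


def status_request(job_id, instance_crn, token):
    """Build the GET that fetches status and result location for one job."""
    url = (f"{API_BASE}/sql_jobs/{quote(job_id, safe='')}"
           f"?instance_crn={quote(instance_crn, safe='')}")
    return Request(url, headers=_headers(token), method="GET")


if __name__ == "__main__":
    req = submit_request("SELECT 1", "crn:v1:example", "my-token")
    print(req.get_method(), req.full_url)
```

A client would submit once, keep the returned job_id, and call the status request repeatedly with a delay until the job reports a terminal state.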
Scaling Analytics: Data Skipping Saving you Time and $
Index All
Objects
IBM Cloud Object Storage
Data Set Objects
SQL
Query
Data Skipping
Indexing
Candidate
Objects
WHERE Clause
Saving Time
and $
SQL Query learns which objects are not relevant to a query
using a data skipping index
CREATE METAINDEX stores index summary metadata for
each object. Much smaller than the data.
SQLs skipping irrelevant objects to significantly reduce I/O
E.g.:
Independent of data formats
Index Types: Min/Max, Value List, Bounding Box
Get location and time of heat waves (>40 Celsius)
SELECT lat, long, city, temp, date
FROM weather
WHERE temp > 40.0
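The idea behind a min/max data skipping index can be shown with the heat wave query above: if an object's maximum `temp` is not above 40, no row in it can match, so the object never needs to be read. A simplified sketch of that pruning decision; the class and function names are hypothetical and this is not how the index is stored by the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinMaxEntry:
    """Per-object summary metadata, far smaller than the object itself."""
    object_key: str
    min_temp: float
    max_temp: float


def candidate_objects(index, threshold):
    """Return keys of objects that may contain rows with temp > threshold."""
    return [entry.object_key for entry in index if entry.max_temp > threshold]


if __name__ == "__main__":
    index = [
        MinMaxEntry("weather/part-0001.parquet", 10.0, 25.0),  # skipped
        MinMaxEntry("weather/part-0002.parquet", 30.0, 45.0),  # must be read
        MinMaxEntry("weather/part-0003.parquet", -5.0, 40.0),  # skipped: max is not > 40
    ]
    print(candidate_objects(index, 40.0))
```

The index can only rule objects out, never in: a candidate object still has to be scanned to find the rows that actually match.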
• JDBC compliant driver library that wraps REST API
• Wrapping both, SQL Query and COS REST API
• Exposing regular session interface (JDBC Connection)
• Enabling custom JDBC application support
• Enabling BI application support
• Early adopter: Looker
• Support for stored table meta data (simple catalog)
• Stored as json in COS and referenced via JDBC
connection string
• I.e. DatabaseMetaData interface also supported
JDBC Driver for BI Applications
Apply for Beta Now
Query
JDBC Driver
REST
COS
JDBC
API
Data
Result Sets
Table
Catalog
E.g. Looker
Using SQL Query JDBC Driver
Define table catalog
• JSON file in COS containing:
• Table name
• Location of table objects on COS
• Object format
• Column names
• Column types
• INT, FLOAT, VARCHAR, TIMESTAMP
JDBC Connection String:
jdbc:SQLQuery:<sql-query instance crn>
?schemabucket=<COS bucket with json catalog>
&schemafile=<COS object with json catalog>
&apikey=<api key for your account>
&targetcosurl=<COS URL for result set>
Think 2019 / 2263 / February 2019 / © 2019 IBM Corporation
IBM Cloud Functions
Fair Never pay for idle
Polyglot
Elastic
Automation
Triggers
Open Source
CLOUD
FUNCTIONS
Schedules
Sequences

Más contenido relacionado

La actualidad más candente

Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...Erwin de Kreuk
 
SQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 QuestionsSQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 QuestionsMike Broberg
 
Azure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkAzure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkIke Ellis
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2Raul Chong
 
Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Amazon Web Services
 
Moving to the cloud; PaaS, IaaS or Managed Instance
Moving to the cloud; PaaS, IaaS or Managed InstanceMoving to the cloud; PaaS, IaaS or Managed Instance
Moving to the cloud; PaaS, IaaS or Managed InstanceThomas Sykes
 
Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...Microsoft Tech Community
 
Caching with DynamoDB and DAX - DevDay Austin 2017 Day 2
Caching with DynamoDB and DAX - DevDay Austin 2017 Day 2Caching with DynamoDB and DAX - DevDay Austin 2017 Day 2
Caching with DynamoDB and DAX - DevDay Austin 2017 Day 2Amazon Web Services
 
Accessing Google Cloud APIs
Accessing Google Cloud APIsAccessing Google Cloud APIs
Accessing Google Cloud APIswesley chun
 
Data Design for Microservices - DevDay Austin 2017 Day 2
Data Design for Microservices - DevDay Austin 2017 Day 2Data Design for Microservices - DevDay Austin 2017 Day 2
Data Design for Microservices - DevDay Austin 2017 Day 2Amazon Web Services
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Lucas Jellema
 
Bi and AI updates in the Microsoft Data Platform stack
Bi and AI updates in the Microsoft Data Platform stackBi and AI updates in the Microsoft Data Platform stack
Bi and AI updates in the Microsoft Data Platform stackIvan Donev
 
Discovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clustersDiscovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clustersIvan Donev
 
Microsoft SQL server 2017 Level 300 technical deck
Microsoft SQL server 2017 Level 300 technical deckMicrosoft SQL server 2017 Level 300 technical deck
Microsoft SQL server 2017 Level 300 technical deckGeorge Walters
 
Discovery Day 2019 Sofia - What is new in SQL Server 2019
Discovery Day 2019 Sofia - What is new in SQL Server 2019Discovery Day 2019 Sofia - What is new in SQL Server 2019
Discovery Day 2019 Sofia - What is new in SQL Server 2019Ivan Donev
 
Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsAmazon Web Services
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseEric Bragas
 
PaaSport to Paradise: Lifting & Shifting with Azure SQL Database/Managed Inst...
PaaSport to Paradise: Lifting & Shifting with Azure SQL Database/Managed Inst...PaaSport to Paradise: Lifting & Shifting with Azure SQL Database/Managed Inst...
PaaSport to Paradise: Lifting & Shifting with Azure SQL Database/Managed Inst...Sandy Winarko
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 

La actualidad más candente (20)

Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...
 
SQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 QuestionsSQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 Questions
 
Azure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkAzure Databricks is Easier Than You Think
Azure Databricks is Easier Than You Think
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
 
Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2
 
Moving to the cloud; PaaS, IaaS or Managed Instance
Moving to the cloud; PaaS, IaaS or Managed InstanceMoving to the cloud; PaaS, IaaS or Managed Instance
Moving to the cloud; PaaS, IaaS or Managed Instance
 
Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...Leveraging Azure Databricks to minimize time to insight by combining Batch an...
Leveraging Azure Databricks to minimize time to insight by combining Batch an...
 
Caching with DynamoDB and DAX - DevDay Austin 2017 Day 2
Caching with DynamoDB and DAX - DevDay Austin 2017 Day 2Caching with DynamoDB and DAX - DevDay Austin 2017 Day 2
Caching with DynamoDB and DAX - DevDay Austin 2017 Day 2
 
Accessing Google Cloud APIs
Accessing Google Cloud APIsAccessing Google Cloud APIs
Accessing Google Cloud APIs
 
NoSQL, which way to go?
NoSQL, which way to go?NoSQL, which way to go?
NoSQL, which way to go?
 
Data Design for Microservices - DevDay Austin 2017 Day 2
Data Design for Microservices - DevDay Austin 2017 Day 2Data Design for Microservices - DevDay Austin 2017 Day 2
Data Design for Microservices - DevDay Austin 2017 Day 2
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
 
Bi and AI updates in the Microsoft Data Platform stack
Bi and AI updates in the Microsoft Data Platform stackBi and AI updates in the Microsoft Data Platform stack
Bi and AI updates in the Microsoft Data Platform stack
 
Discovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clustersDiscovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clusters
 
Microsoft SQL server 2017 Level 300 technical deck
Microsoft SQL server 2017 Level 300 technical deckMicrosoft SQL server 2017 Level 300 technical deck
Microsoft SQL server 2017 Level 300 technical deck
 
Discovery Day 2019 Sofia - What is new in SQL Server 2019
Discovery Day 2019 Sofia - What is new in SQL Server 2019Discovery Day 2019 Sofia - What is new in SQL Server 2019
Discovery Day 2019 Sofia - What is new in SQL Server 2019
 
Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis Analytics
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
 
PaaSport to Paradise: Lifting & Shifting with Azure SQL Database/Managed Inst...
PaaSport to Paradise: Lifting & Shifting with Azure SQL Database/Managed Inst...PaaSport to Paradise: Lifting & Shifting with Azure SQL Database/Managed Inst...
PaaSport to Paradise: Lifting & Shifting with Azure SQL Database/Managed Inst...
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 

Similar a Serverless SQL

IBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveIBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveTorsten Steinbach
 
Coud-based Data Lake for Analytics and AI
Coud-based Data Lake for Analytics and AICoud-based Data Lake for Analytics and AI
Coud-based Data Lake for Analytics and AITorsten Steinbach
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeTorsten Steinbach
 
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM CloudIBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM CloudTorsten Steinbach
 
IBM Cloud Native Day April 2021: Serverless Data Lake
IBM Cloud Native Day April 2021: Serverless Data LakeIBM Cloud Native Day April 2021: Serverless Data Lake
IBM Cloud Native Day April 2021: Serverless Data LakeTorsten Steinbach
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AITorsten Steinbach
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWSAmazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
Azure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekAzure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekMark Kromer
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Crate.io
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformDATAVERSITY
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Amazon Web Services
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...Cisco DevNet
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAmazon Web Services
 

Similar a Serverless SQL (20)

IBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveIBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep Dive
 
Coud-based Data Lake for Analytics and AI
Coud-based Data Lake for Analytics and AICoud-based Data Lake for Analytics and AI
Coud-based Data Lake for Analytics and AI
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
 
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM CloudIBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
 
IBM Cloud Native Day April 2021: Serverless Data Lake
IBM Cloud Native Day April 2021: Serverless Data LakeIBM Cloud Native Day April 2021: Serverless Data Lake
IBM Cloud Native Day April 2021: Serverless Data Lake
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AI
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Azure Data Factory for Azure Data Week

Serverless SQL

  • 2. SQL on Object Storage DM Gartner Hype Cycle 2018
  • 3. Evolution of Form Factors For Big Data Analytics Enterprise Data Warehouses Tightly integrated and optimized systems Hadoop Introduced open data formats & easy scaling on commodity HW Cloud-Native: Serverless Analytics-aaS • Seamless elasticity • Pay-per-query consumption • Analyze data as it sits in an object store • Disaggregated architecture • No more infrastructure headaches The 1990s, the 2000s, Today
  • 4. Ingredient 3: Serverless Data Transformation Ingredient 4: Serverless Analytics Ingredient 5: Serverless Automation Ingredient 2: Serverless Data Ingest Sharing Economy for Analytics Ingredient 1: Serverless Storage
  • 5. Object Storage IBM Cloud Object Storage Objects Objects Objects At Rest On the Wire Buckets Encrypted Pennies per GB REST Elastic Durable Flexible Resiliency Choices Storage Classes User Managed Encryption Keys S3 Compatible High Speed Data Transfer Aspera SQL Queries
  • 6. Data Ingest Options 6 High Customizability Degree of Serverless-ness IBM Event Streams (Kafka aaS) IBM Cloud Functions Out-of-the-Box IBM Streaming Analytics (IBM Streams aaS) via Cloud Object Storage API SQL Query ETL Cloudant Replication Blockchain Synch
  • 7. Cloud Data Data Transformation Serverless SQL Analytics IBM SQL Query Object Storage Db2 + Developers Data Engineers Data Analysts ✓ Perfect for Machine Generated Data ✓ Ad-hoc Data Exploration ✓ Operationalizing Data Pipelines ✓ Big Data Lakes ✓ Flexible Data Transformation ✓ Extremely affordable. $5/TB scanned ✓ 100% API enabled ✓ Analytics on Object Storage ✓ Big Data Scale-Out. Running on Spark ✓ 100% Self service – No Setup
  • 8. 2. Read data 4. Read results Application 3. Write results IBM Cloud Object Storage Result Set Data Set Data Set Data Set 1. Submit SQL SQL Archive / Export IBM Cloud Streaming IBM Streams Event Streams Land Query IBM Cloud Functions IBM SQL Query Architecture IBM Cloud Databases Db2 on Cloud Geospatial SQL Data Skipping Timeseries SQL Upload
  • 9. Data Center 2 Analytics Engine Cluster 20 Kernels Node 1 Node 3 Node 2 Node 3 … 20 Kernels … Data Center 3 Analytics Engine Cluster 20 Kernels Node 1 Node 3 Node 2 Node 3 … 20 Kernels … SQL 1 SQL 1 Data Center 1 IBM Cloud SQL Query – Very High Level Architecture (MVP 1Q 2018) Analytics Engine Cluster 20 Kernels Cluster Pool Request Queue Node 1 Node 3 Node 2 Node 3 … Kernel Pools 20 Kernels … SQL 1 SQL 2 SQL 3 SQL 4 SQL 5 Cloud Object Storage SQL 6 … JKG (Web Sockets) IBM Cloud Query – Spark Cluster Architecture
  • 10. SQL REST API Create Query SQL Web Console Watson Studio Notebooks SQL Cloud Function Integrate Explore Deploy IBM Cloud Query – Access Patterns Node SDK Python SDK JDBC Looker
  • 11. Best of breed Spark SQL Reference • Complete, intuitive and interactive SQL Reference • Each sample SQL can immediately be executed as is https://cloud.ibm.com/docs/services/sql-query/sqlref/sql_reference.html#sql-reference Analytics using full Power of Spark SQL
  • 12. IBM SQL Query – Timeseries SQL 1/2 ▪ Intuitive first-of-a-kind SQL extensions for timeseries operations ▪ Industry leading differentiators, including: • Timeseries transformation functions: • Correlation, Fourier transformation, z-normalization, Granger, interpolation, and distances • Temporal Joins: SQL support for Left/Right/Full Inner and Outer joins of multiple timeseries Alignment & Joining:
  • 13. ▪ Further Industry leading differentiators • Numerical and categorical timeseries types • Timeseries data skipping for fast queries • Forecasting: • ARIMA, BATS, Anomaly detection, etc. • Subsequence Mining: • Train & match models for event sequences • Segmentation: • Time-based, Record-based, Anchor-based, Burst, and Silence Segmentation: IBM SQL Query – Timeseries SQL 2/2
  • 14. IBM SQL Query – Spatial SQL ▪ SQL/MM standard to store & analyze spatial data in RDBMS ▪ Migration of PostGIS compliant SQL queries ▪ Aggregation, computation and join via native SQL syntax ▪ Industry leading differentiators • Geodetic Full Earth support • Increased developer productivity • Avoid piece-wise planar projections • High precision calculations anywhere on the earth • Very large polygons (e.g. countries), polar caps, crossing the anti-meridian • Spatial data skipping for fast queries • Native and fine-granular geohash support • Fast spatial aggregation
  • 15. Example: Spatio-Temporal Processing of Sensor Data IBM Cloud Object Storage Sensor Data Query Location Analytics Mobile Cars Devices Land Location Filtering Spatial Aggregation GPS SQL/MM Sensor Metrics t t t Timeseries Assembly Timeseries Join Timeseries SQL t
  • 17. Unstructured Data Prep SQL Query Cloud Functions Analyze COS COS Extract Features Automated/Scheduled SQL Execution SQL Query Cloud Functions Develop SQL Deploy as SQL Cloud Function Set up Cloud Function Trigger/Schedule Shield Data From Direct Access SQL Query Cloud Functions Deploy Cloud Function with COS API Key User Calls Function to Access Data COS Grant Execute on SQL Cloud Function to User Configure SQL Pipelines SQL Query Cloud Functions User creates function sequence to automate flow of consecutive SQLs Sequence SQL Query Cloud Functions 1. 2. Use Cases of Cloud Functions Adding Value to SQL
  • 18. Ingredient 3: Serverless Data Transformation ✓ Ingredient 4: Serverless Analytics ✓ Ingredient 5: Serverless Automation ✓ Ingredient 2: Serverless Data Ingest ✓ Ingredient 1: Serverless Storage ✓ Now, what is this all good for?
  • 19. IBM Cloud Object Storage Acquire Query Data Warehouses & Databases Db2 on Cloud Process Analyze Applications Applications Applications IoT Streaming Devices Devices Devices BI & AI Land Log Messages Cleanse Filter Merge Aggregate Compress Watson Studio Looker Cognos WML Explore Analyze Analyze Promote Use for Data Pipelines to fuel BI & AI
  • 20. Data-Driven Decisions ☛ Understanding system health, user behavior & workload status Collecting & Analyzing Log Data ☛ Is NOT an afterthought but rather the foundation for decisions on system and feature design. Data Volume Growing Rapidly ☛ Growth rates and data volume at rest can jump dramatically. Very high elasticity is required. Competitive Advantage ☛ Is based on short runways for turning data into actions Turn your Logs into Business – Log Data Is The Cloud-Native Currency
  • 21. Logs Your Cloud Application/Solution IBM Cloud Object Storage Query Transform Compress Aggregate Repartition Analyze Anomaly Detection User Segmentation Customer Support Resource Planning • Build & run data pipelines and analytics of your log message data • Flexible log data analytics with full power of SQL • Seamless scalability & elasticity according to your log message volume Use for analyzing application logs
  • 22. IDUG Db2 Tech Conference Charlotte, NC | June 2 – 6, 2019 Data Lake in IBM Cloud – How it works IBM Cloud Data Lake Data Streaming Upload ETL DB2 Feature Extraction Data Prep ICD DB2 ICD OLAP Analytics WML ETL Federate Aspera Cloudant Replication Secure Sync IBM Blockchain Applications Applications Watson Studio Knowledge Catalog METASTORE AI ICP for Data Analytics Engine IBM Cloud Functions Land Process Integrate Key Protect Index Creation
  • 23. Getting started: https://www.ibm.com/cloud/sql-query SQL Query Intro Video: https://youtu.be/s-FznfHJpoU SQL Query Starter Notebook in Watson Studio: https://ibm.biz/BdYNrN SQL Reference: https://ibm.biz/Bd2jF7 SQL Query API doc: https://cloud.ibm.com/apidocs/sql-query Big Data Layout Best Practices for COS: https://ibm.biz/Bd2jRg Serverless Data & Analytics: https://ibm.biz/Bd2jF5 Further Resources
  • 25. 1. Identify friction points in users’ digital journey, e.g.: • Clicks-2-purchase ratio • Unexpected repeated page visits per user • E.g. entering payment data should only happen once • Last page visited per session 2. Identify click sequences for successful purchase • Sequence matching using timeseries analysis 3. Identify customers/segments likely to churn or expand • Look for typical page visits, actions or flows • E.g. Terms & conditions, invite additional users etc. 4. Determine your most important content online What Insights can I extract from a Clickstream?
  • 27. Building IBM Cloud-Native Data Lake Serverless SQL Serverless Storage Serverless Pipeline Automation ✓ ✓ ✓ Orchestration Processing Persistency Data Ingest ✓ Data Catalog ✓ Serverless Unstructured Data Processing ✓
  • 28. • Traditional analytics systems • Fixed capacities of appliances • Specialized teams of data engineers & DBAs who manage data model, access and ETL • BI analysts who have access only to the curated data sets in EDW • Innovative enterprises today • Wide range of teams that require direct access to the same data set at all stages of the data pipeline: BI analysts, data scientists, quantitative marketers, dev/ops, developers • Data engineers who support these teams need a much more scalable and cost-effective platform to ensure all teams have the access they need, when they need it • Building analytics platforms in the cloud because of the scale and cost-efficiencies that come with serverless analytics over object stores Serverless – The key to IT Sharing Economy ... also for Analytics
  • 29. Proper data organization → better performance and lower cost The key factors are: • Number of bytes shipped • Number of REST requests Best practices for structured data: • Choose the right object size (sweet spot: 128 MB) • Choose the right format • Choose the right data layout • Avoid gzip compressed formats Applies to SQL Query but also applies to other Big Data engines To learn more: https://www.ibm.com/blogs/bluemix/2018/06/big-data-layout/
  • 30. Which Format is Query-Friendly?
  • 31. Best Practice: minimize bytes scanned 1. Use Parquet • Column based • Only read the columns you need • Column wise compression • Min/max metadata 2. Use Hive style partitioning • Avoid reading unnecessary objects altogether • Technique has limitations GPMeterStream/dt=2017-08-17/part-00085.csv GPMeterStream/dt=2017-08-17/part-00086.csv GPMeterStream/dt=2017-08-17/part-00087.csv GPMeterStream/dt=2017-08-17/part-00088.csv GPMeterStream/dt=2017-08-17/part-00089.csv GPMeterStream/dt=2017-08-18/part-00001.csv GPMeterStream/dt=2017-08-18/part-00002.csv GPMeterStream/dt=2017-08-18/part-00003.csv
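
To illustrate why Hive style partitioning avoids reading unnecessary objects, here is a minimal Python sketch of partition pruning over object names like the ones on this slide. The helper names `partition_value` and `prune` are hypothetical and are not part of SQL Query; the sketch only shows the idea that a filter on `dt` can be answered from the object names alone.

```python
def partition_value(key, column):
    """Return the value of a Hive-style `column=value` path segment, or None."""
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        if sep and name == column:
            return value
    return None


def prune(keys, column, wanted):
    """Keep only the object names whose partition column equals `wanted`."""
    return [k for k in keys if partition_value(k, column) == wanted]


# Object names taken from the slide (a subset).
keys = [
    "GPMeterStream/dt=2017-08-17/part-00085.csv",
    "GPMeterStream/dt=2017-08-17/part-00086.csv",
    "GPMeterStream/dt=2017-08-18/part-00001.csv",
    "GPMeterStream/dt=2017-08-18/part-00002.csv",
]

# A filter such as dt = '2017-08-18' only needs the last two objects.
selected = prune(keys, "dt", "2017-08-18")
print(selected)
```

Objects under `dt=2017-08-17` are never opened, which reduces both bytes shipped and REST requests.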
  • 32. Table Locators cos://<endpoint>/<bucket>/[<prefix>] <format definition> Endpoint – of your object storage bucket or a short alias E.g. s3.us-south.objectstorage.appdomain.cloud or alias us-south Bucket – name in object storage Prefix – one or multiple objects (i.e. table partitions) with same prefix Used in FROM clauses for input data and in target field for result set data Examples: cos://us-south/myBucket/myFolder/mySubFolder/myData.parquet cos://us/otherBucket/myData cos://us/otherBucket/myData/part cos://eu/newBucket/
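
The locator layout above can be taken apart with a few lines of Python. This is a sketch under the assumption that a locator always starts with `cos://` and that everything after the bucket is the prefix; `TableLocator` and `parse_locator` are hypothetical names used for illustration only.

```python
from typing import NamedTuple


class TableLocator(NamedTuple):
    endpoint: str  # full endpoint host or a short alias such as "us-south"
    bucket: str    # bucket name in object storage
    prefix: str    # may be empty; matches one or multiple objects


def parse_locator(url):
    """Split cos://<endpoint>/<bucket>/[<prefix>] into its three parts."""
    scheme = "cos://"
    if not url.startswith(scheme):
        raise ValueError("table locator must start with cos://")
    endpoint, _, remainder = url[len(scheme):].partition("/")
    bucket, _, prefix = remainder.partition("/")
    if not endpoint or not bucket:
        raise ValueError("table locator needs an endpoint and a bucket")
    return TableLocator(endpoint, bucket, prefix)


# Examples taken from the slide.
print(parse_locator("cos://us-south/myBucket/myFolder/mySubFolder/myData.parquet"))
print(parse_locator("cos://eu/newBucket/"))
```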
  • 33. <Table Locator> [JOBPREFIX JOBID | NONE] [STORED AS CSV | PARQUET | JSON] • Specifies the data format of the input data • Table schema is automatically inferred at SQL execution time • STORED AS clause is optional, the default is CSV • Additional parameters for CSV: • E.g.: FIELDS TERMINATED BY '\t' NOHEADER • JOBPREFIX only for targets: defines unique prefix to append. Default is JOBID. Table Format Definition
  • 34. SELECT … INTO <Table Locator> [STORED AS CSV | PARQUET | JSON] [PARTITIONED [BY (<column list>)] [INTO <num> BUCKETS] [EVERY <num> ROWS]] [SORT BY (<column list>)] BY: Produces Hive style partitioning INTO: Produces a fixed number of partitions (hash partitioned) EVERY: Produces partitions of even size (e.g. for pagination) SORT BY: Exact result order & clustering when combined with PARTITIONED Table Partitioning Definition
  • 35. Submit a SQL query POST https://api.sql-query.cloud.ibm.com/v2/sql_jobs Runs the SQL in the background and returns a job_id Detailed info for a SQL query (e.g. status, result location) GET https://api.sql-query.cloud.ibm.com/v2/sql_jobs/{job_id} Returns JSON with query execution details List of recent SQL query executions GET https://api.sql-query.cloud.ibm.com/v2/sql_jobs Returns JSON array with last 30 SQL submissions and outcomes IBM SQL Query REST API
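
As a rough illustration of how the target clause on this slide is assembled, the following Python sketch builds the `INTO ...` text from keyword arguments. The function `into_clause` is hypothetical, the column name `dt` is made up for the example, and allowing only one of BY, INTO and EVERY per statement is a conservative assumption of this sketch rather than something the slide states.

```python
def into_clause(locator, stored_as=None, partitioned_by=None,
                buckets=None, every_rows=None, sort_by=None):
    """Assemble the INTO clause text from the options shown on the slide."""
    modes = [m for m in (partitioned_by, buckets, every_rows) if m is not None]
    if len(modes) > 1:
        # Assumption of this sketch: pick a single partitioning mode.
        raise ValueError("choose at most one of partitioned_by, buckets, every_rows")
    parts = ["INTO " + locator]
    if stored_as is not None:
        fmt = stored_as.upper()
        if fmt not in ("CSV", "PARQUET", "JSON"):
            raise ValueError("unsupported format: " + stored_as)
        parts.append("STORED AS " + fmt)
    if partitioned_by is not None:
        parts.append("PARTITIONED BY (" + ", ".join(partitioned_by) + ")")
    elif buckets is not None:
        parts.append("PARTITIONED INTO %d BUCKETS" % buckets)
    elif every_rows is not None:
        parts.append("PARTITIONED EVERY %d ROWS" % every_rows)
    if sort_by is not None:
        parts.append("SORT BY (" + ", ".join(sort_by) + ")")
    return " ".join(parts)


# Hive style partitioning on a hypothetical column "dt", written as Parquet.
clause = into_clause("cos://eu/newBucket/", stored_as="parquet", partitioned_by=["dt"])
print(clause)
```

The resulting text would be appended to a `SELECT` statement before it is submitted.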
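
The three calls on this slide could be prepared from Python roughly as follows. The endpoint paths come from the slide; the JSON field name `statement`, the bearer-token `Authorization` header, and the helper names are assumptions of this sketch and should be checked against the SQL Query API documentation. The code only builds request objects and does not send anything.

```python
import json
import urllib.request

# Endpoint taken from the slide.
BASE_URL = "https://api.sql-query.cloud.ibm.com/v2/sql_jobs"


def _headers(token):
    # Assumption: the service accepts an IAM bearer token in this header.
    return {"Authorization": "Bearer " + token, "Content-Type": "application/json"}


def submit_request(statement, token):
    """POST /v2/sql_jobs: submit a SQL statement to run in the background."""
    # Assumption: the request body carries the SQL text in a "statement" field.
    body = json.dumps({"statement": statement}).encode("utf-8")
    return urllib.request.Request(BASE_URL, data=body, method="POST",
                                  headers=_headers(token))


def status_request(job_id, token):
    """GET /v2/sql_jobs/{job_id}: execution details for one job."""
    return urllib.request.Request(BASE_URL + "/" + job_id, method="GET",
                                  headers=_headers(token))


def list_request(token):
    """GET /v2/sql_jobs: recent submissions and their outcomes."""
    return urllib.request.Request(BASE_URL, method="GET", headers=_headers(token))


# Build (but do not send) the three requests with placeholder values.
for req in (submit_request("SELECT 1", "PLACEHOLDER_TOKEN"),
            status_request("PLACEHOLDER_JOB_ID", "PLACEHOLDER_TOKEN"),
            list_request("PLACEHOLDER_TOKEN")):
    print(req.get_method(), req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would perform the call; a client would typically poll the status request until the job has finished and then read the result set from object storage.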
  • 36. Scaling Analytics: Data Skipping Saving you Time and $ Index All Objects IBM Cloud Object Storage Data Set Objects SQL Query Data Skipping Indexing Candidate Objects WHERE Clause Saving Time and $ SQL Query learns which objects are not relevant to a query using a data skipping index CREATE METAINDEX stores index summary metadata for each object. Much smaller than the data. SQLs skip irrelevant objects to significantly reduce I/O Independent of data formats Index Types: Min/Max, Value List, Bounding Box E.g.: Get location and time of heat waves (>40 Celsius) SELECT lat, long, city, temp, date FROM weather WHERE temp > 40.0
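
The Min/Max index type can be pictured with a small Python sketch. It is a conceptual illustration only, not how SQL Query stores or evaluates its index; the object names and temperature ranges are made-up sample values, and `MinMax` and `candidates` are hypothetical names.

```python
from typing import NamedTuple


class MinMax(NamedTuple):
    object_name: str
    min_temp: float  # smallest temp value stored in the object
    max_temp: float  # largest temp value stored in the object


def candidates(index, threshold):
    """Objects that may contain rows with temp > threshold.

    An object whose maximum is not above the threshold cannot match,
    so it is skipped without being read.
    """
    return [entry.object_name for entry in index if entry.max_temp > threshold]


# Made-up per-object summary metadata for a "weather" data set.
index = [
    MinMax("weather/part-0.parquet", -5.0, 21.5),
    MinMax("weather/part-1.parquet", 18.0, 43.2),
    MinMax("weather/part-2.parquet", 30.1, 40.0),
]

# WHERE temp > 40.0 only needs to scan the objects returned here.
to_scan = candidates(index, 40.0)
print(to_scan)
```

Only one of the three sample objects has to be read for the heat wave query, which is where the savings in I/O come from.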
  • 38. • JDBC-compliant driver library that wraps the REST API • Wraps both the SQL Query and COS REST APIs • Exposes a regular session interface (JDBC Connection) • Enables custom JDBC application support • Enables BI application support • Early adopter: Looker • Support for stored table metadata (simple catalog) • Stored as JSON in COS and referenced via the JDBC connection string • I.e. the DatabaseMetaData interface is also supported JDBC Driver for BI Applications Apply for Beta Now Query JDBC Driver REST COS JDBC API Data Result Sets Table Catalog E.g. Looker
  • 39. Using SQL Query JDBC Driver Define table catalog • JSON file in COS containing: • Table name • Location of table objects on COS • Object format • Column names • Column types • INT, FLOAT, VARCHAR, TIMESTAMP JDBC Connection String: jdbc:SQLQuery:<sql-query instance crn> ?schemabucket=<COS bucket with json catalog> ?schemafile=<COS object with json catalog> &apikey=<api key for your account> &targetcosurl=<COS URL for result set>
  • 40. Think 2019 / 2263 / February 2019 / © 2019 IBM Corporation
  • 41. IBM Cloud Functions Fair Never pay for idle Polyglot Elastic Automation Triggers Open Source CLOUD FUNCTIONS Schedules Sequences