SlideShare una empresa de Scribd logo
1 de 53
Descargar para leer sin conexión
Serverless Data Platform
Scott (考特)
July 22nd, 2021 (Thur)
AWS User Group
Scott (考特)
Shu-Jeng, Hsieh
● Sr. Data Engineer, the 104
● AWS Community Builder
Agenda
Prologue
What is a lakehouse?
The lakehouse built on AWS
Summary
Prologue
104 Corporation Region
Production
Research Environment
Production
Research Environment
Analysts Scientists
Amazon EMR
Analysts
Developers
PMs
Managers
● a SAP tool to ETL
● SPSS for insight &
stats modeling
Analysts
● license fee
● outdated front-end
technology
Scientists
● got power from the
AWS computing
● explore potential
ML applications
● unavailability of data
○ external
○ internal
● data difference
Data Engineers
● open-sourced
tools to ETL
● productionize
scientists’
invention
● maintain & evolve
existing services
● explore potential
ML applications
Other BUs
● Make wishes
● repetitive tasks on
data expansion
● unavailability of
data
What is a lakehouse?
Data Lake
Unstructured
Semi-structured
structured
Machine
Learning
Data
Science
Data Warehouse
Data Warehouse
Data Warehouse
BI
BI
BI
External Data Operational Data
BI Reports
Data Warehouse
ETL
Data Warehouses
- Built for BI and reporting
- No support for video, audio, text
- No support for data science, ML
- Limited support for streaming
- Closed & proprietary formats
structured, semi structured, and unstructured data
BI Reports
Data Warehouse
ETL
Data Lakes
- Poor BI support
- Complex to set up
- Poor performance
- Unreliable data swamps
Data Prep and
Validation
Real-time
Database
Data Lake
Machine
Learning
Data
Science
structured, semi structured, and unstructured data
BI
Lakehouse
Machine
Learning
Data
Science
Streaming
Analytics
One platform for every use case
Data Lake for all your data
The lakehouse built on
AWS
CloudFormation CDK
● With elapsed time,
YAML/JSON became larger
● Difficult to work with large
YAML/JSON files
○ High error ratio when
copying/pasting
○ It’s a text file, not
programming language
● Infrastructure AS code
● No abstraction
● IDE integration
○ Multiple languages
○ Syntax check,
autocompletion, etc.
● Higher level abstraction
○ Simplified statements
○ 500 lines of CFN to 30
lines of CDK code
● Infrastructure IS code
Why S3
S3 Standard S3 INT S3 S-IA S3 O-IA S3 Glacier
Frequent Infrequent
Access Frequency
● Efficient, columnar data representation.
● Utilizes the record shredding and assembly algorithm.
● Supports schema evolution.
Dataset
Size on
Amazon S3
Query
Run
Time
Data
Scanned
Cost
Data stored as CSV files 1 TB
236
seconds
1.15 TB $5.75
Data stored in
Apache Parquet Format
130 GB
6.78
seconds
2.51 GB $0.01
Savings
87% less when
using Parquet
34x
faster
99% less data
scanned
99.7% savings
The lakehouse built on
AWS
CDC pipeline
Debezium Architecture
Database Migration Service
Source Database Target Database
AWS DMS
Source
Endpoint
Target
Endpoint
Replication Instance
Replication
Task
IBM® Db2®
● CDC listening supported for some dbs
● C-language-based
● Configuration-oriented
● Abundant supported sources and targets
AWS DMS
The lakehouse built on
AWS
Big data tools
- Spark APIs
1. RDD
2. DataFrame
3. Dataset
- Various modes
1. Local
2. Standalone
3. Mesos
4. k8s
- Supported interfaces
1. Java/Scala
2. Python
3. R
4. SQL
- Unified interfaces for
batch and streaming
Worker
JVM
Executor
Worker
JVM
Executor
Worker
JVM
Executor
Driver
Apache Hudi
Apache Iceberg
Delta Lake
Delta Lake (open source) Apache Iceberg Apache Hudi
Transaction (ACID) Y Y Y
MVCC Y Y Y
Time travel Y Y Y
Schema Evolution Y Y Y
Data Mutation Y (update/delete/merge/ merge into) N Y (upsert)
Streaming Sink and source for Spark struct streaming
Sink and source (wip) for Spark
struct streaming, Flink (wip)
DeltaStreamer
HiveincrementalPuller
File Format Parquet Parquet, ORC, AVRO Parquet
Compaction/Cleanup Manual API available Manual and Auto
Integration DSv1, Delta connector DSv2, InputFormat DSv1, InputFormat
Multiple language support Scala/Java/Python Java/Python Java/Python
Storage Abstraction Y Y N
API dependency Spark-bundled Native/Engine bundled DeltaStreamer
Data ingestion Spark, Presto, Hive Spark, Hive DeltaStreamer
The lakehouse built on
AWS
Databricks
ACID transactions on Spark
Delta Lake
Scalable metadata handling
Streaming and batch unification
Schema enforcement
Time travel
Upserts and deletes
https://github.com/databrickslabs/terraform-provider-databricks
Databricks
AWS account
User
AWS account Data plane
in user account
Control plane network in Databricks account
Workspace web application, APIs, and other core services
AWS VPC
endpoint service
AWS VPC
endpoint service
Back-end VPC endpoint
for secure cluster connectivity
relay
Back-end VPC endpoint
for
REST APIs
AWS PrivateLink connection (back-end) AWS PrivateLink connection (back-end)
User on-premise
or VPN network
User transit
AWS account Front-end VPC endpoint
For user access to web
App and REST APIs
User request
to web app
or REST APIs
VPC
AWS PrivateLink
Connection (front-end)
Languages
Scala Rust Python ruby Golang
**
**
Services
Connectors
Databases
Databricks
dafka-
delta-
ingest
Airbyte *
* Currently in development ** coming soon
The lakehouse built on
AWS
Data Platform
CDK Application
Construct
utility components
Construct
S3-related
Construct
Databricks
Construct
DMS instances
Construct
DMS endpoints
Construct
DMS tasks
High-level
Architecture
The lakehouse built on
AWS
During the build time
Glue ETL
1. Serverless Spark
2. DynamicFrame
3. Self-describing, no schema required initially
4. Some feature functions
a. ResolveChoice
b. Unbox => similar to from_json
c. Spigot => similar to TABLESAMPLE
Glue crawlers
Native client JDBC MongoDB client
Glue workflow
Mohit, S., 2020, Simplify data pipelines with AWS Glue automatic code generation and Workflows
Developer
Endpoint
Glue scripts
https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-workflow.html
New log
data arrives
Start
Glue crawler
Needed stats
by requirements
Glue crawler
that deals with
metadata
Glue job
that executes
the ETL
Start
Glue job
CloudWatch
event
Before Jun 20, 2019….
Glue workflow
New log
data arrives
Start
Glue workflow
Needed stats
by requirements
Glue job
that executes
the ETL
Glue crawler
that deals with
metadata
CDK Application
Construct
S3 Event
Notification
Construct
S3 Event
Notification
Version
Supported Spark
and Python
Minimum DPU Default DPU
AW Glue 0.9
● Spark 2.2.1
● Python 2.7
2 10
AWS Glue 1.0
● Spark 2.4.3
● Python 2.7
● Python 3.6
2 10
AWS Glue 2.0
● Spark 2.4.3
● Python 3.7
2 10
AWS Glue Delta Lake
Spark version Spark 2.2.1 & 2.4.3 Always the latest
UI experience
More clicks to job
logs
On hover
Reliability & quality - ACID transactions
● Dynamic Partition Pruning
● Adaptive Query Execution
● accelerator -aware scheduler
● Unified table creation SQL syntax
● Shuffled hash join improvement
Summary
Amazon
S3
AWS
Snowball
AWS
Snowmobile
Amazon Kinesis
Video Streams
Amazon Kinesis
Data Firehose
Amazon Kinesis
Data Streams
Amazon
Kinesis
Amazon
Redshift
Amazon
EMR Amazon
Athena
Amazon
Elasticsearch
Service
AI services
● Any type of data
● Security, compliance, and
audit capabilities across
data lake
● Empower all personas
● Democratize ML with SQL
● Unified analytics
Open Data Lake (S3)
Data Management & Governance
Data Science
& Machine Learning
Real-time Data
Applications
BI & SQL
Analytics
Data Engineering
Structured Semi-structured Unstructured Streaming
Lakehouse Platform
Cheng-Wei
Bill Chih-Yi
Daniel Luffy Neil
TItan
Nick Scott
Ask
Me
Anything

Más contenido relacionado

La actualidad más candente

Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Sumeet Singh
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreModern Data Stack France
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015Yousun Jeong
 
Extending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesExtending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesNicola Ferraro
 
Cosmos DB Real-time Advanced Analytics Workshop
Cosmos DB Real-time Advanced Analytics WorkshopCosmos DB Real-time Advanced Analytics Workshop
Cosmos DB Real-time Advanced Analytics WorkshopDatabricks
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsBen Laird
 
Transactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric LiangTransactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric LiangDatabricks
 
The Hidden Life of Spark Jobs
The Hidden Life of Spark JobsThe Hidden Life of Spark Jobs
The Hidden Life of Spark JobsDataWorks Summit
 
Data processing platforms with SMACK: Spark and Mesos internals
Data processing platforms with SMACK:  Spark and Mesos internalsData processing platforms with SMACK:  Spark and Mesos internals
Data processing platforms with SMACK: Spark and Mesos internalsAnton Kirillov
 
Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Uri Laserson
 
Lambda Architecture using Google Cloud plus Apps
Lambda Architecture using Google Cloud plus AppsLambda Architecture using Google Cloud plus Apps
Lambda Architecture using Google Cloud plus AppsSimon Su
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?DataWorks Summit
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaDataWorks Summit
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Lucidworks
 

La actualidad más candente (20)

Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015
 
Extending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesExtending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with Kubernetes
 
Cosmos DB Real-time Advanced Analytics Workshop
Cosmos DB Real-time Advanced Analytics WorkshopCosmos DB Real-time Advanced Analytics Workshop
Cosmos DB Real-time Advanced Analytics Workshop
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.js
 
Enterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on HadoopEnterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on Hadoop
 
Transactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric LiangTransactional writes to cloud storage with Eric Liang
Transactional writes to cloud storage with Eric Liang
 
The Hidden Life of Spark Jobs
The Hidden Life of Spark JobsThe Hidden Life of Spark Jobs
The Hidden Life of Spark Jobs
 
Data processing platforms with SMACK: Spark and Mesos internals
Data processing platforms with SMACK:  Spark and Mesos internalsData processing platforms with SMACK:  Spark and Mesos internals
Data processing platforms with SMACK: Spark and Mesos internals
 
Spark Working Environment in Windows OS
Spark Working Environment in Windows OSSpark Working Environment in Windows OS
Spark Working Environment in Windows OS
 
Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)
 
Lambda Architecture using Google Cloud plus Apps
Lambda Architecture using Google Cloud plus AppsLambda Architecture using Google Cloud plus Apps
Lambda Architecture using Google Cloud plus Apps
 
Rapids: Data Science on GPUs
Rapids: Data Science on GPUsRapids: Data Science on GPUs
Rapids: Data Science on GPUs
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
 

Similar a Serverless Data Platform

Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Amazon Web Services
 
Big Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI MobileBig Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI MobileRoy Kim
 
IBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveIBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveTorsten Steinbach
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
Aws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaAws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaHelen Rogers
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Lace Lofranco
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and dockerBob Ward
 
Data Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache ArrowData Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache ArrowDatabricks
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Con LA
 
IBM Cloud Native Day April 2021: Serverless Data Lake
IBM Cloud Native Day April 2021: Serverless Data LakeIBM Cloud Native Day April 2021: Serverless Data Lake
IBM Cloud Native Day April 2021: Serverless Data LakeTorsten Steinbach
 
Solved: Your Most Dreaded Test Environment Management Challenges
Solved: Your Most Dreaded Test Environment Management ChallengesSolved: Your Most Dreaded Test Environment Management Challenges
Solved: Your Most Dreaded Test Environment Management ChallengesDevOps.com
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AITorsten Steinbach
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Wes McKinney
 

Similar a Serverless Data Platform (20)

Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
 
Big Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI MobileBig Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI Mobile
 
IBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep DiveIBM Cloud Day January 2021 Data Lake Deep Dive
IBM Cloud Day January 2021 Data Lake Deep Dive
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Aws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaAws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon Elisha
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and docker
 
Data Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache ArrowData Science Across Data Sources with Apache Arrow
Data Science Across Data Sources with Apache Arrow
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
 
DW on AWS
DW on AWSDW on AWS
DW on AWS
 
Best of re:Invent
Best of re:InventBest of re:Invent
Best of re:Invent
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
 
The Best of re:invent 2016
The Best of re:invent 2016The Best of re:invent 2016
The Best of re:invent 2016
 
IBM Cloud Native Day April 2021: Serverless Data Lake
IBM Cloud Native Day April 2021: Serverless Data LakeIBM Cloud Native Day April 2021: Serverless Data Lake
IBM Cloud Native Day April 2021: Serverless Data Lake
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Solved: Your Most Dreaded Test Environment Management Challenges
Solved: Your Most Dreaded Test Environment Management ChallengesSolved: Your Most Dreaded Test Environment Management Challenges
Solved: Your Most Dreaded Test Environment Management Challenges
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AI
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
 

Más de Shu-Jeng Hsieh

A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKShu-Jeng Hsieh
 
The Exabyte Journey and DataBrew with CICD
The Exabyte Journey and DataBrew with CICDThe Exabyte Journey and DataBrew with CICD
The Exabyte Journey and DataBrew with CICDShu-Jeng Hsieh
 
Serverless ETL and Optimization on ML pipeline
Serverless ETL and Optimization on ML pipelineServerless ETL and Optimization on ML pipeline
Serverless ETL and Optimization on ML pipelineShu-Jeng Hsieh
 
How cdk and projen benefit to A team
How cdk and projen benefit to A teamHow cdk and projen benefit to A team
How cdk and projen benefit to A teamShu-Jeng Hsieh
 
Optimization technique for SAS
Optimization technique for SASOptimization technique for SAS
Optimization technique for SASShu-Jeng Hsieh
 
Trial plan with capitation payment of the national healthcare insurance in ta...
Trial plan with capitation payment of the national healthcare insurance in ta...Trial plan with capitation payment of the national healthcare insurance in ta...
Trial plan with capitation payment of the national healthcare insurance in ta...Shu-Jeng Hsieh
 
Prediction for consuming behavior of noncontractual customers
Prediction for consuming behavior of noncontractual customersPrediction for consuming behavior of noncontractual customers
Prediction for consuming behavior of noncontractual customersShu-Jeng Hsieh
 
Scheduling advertisements on a web page to maximize revenue
Scheduling advertisements on a web page to maximize revenueScheduling advertisements on a web page to maximize revenue
Scheduling advertisements on a web page to maximize revenueShu-Jeng Hsieh
 
A data driven approach to measure web site navigability
A data driven approach to measure web site navigabilityA data driven approach to measure web site navigability
A data driven approach to measure web site navigabilityShu-Jeng Hsieh
 

Más de Shu-Jeng Hsieh (12)

Big Data Tools in AWS
Big Data Tools in AWSBig Data Tools in AWS
Big Data Tools in AWS
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDK
 
The Exabyte Journey and DataBrew with CICD
The Exabyte Journey and DataBrew with CICDThe Exabyte Journey and DataBrew with CICD
The Exabyte Journey and DataBrew with CICD
 
Serverless ETL and Optimization on ML pipeline
Serverless ETL and Optimization on ML pipelineServerless ETL and Optimization on ML pipeline
Serverless ETL and Optimization on ML pipeline
 
How cdk and projen benefit to A team
How cdk and projen benefit to A teamHow cdk and projen benefit to A team
How cdk and projen benefit to A team
 
Optimization technique for SAS
Optimization technique for SASOptimization technique for SAS
Optimization technique for SAS
 
The way
The wayThe way
The way
 
Demo of our thoughts
Demo of our thoughtsDemo of our thoughts
Demo of our thoughts
 
Trial plan with capitation payment of the national healthcare insurance in ta...
Trial plan with capitation payment of the national healthcare insurance in ta...Trial plan with capitation payment of the national healthcare insurance in ta...
Trial plan with capitation payment of the national healthcare insurance in ta...
 
Prediction for consuming behavior of noncontractual customers
Prediction for consuming behavior of noncontractual customersPrediction for consuming behavior of noncontractual customers
Prediction for consuming behavior of noncontractual customers
 
Scheduling advertisements on a web page to maximize revenue
Scheduling advertisements on a web page to maximize revenueScheduling advertisements on a web page to maximize revenue
Scheduling advertisements on a web page to maximize revenue
 
A data driven approach to measure web site navigability
A data driven approach to measure web site navigabilityA data driven approach to measure web site navigability
A data driven approach to measure web site navigability
 

Último

怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxronsairoathenadugay
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...HyderabadDolls
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...gragchanchal546
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...kumargunjan9515
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
 

Último (20)

怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 

Serverless Data Platform