SlideShare una empresa de Scribd logo
1 de 20
Descargar para leer sin conexión
Blink
Improvements to Flink &
Its Applications in Alibaba SearchXiaowei Jiang, Feng Wang
{xiaowei.jxw, jason.wang}
@alibaba-inc.com
Who Are We?
n Xiaowei Jiang
l 2014 −− now Alibaba
l 2010 −− 2014 Facebook
l 2002 −− 2010 Microsoft
l 2000 −− 2002 Stratify
n Feng Wang
l 2006 −− now Alibaba
About Alibaba
n  Alibaba Group
l  Operating the world’s largest online marketplace
l  Annual GMV $394 Billion in year 2015
n  Alibaba Search
l  Personalized search and recommendation platform
l  Major driver of online traffic
Agenda
n Background
n What is Blink?
n Improvements in Blink
n Challenges & Future
Logs
Scenario – Realtime A/B Test
Transacton
Parser
 Filter
 Join
 Agg
Parser
 Filter
UDF
 Druid
Click
Impression
 Parser
 Filter
Scenario – Search Index Build & Update
DataSource
Filter
Sync
HBase
IC
Filter
Sync
UIC
Join
Search
Engine
Export
HBase
Result
UIC
IC1
IC2
UIC1
UIC2
Streaming Topologies
Long Batch Pipelines
Machine Learning at Scale
Graph Analysis
à low latency
à resource utilization
à iterative algorithms
à mutable state
Flink: Unified Compute Engine
Flink Stack
What is Blink?
n Blink – Improvements to Flink from Alibaba
l Comprehensive Improvements to Flink Table API
l Improved Runtime Compatible with Flink API and Ecosystem
n Status
l Runs on Thousands of Nodes In Alibaba Production
l Supports Mission Critical Products
Table API Improvements
n Principle – Unified SQL layer for batch and streaming
n Functionality
l  UDF/UDTF/UDAGG
l  Stream-Stream Join
l  Aggregation(min, max, avg, sum, count, distinct_count)
l  Windowing (time_window, count_window)
l  Retraction
Runtime Improvements
n New Runtime Architecture on YARN
n Optimized State, Checkpoint & Failover
n Reliable & Production Quality
n Much More
Flink on YARN
Client Node YARN Node
YARN Node
YARN
ResourceManager
YARN
NodeManager
Container
Flink
JobManager
YARN
AppMaster
YARN Node
YARN
NodeManager
Container
Flink
TaskManager
YARN Node
YARN
NodeManager
Container
Flink
TaskManager
Flink
YARN Client
HDFS
4.allocate worker
3.allocate app master
1. store user jar and configuration
2. register resource and request app master
always bootstrap containers with user jar and config
Blink on YARN
Client Node YARN Node
YARN Node
YARN
ResourceManager
YARN
NodeManager
Container
JobMaster
YARN Node
YARN
NodeManager
YARN Node
YARN
NodeManager
Blink Client
HDFS
4.allocate worker
3.allocate app master
1. store user jar and configuration
2. register resource and request app master
always bootstrap containers with user jar and config
Container
TaskExecutor
Container
TaskExecutor
Container
TaskExecutor
Container
Container
TaskExecutor
JobMaster
4.allocate worker
Blink Job Architecture
Yarn Node
NodeManager
Yarn Node
NodeManager
Shuffle Service
Yarn Node
NodeManager
Shuffle Service
HDFS
ZooKeeper
controlchannel
controlchannel
state backup/recover
local data channel local data channel
state backup/recover
Container
Job Master
task scheduler
checkpoint
coordinator
Container
rocks db spilled file
Task Executor
taskin out
Container
rocks db spilled file
Task Executor
taskin out
Container
rocks db spilled file
Task Executor
taskin out
Container
rocks db spilled file
Task Executor
taskin out
completed checkpoint
schedule events
Network data channel
Blink Checkpoint & State
TaskExecutor
Local CPn Local CPn-1Incremental Backup
OnComplete
i1 i2 i3 Bn
in queue
o1 o2 Bn-
1
o3
out queue
2. hard link snapshot
Job Master
1. trigger
3.ack
clean up
4. complete
clean up
Task
operator
state
HDFS
reference
async
CPn
CPn-1
diff
State Files
1.sst 2.sst n.sst
Blink Rescale
Blink Failover
At Least Once
Source
Source
Source
Source
fail restart
restart
failover
Excactly Once
Source
Source
Source
Source
fail restart
failover
Sink
Sink
Sink
Sink
Blink Metrics
Job Vertex Number: [CPU, Memory] * Parallelism

In Queue
TPS 
Out Queue
Latency
Delay
CPU
 Memory
Task Metrics
Running Tasks
Challenges & Future
n Continued Optimization in Streaming
n Batch in Production
n Machine Learning in Production
n Larger Cluster Scale
n Contribute back to Flink community
Q & A
Thank You!
Xiaowei Jiang: xiaowei.jxw@alibaba-inc.com
Twitter: @xiaoweij
Feng Wang: jason.wang@alibaba-inc.com
Twitter: @ifengwang

Más contenido relacionado

La actualidad más candente

Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksAnyscale
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...Holden Karau
 
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/HudiVinoth Chandar
 
Building large scale applications in yarn with apache twill
Building large scale applications in yarn with apache twillBuilding large scale applications in yarn with apache twill
Building large scale applications in yarn with apache twillHenry Saputra
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid DataWorks Summit
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemCloudera, Inc.
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasyDataWorks Summit
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorDatabricks
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsDataWorks Summit
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHortonworks
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkDatabricks
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Alex Zeltov
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkSlim Baltagi
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkChester Chen
 
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...Databricks
 

La actualidad más candente (20)

Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...
 
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
 
Building large scale applications in yarn with apache twill
Building large scale applications in yarn with apache twillBuilding large scale applications in yarn with apache twill
Building large scale applications in yarn with apache twill
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made Easy
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS Accelerator
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and Analytics
 
Machine Learning in the IoT with Apache NiFi
Machine Learning in the IoT with Apache NiFiMachine Learning in the IoT with Apache NiFi
Machine Learning in the IoT with Apache NiFi
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With Spark
 
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
 

Destacado

Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Kostas Tzoumas
 
Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayAdam Gibson
 
Beyond JVM - YOW! Brisbane 2013
Beyond JVM - YOW! Brisbane 2013Beyond JVM - YOW! Brisbane 2013
Beyond JVM - YOW! Brisbane 2013Charles Nutter
 
Machine Learning Exposed!
Machine Learning Exposed!Machine Learning Exposed!
Machine Learning Exposed!javafxpert
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)jbellis
 
Strata lightening-talk
Strata lightening-talkStrata lightening-talk
Strata lightening-talkDanny Yuan
 
Musings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBaseMusings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBaseJesse Yates
 
MongoDB and Apache HBase: Benchmarking
MongoDB and Apache HBase: BenchmarkingMongoDB and Apache HBase: Benchmarking
MongoDB and Apache HBase: BenchmarkingOlga Lavrentieva
 
Search Analytics with Flume and HBase
Search Analytics with Flume and HBaseSearch Analytics with Flume and HBase
Search Analytics with Flume and HBaseSematext Group, Inc.
 
QConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemQConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemDanny Yuan
 
Apache HBase Application Archetypes
Apache HBase Application ArchetypesApache HBase Application Archetypes
Apache HBase Application ArchetypesCloudera, Inc.
 
Etsy desconstruction
Etsy desconstructionEtsy desconstruction
Etsy desconstructionYash Vardhan
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLCloudera, Inc.
 
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & ElasticsearchFrom Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & ElasticsearchSematext Group, Inc.
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudDataWorks Summit
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBaseCarol McDonald
 

Destacado (20)

Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016
 
Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the Bay
 
Beyond JVM - YOW! Brisbane 2013
Beyond JVM - YOW! Brisbane 2013Beyond JVM - YOW! Brisbane 2013
Beyond JVM - YOW! Brisbane 2013
 
Machine Learning Exposed!
Machine Learning Exposed!Machine Learning Exposed!
Machine Learning Exposed!
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
 
Strata lightening-talk
Strata lightening-talkStrata lightening-talk
Strata lightening-talk
 
Musings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBaseMusings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBase
 
MongoDB and Apache HBase: Benchmarking
MongoDB and Apache HBase: BenchmarkingMongoDB and Apache HBase: Benchmarking
MongoDB and Apache HBase: Benchmarking
 
Search Analytics with Flume and HBase
Search Analytics with Flume and HBaseSearch Analytics with Flume and HBase
Search Analytics with Flume and HBase
 
QConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemQConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing system
 
Apache HBase Application Archetypes
Apache HBase Application ArchetypesApache HBase Application Archetypes
Apache HBase Application Archetypes
 
Etsy desconstruction
Etsy desconstructionEtsy desconstruction
Etsy desconstruction
 
Docker Monitoring Webinar
Docker Monitoring  WebinarDocker Monitoring  Webinar
Docker Monitoring Webinar
 
Solr Anti Patterns
Solr Anti PatternsSolr Anti Patterns
Solr Anti Patterns
 
Tuning Solr for Logs
Tuning Solr for LogsTuning Solr for Logs
Tuning Solr for Logs
 
Using Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETLUsing Morphlines for On-the-Fly ETL
Using Morphlines for On-the-Fly ETL
 
Introduction to solr
Introduction to solrIntroduction to solr
Introduction to solr
 
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & ElasticsearchFrom Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloud
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 

Similar a Improvements to Flink & it's Applications in Alibaba Search

APIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , Kong
APIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , KongAPIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , Kong
APIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , Kongapidays
 
2021 JCConf 使用Dapr簡化Java微服務應用開發
2021 JCConf 使用Dapr簡化Java微服務應用開發2021 JCConf 使用Dapr簡化Java微服務應用開發
2021 JCConf 使用Dapr簡化Java微服務應用開發Rich Lee
 
Machine Learning Platform in LINE Fukuoka
Machine Learning Platform in LINE FukuokaMachine Learning Platform in LINE Fukuoka
Machine Learning Platform in LINE FukuokaLINE Corporation
 
cuttingEdgepresentation0318
cuttingEdgepresentation0318cuttingEdgepresentation0318
cuttingEdgepresentation0318Hongbiao Chen
 
Graphs: Fabric of DevOps
Graphs: Fabric of DevOpsGraphs: Fabric of DevOps
Graphs: Fabric of DevOpsNeo4j
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdfJim Dowling
 
Nats and netlify
Nats and netlifyNats and netlify
Nats and netlifyRyan Neal
 
Framework Engineering Revisited
Framework Engineering RevisitedFramework Engineering Revisited
Framework Engineering RevisitedYoungSu Son
 
Build your MVP on AWS - AWS Startup Day Johannesburg.pdf
Build your MVP on AWS - AWS Startup Day Johannesburg.pdfBuild your MVP on AWS - AWS Startup Day Johannesburg.pdf
Build your MVP on AWS - AWS Startup Day Johannesburg.pdfAmazon Web Services
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
AppsFlyer - Changing the car engine while racing - Implementing a Live Transi...
AppsFlyer - Changing the car engine while racing - Implementing a Live Transi...AppsFlyer - Changing the car engine while racing - Implementing a Live Transi...
AppsFlyer - Changing the car engine while racing - Implementing a Live Transi...Neo4j
 
JEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java WorldJEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java WorldSerg Masyutin
 
Monitoring to the Nth tier: The state of distributed tracing in 2016
Monitoring to the Nth tier: The state of distributed tracing in 2016Monitoring to the Nth tier: The state of distributed tracing in 2016
Monitoring to the Nth tier: The state of distributed tracing in 2016AppNeta
 
Vitaliy Makogon: Migration to ivy. Angular component libraries with IVY support.
Vitaliy Makogon: Migration to ivy. Angular component libraries with IVY support.Vitaliy Makogon: Migration to ivy. Angular component libraries with IVY support.
Vitaliy Makogon: Migration to ivy. Angular component libraries with IVY support.VitaliyMakogon
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story Roman Chukh
 
Raising ux bar with offline first design
Raising ux bar with offline first designRaising ux bar with offline first design
Raising ux bar with offline first designKyrylo Reznykov
 
java web framework standard.20180412
java web framework standard.20180412java web framework standard.20180412
java web framework standard.20180412FirmansyahIrma1
 

Similar a Improvements to Flink & it's Applications in Alibaba Search (20)

APIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , Kong
APIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , KongAPIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , Kong
APIdays Paris 2019 - Adopting Service Mesh by Marco Palladino , Kong
 
2021 JCConf 使用Dapr簡化Java微服務應用開發
2021 JCConf 使用Dapr簡化Java微服務應用開發2021 JCConf 使用Dapr簡化Java微服務應用開發
2021 JCConf 使用Dapr簡化Java微服務應用開發
 
Machine Learning Platform in LINE Fukuoka
Machine Learning Platform in LINE FukuokaMachine Learning Platform in LINE Fukuoka
Machine Learning Platform in LINE Fukuoka
 
cuttingEdgepresentation0318
cuttingEdgepresentation0318cuttingEdgepresentation0318
cuttingEdgepresentation0318
 
Graphs: Fabric of DevOps
Graphs: Fabric of DevOpsGraphs: Fabric of DevOps
Graphs: Fabric of DevOps
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf
 
Sai_Resume
Sai_ResumeSai_Resume
Sai_Resume
 
Nats and netlify
Nats and netlifyNats and netlify
Nats and netlify
 
Framework Engineering Revisited
Framework Engineering RevisitedFramework Engineering Revisited
Framework Engineering Revisited
 
Build your MVP on AWS - AWS Startup Day Johannesburg.pdf
Build your MVP on AWS - AWS Startup Day Johannesburg.pdfBuild your MVP on AWS - AWS Startup Day Johannesburg.pdf
Build your MVP on AWS - AWS Startup Day Johannesburg.pdf
 
Srinu_JavaDEveloper@2.5 years
Srinu_JavaDEveloper@2.5 yearsSrinu_JavaDEveloper@2.5 years
Srinu_JavaDEveloper@2.5 years
 
Java FX Part2
Java FX Part2Java FX Part2
Java FX Part2
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
AppsFlyer - Changing the car engine while racing - Implementing a Live Transi...
AppsFlyer - Changing the car engine while racing - Implementing a Live Transi...AppsFlyer - Changing the car engine while racing - Implementing a Live Transi...
AppsFlyer - Changing the car engine while racing - Implementing a Live Transi...
 
JEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java WorldJEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java World
 
Monitoring to the Nth tier: The state of distributed tracing in 2016
Monitoring to the Nth tier: The state of distributed tracing in 2016Monitoring to the Nth tier: The state of distributed tracing in 2016
Monitoring to the Nth tier: The state of distributed tracing in 2016
 
Vitaliy Makogon: Migration to ivy. Angular component libraries with IVY support.
Vitaliy Makogon: Migration to ivy. Angular component libraries with IVY support.Vitaliy Makogon: Migration to ivy. Angular component libraries with IVY support.
Vitaliy Makogon: Migration to ivy. Angular component libraries with IVY support.
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story
 
Raising ux bar with offline first design
Raising ux bar with offline first designRaising ux bar with offline first design
Raising ux bar with offline first design
 
java web framework standard.20180412
java web framework standard.20180412java web framework standard.20180412
java web framework standard.20180412
 

Más de DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Más de DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Último

Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Último (20)

Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Improvements to Flink & it's Applications in Alibaba Search

  • 1. Blink Improvements to Flink & Its Applications in Alibaba SearchXiaowei Jiang, Feng Wang {xiaowei.jxw, jason.wang} @alibaba-inc.com
  • 2. Who Are We? n Xiaowei Jiang l 2014 −− now Alibaba l 2010 −− 2014 Facebook l 2002 −− 2010 Microsoft l 2000 −− 2002 Stratify n Feng Wang l 2006 −− now Alibaba
  • 3. About Alibaba n  Alibaba Group l  Operating the world’s largest online marketplace l  Annual GMV $394 Billion in year 2015 n  Alibaba Search l  Personalized search and recommendation platform l  Major driver of online traffic
  • 5. Logs Scenario – Realtime A/B Test Transacton Parser Filter Join Agg Parser Filter UDF Druid Click Impression Parser Filter
  • 6. Scenario – Search Index Build & Update DataSource Filter Sync HBase IC Filter Sync UIC Join Search Engine Export HBase Result UIC IC1 IC2 UIC1 UIC2
  • 7. Streaming Topologies Long Batch Pipelines Machine Learning at Scale Graph Analysis à low latency à resource utilization à iterative algorithms à mutable state Flink: Unified Compute Engine
  • 9. What is Blink? n Blink – Improvements to Flink from Alibaba l Comprehensive Improvements to Flink Table API l Improved Runtime Compatible with Flink API and Ecosystem n Status l Runs on Thousands of Nodes In Alibaba Production l Supports Mission Critical Products
  • 10. Table API Improvements n Principle – Unified SQL layer for batch and streaming n Functionality l  UDF/UDTF/UDAGG l  Stream-Stream Join l  Aggregation(min, max, avg, sum, count, distinct_count) l  Windowing (time_window, count_window) l  Retraction
  • 11. Runtime Improvements n New Runtime Architecture on YARN n Optimized State, Checkpoint & Failover n Reliable & Production Quality n Much More
  • 12. Flink on YARN Client Node YARN Node YARN Node YARN ResourceManager YARN NodeManager Container Flink JobManager YARN AppMaster YARN Node YARN NodeManager Container Flink TaskManager YARN Node YARN NodeManager Container Flink TaskManager Flink YARN Client HDFS 4.allocate worker 3.allocate app master 1. store user jar and configuration 2. register resource and request app master always bootstrap containers with user jar and config
  • 13. Blink on YARN Client Node YARN Node YARN Node YARN ResourceManager YARN NodeManager Container JobMaster YARN Node YARN NodeManager YARN Node YARN NodeManager Blink Client HDFS 4.allocate worker 3.allocate app master 1. store user jar and configuration 2. register resource and request app master always bootstrap containers with user jar and config Container TaskExecutor Container TaskExecutor Container TaskExecutor Container Container TaskExecutor JobMaster 4.allocate worker
  • 14. Blink Job Architecture Yarn Node NodeManager Yarn Node NodeManager Shuffle Service Yarn Node NodeManager Shuffle Service HDFS ZooKeeper controlchannel controlchannel state backup/recover local data channel local data channel state backup/recover Container Job Master task scheduler checkpoint coordinator Container rocks db spilled file Task Executor taskin out Container rocks db spilled file Task Executor taskin out Container rocks db spilled file Task Executor taskin out Container rocks db spilled file Task Executor taskin out completed checkpoint schedule events Network data channel
  • 15. Blink Checkpoint & State TaskExecutor Local CPn Local CPn-1Incremental Backup OnComplete i1 i2 i3 Bn in queue o1 o2 Bn- 1 o3 out queue 2. hard link snapshot Job Master 1. trigger 3.ack clean up 4. complete clean up Task operator state HDFS reference async CPn CPn-1 diff State Files 1.sst 2.sst n.sst
  • 17. Blink Failover At Least Once Source Source Source Source fail restart restart failover Excactly Once Source Source Source Source fail restart failover Sink Sink Sink Sink
  • 18. Blink Metrics Job Vertex Number: [CPU, Memory] * Parallelism In Queue TPS Out Queue Latency Delay CPU Memory Task Metrics Running Tasks
  • 19. Challenges & Future n Continued Optimization in Streaming n Batch in Production n Machine Learning in Production n Larger Cluster Scale n Contribute back to Flink community
  • 20. Q & A Thank You! Xiaowei Jiang: xiaowei.jxw@alibaba-inc.com Twitter: @xiaoweij Feng Wang: jason.wang@alibaba-inc.com Twitter: @ifengwang