SlideShare una empresa de Scribd logo
1 de 37
Descargar para leer sin conexión
Apache Flink
Kubernetes Operator
THIS IS NOT A CONTRIBUTION
In a nutshell
Manage Flink Application and Session clusters/jobs

Deploy, Upgrade, Savepoint, Monitor
Intro
Why another Flink operator
•Existing OSS solutions

•Fragmentation, unreliable maintenance

•Contribution barrier

•FLIP-212: Introduce Flink Kubernetes Operator
Supported Deployment Modes
•Flink deployments in application or session mode

•Flink application managed through FlinkDeployment

•Empty Flink session managed through FlinkDeployment +
jobs managed through FlinkSessionJob

•Job submissions against a session cluster

•Foundation for the common workload types and languages
(Java, SQL, Python)
Project Status
•4 releases

•Active community with contributors from multiple
organizations



•Production ready
Acknowledgement
• Gyula Fora (gyfora)

• Matyas Orhidi (morhidi)

• Aitozi

• Marton Balassi (mbalassi)

• Yang Wang (wangyang0918)

• Nicholas Jiang (SteNicholas)

• Biao Geng (bgeng777)

• Thomas Weise (tweise)

• Hao Xin (haoxins)

• Usamah Jassat (usamj)

• James Busche (jbusche)

• Junfan Zhang (zuston)

• zeus1ammon
27 Contributors
The basics
Control Loop
Deployment Flow
1. Submit FlinkDeployment (CR)

2. Operator creates JobManager

3. JobManager creates task manager pods

4. JobManager submits job
FlinkDeployment Status
Status captures everything the operator knows about the application

What’s in it?

• JobManager Deployment Status

• Job status

• Basic Job Details

• Savepoint Info

• Reconciliation Status

• Last reconciled spec

• Success / Error information
Observing FlinkDeployment status
The Observer component is responsible for determining the status

Observe
fl
ow

1. Observe JobManager Deployment Status (exists, errors, Flink
ports)

2. Observe Job Status (Rest API accessible, job status)

3. Observe Savepoint Status (pending savepoints)

It takes a few reconcile loops to reach steady state after deployment…
Once the job is running…
• Check the Flink UI -> Metrics, Flame Graphs, Memory Utilisation

• Check the logs

• Operator log -> Reconcile / deployment errors

• JM/TM logs -> Job errors, warnings etc.

• Trigger Savepoints

• Perform upgrades

• Suspend / Resume processing
Lifecycle management
Resource Lifecycle
Upgrade / Suspend Applications
Jobs can be upgraded by simply submitting a new spec.

What happens then?

1. Suspend running job (keep state)

2. Restore using the new spec (using state from last run)

If job.state is set to SUSPENDED the job will be paused.
Application Upgrade Modes
Controls how the streaming job will be suspended and restarted on
spec changes

Available modes

• Stateless 

• Last State

• Savepoint
Application Upgrade Modes
Stateless Last State Savepoint
Con
fi
g Requirement None
• Checkpointing Enabled

• Kubernetes HA Enabled
• Savepoint directory
de
fi
ned
Job Status Requirement None* None*
Deployment healthy

Job Running
Suspend Mechanism Cancel / Delete
Delete Flink deployment
(keep HA data)
Cancel with savepoint
Restore Mechanism Deploy from empty state
Deploy job -> recover state
from HA data
Restore From savepoint
When to use? Stateless jobs, prototyping Most stateful jobs Job Migration / Forking
* No savepoint in progress
Manual Savepoints
•
Allows you to keep “backups” of your application state

Trigger by changing job.savepointTriggerNonce


•
Use job.initialSavepointPath to start from a speci
fi
c savepoint
on new deployments

•
Savepoints are cleaned up automatically
Automatic Savepoint Management
•Periodic savepoints

•Con
fi
g: kubernetes.operator.periodic.savepoint.interval

•Savepoints triggered as part of regular reconcile loop

•Savepoint history

•Count and age based

•Disposal of savepoints
Configuration
Zero Downtime Changes
Initial con
fi
guration through helm chart

•How to apply changes?

Dynamic changes without control plane interruption

•Clusters with many concurrent reconciliations

•Reload operator con
fi
guration from con
fi
g map

Namespaces to watch

•List of namespaces + dynamic con
fi
g change
Operator (System) Level
•General operator wide con
fi
guration

•Cannot be overridden on a per-resource basis

Examples:

•Timeout for the observer to wait the Flink REST client to return

•Interval for the controller to reschedule the reconcile process

•Maximum number of threads running the reconciliation loop

https://nightlies.apache.org/
fl
ink/
fl
ink-kubernetes-operator-docs-main/docs/operations/con
fi
guration/#system-con
fi
guration
Resource (User) Level
•Settings that a
ff
ect a single deployment

•“Extend” the CR (but don’t require CRD changes!)

Examples:

•Enable recovery of missing/deleted jobmanager deployments

•Timeout for deployments to become ready/stable before being rolled
back

•Interval before a savepoint trigger attempt is marked as unsuccessful

https://nightlies.apache.org/
fl
ink/
fl
ink-kubernetes-operator-docs-main/docs/operations/con
fi
guration/#resourceuser-con
fi
guration
Flink Pod Templates
CR with limited direct settings (like memory and cpu resource)

Maximum
fl
exibility through
fl
inkCon
fi
guration and pod templates (init, sidecar, storage etc.)



Common template with job/task manager override

https://nightlies.apache.org/
fl
ink/
fl
ink-kubernetes-operator-docs-main/docs/custom-resource/pod-template/
Observability
Metrics
Flink metric reporters (default reporters shipped with image)

•Prometheus (example)



Metric scopes

•Operator: JVM, k8s client metrics

•Watched Namespace: Deployment count, Status count

•Resource: Reconciliation count, .. (JOSDK metrics)
Logging
Con
fi
gured via Con
fi
gMap (default in helm chart)

Kubernetes Events
Important changes (and errors) recorded as events

kubectl describe flinkdeployment basic-example




Events can be forwarded to infrastructure speci
fi
c collectors
When things go wrong
Error Scenarios
Typical causes
•Operator service account access problem

•Invalid Flink deployment con
fi
guration

•Operator failure / bug

Where to look
•Operator log

•Service accounts / roles / role bindings

FlinkDeployment Not Created
Error Scenarios
Typical causes
•Flink service account access problem

•Flink image pull error

•Pod template / other Kubernetes issues

Where to look
•FlinkDeployment (CR) events

•Describe JobManager replicaset
JobManager pod not created
Error Scenarios
Typical causes
•Flink service account access problem

•TM pod template problems

•Insu
ffi
cient resources

Where to look
•JobManager pod logs

•Describe pending task manager pod
TaskManager pods not ready
Roadmap
Roadmap
•Version 1.1

•Dynamic change of watched namespaces

•Pluggable Status and Event reporters (integration point for control planes)

•Improved savepoint management & periodic triggering

•Experimental auto-scaling using Horizontal Pod AutoScaler

•Version 1.2

•Standalone deployment mode support (FLIP-225, support for older Flink versions)

•Improved scaling and autoscaling support

•Improved rollback mechanism

•Roadmap documentation page
Q&A

Más contenido relacionado

La actualidad más candente

Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufVerverica
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database Systemconfluent
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleFlink Forward
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanVerverica
 
Introduction To Flink
Introduction To FlinkIntroduction To Flink
Introduction To FlinkKnoldus Inc.
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkFlink Forward
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkVerverica
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberFlink Forward
 

La actualidad más candente (20)

Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
 
Introduction To Flink
Introduction To FlinkIntroduction To Flink
Introduction To Flink
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 

Similar a Introducing the Apache Flink Kubernetes Operator

Enhanced Reframework Session_16-07-2022.pptx
Enhanced Reframework Session_16-07-2022.pptxEnhanced Reframework Session_16-07-2022.pptx
Enhanced Reframework Session_16-07-2022.pptxRohit Radhakrishnan
 
python_development.pptx
python_development.pptxpython_development.pptx
python_development.pptxLemonReddy1
 
SynapseIndia drupal presentation on drupal info
SynapseIndia drupal  presentation on drupal infoSynapseIndia drupal  presentation on drupal info
SynapseIndia drupal presentation on drupal infoSynapseindiappsdevelopment
 
StorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and PrestoStorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and PrestoAlluxio, Inc.
 
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...Amazon Web Services
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio, Inc.
 
Using The Right Tool For The Job
Using The Right Tool For The JobUsing The Right Tool For The Job
Using The Right Tool For The JobChris Baldock
 
Index Reorganization and Rebuilding for Success
Index Reorganization and Rebuilding for SuccessIndex Reorganization and Rebuilding for Success
Index Reorganization and Rebuilding for SuccessDean Willson
 
Life In The FastLane: Full Speed XPages
Life In The FastLane: Full Speed XPagesLife In The FastLane: Full Speed XPages
Life In The FastLane: Full Speed XPagesUlrich Krause
 
La vita nella corsia di sorpasso; A tutta velocità, XPages!
La vita nella corsia di sorpasso; A tutta velocità, XPages!La vita nella corsia di sorpasso; A tutta velocità, XPages!
La vita nella corsia di sorpasso; A tutta velocità, XPages!Ulrich Krause
 
20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...
20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...
20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...garrett honeycutt
 
Anatomy of Autoconfig in Oracle E-Business Suite
Anatomy of Autoconfig in Oracle E-Business SuiteAnatomy of Autoconfig in Oracle E-Business Suite
Anatomy of Autoconfig in Oracle E-Business Suitevasuballa
 
Alfresco Business Reporting - Tech Talk Live 20130501
Alfresco Business Reporting - Tech Talk Live 20130501Alfresco Business Reporting - Tech Talk Live 20130501
Alfresco Business Reporting - Tech Talk Live 20130501Tjarda Peelen
 
Apex world 2018 continuously delivering APEX
Apex world 2018 continuously delivering APEXApex world 2018 continuously delivering APEX
Apex world 2018 continuously delivering APEXSergei Martens
 
APEX Application Lifecycle and Deployment 20220714.pdf
APEX Application Lifecycle and Deployment 20220714.pdfAPEX Application Lifecycle and Deployment 20220714.pdf
APEX Application Lifecycle and Deployment 20220714.pdfRichard Martens
 
Actions rules and workflow in alfresco
Actions rules and workflow in alfrescoActions rules and workflow in alfresco
Actions rules and workflow in alfrescoAlfresco Software
 
Live Container Migration: OpenStack Summit Barcelona 2016
Live Container Migration: OpenStack Summit Barcelona 2016Live Container Migration: OpenStack Summit Barcelona 2016
Live Container Migration: OpenStack Summit Barcelona 2016Phil Estes
 
Oracle EBS Production Support - Recommendations
Oracle EBS Production Support - RecommendationsOracle EBS Production Support - Recommendations
Oracle EBS Production Support - RecommendationsVigilant Technologies
 
Tuenti Release Workflow
Tuenti Release WorkflowTuenti Release Workflow
Tuenti Release WorkflowTuenti
 

Similar a Introducing the Apache Flink Kubernetes Operator (20)

Enhanced Reframework Session_16-07-2022.pptx
Enhanced Reframework Session_16-07-2022.pptxEnhanced Reframework Session_16-07-2022.pptx
Enhanced Reframework Session_16-07-2022.pptx
 
python_development.pptx
python_development.pptxpython_development.pptx
python_development.pptx
 
SynapseIndia drupal presentation on drupal info
SynapseIndia drupal  presentation on drupal infoSynapseIndia drupal  presentation on drupal info
SynapseIndia drupal presentation on drupal info
 
StorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and PrestoStorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and Presto
 
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...Give your little scripts big wings:  Using cron in the cloud with Amazon Simp...
Give your little scripts big wings: Using cron in the cloud with Amazon Simp...
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
 
Using The Right Tool For The Job
Using The Right Tool For The JobUsing The Right Tool For The Job
Using The Right Tool For The Job
 
Index Reorganization and Rebuilding for Success
Index Reorganization and Rebuilding for SuccessIndex Reorganization and Rebuilding for Success
Index Reorganization and Rebuilding for Success
 
Life In The FastLane: Full Speed XPages
Life In The FastLane: Full Speed XPagesLife In The FastLane: Full Speed XPages
Life In The FastLane: Full Speed XPages
 
Serverless design with Fn project
Serverless design with Fn projectServerless design with Fn project
Serverless design with Fn project
 
La vita nella corsia di sorpasso; A tutta velocità, XPages!
La vita nella corsia di sorpasso; A tutta velocità, XPages!La vita nella corsia di sorpasso; A tutta velocità, XPages!
La vita nella corsia di sorpasso; A tutta velocità, XPages!
 
20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...
20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...
20111110 how puppet-fits_into_your_existing_infrastructure_and_change_managem...
 
Anatomy of Autoconfig in Oracle E-Business Suite
Anatomy of Autoconfig in Oracle E-Business SuiteAnatomy of Autoconfig in Oracle E-Business Suite
Anatomy of Autoconfig in Oracle E-Business Suite
 
Alfresco Business Reporting - Tech Talk Live 20130501
Alfresco Business Reporting - Tech Talk Live 20130501Alfresco Business Reporting - Tech Talk Live 20130501
Alfresco Business Reporting - Tech Talk Live 20130501
 
Apex world 2018 continuously delivering APEX
Apex world 2018 continuously delivering APEXApex world 2018 continuously delivering APEX
Apex world 2018 continuously delivering APEX
 
APEX Application Lifecycle and Deployment 20220714.pdf
APEX Application Lifecycle and Deployment 20220714.pdfAPEX Application Lifecycle and Deployment 20220714.pdf
APEX Application Lifecycle and Deployment 20220714.pdf
 
Actions rules and workflow in alfresco
Actions rules and workflow in alfrescoActions rules and workflow in alfresco
Actions rules and workflow in alfresco
 
Live Container Migration: OpenStack Summit Barcelona 2016
Live Container Migration: OpenStack Summit Barcelona 2016Live Container Migration: OpenStack Summit Barcelona 2016
Live Container Migration: OpenStack Summit Barcelona 2016
 
Oracle EBS Production Support - Recommendations
Oracle EBS Production Support - RecommendationsOracle EBS Production Support - Recommendations
Oracle EBS Production Support - Recommendations
 
Tuenti Release Workflow
Tuenti Release WorkflowTuenti Release Workflow
Tuenti Release Workflow
 

Más de Flink Forward

One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!Flink Forward
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesFlink Forward
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitFlink Forward
 
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionFlink Forward
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Flink Forward
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 

Más de Flink Forward (11)

One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior Detection
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 

Último

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 

Último (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 

Introducing the Apache Flink Kubernetes Operator

  • 2. In a nutshell Manage Flink Application and Session clusters/jobs Deploy, Upgrade, Savepoint, Monitor
  • 4. Why another Flink operator •Existing OSS solutions •Fragmentation, unreliable maintenance •Contribution barrier •FLIP-212: Introduce Flink Kubernetes Operator
  • 5. Supported Deployment Modes •Flink deployments in application or session mode •Flink application managed through FlinkDeployment •Empty Flink session managed through FlinkDeployment + jobs managed through FlinkSessionJob •Job submissions against a session cluster •Foundation for the common workload types and languages (Java, SQL, Python)
  • 6. Project Status •4 releases •Active community with contributors from multiple organizations
 •Production ready
  • 7. Acknowledgement • Gyula Fora (gyfora) • Matyas Orhidi (morhidi) • Aitozi • Marton Balassi (mbalassi) • Yang Wang (wangyang0918) • Nicholas Jiang (SteNicholas) • Biao Geng (bgeng777) • Thomas Weise (tweise) • Hao Xin (haoxins) • Usamah Jassat (usamj) • James Busche (jbusche) • Junfan Zhang (zuston) • zeus1ammon 27 Contributors
  • 10. Deployment Flow 1. Submit FlinkDeployment (CR) 2. Operator creates JobManager 3. JobManager creates task manager pods 4. JobManager submits job
  • 11. FlinkDeployment Status Status captures everything the operator knows about the application What’s in it? • JobManager Deployment Status • Job status • Basic Job Details • Savepoint Info • Reconciliation Status • Last reconciled spec • Success / Error information
  • 12. Observing FlinkDeployment status The Observer component is responsible for determining the status Observe fl ow 1. Observe JobManager Deployment Status (exists, errors, Flink ports) 2. Observe Job Status (Rest API accessible, job status) 3. Observe Savepoint Status (pending savepoints) It takes a few reconcile loops to reach steady state after deployment…
  • 13. Once the job is running… • Check the Flink UI -> Metrics, Flame Graphs, Memory Utilisation • Check the logs • Operator log -> Reconcile / deployment errors • JM/TM logs -> Job errors, warnings etc. • Trigger Savepoints • Perform upgrades • Suspend / Resume processing
  • 14.
  • 17. Upgrade / Suspend Applications Jobs can be upgraded by simply submitting a new spec. What happens then? 1. Suspend running job (keep state) 2. Restore using the new spec (using state from last run) If job.state is set to SUSPENDED the job will be paused.
  • 18. Application Upgrade Modes Controls how the streaming job will be suspended and restarted on spec changes Available modes • Stateless • Last State • Savepoint
  • 19. Application Upgrade Modes Stateless Last State Savepoint Con fi g Requirement None • Checkpointing Enabled • Kubernetes HA Enabled • Savepoint directory de fi ned Job Status Requirement None* None* Deployment healthy Job Running Suspend Mechanism Cancel / Delete Delete Flink deployment (keep HA data) Cancel with savepoint Restore Mechanism Deploy from empty state Deploy job -> recover state from HA data Restore From savepoint When to use? Stateless jobs, prototyping Most stateful jobs Job Migration / Forking * No savepoint in progress
  • 20. Manual Savepoints • Allows you to keep “backups” of your application state Trigger by changing job.savepointTriggerNonce 
 • Use job.initialSavepointPath to start from a speci fi c savepoint on new deployments
 • Savepoints are cleaned up automatically
  • 21. Automatic Savepoint Management •Periodic savepoints •Con fi g: kubernetes.operator.periodic.savepoint.interval •Savepoints triggered as part of regular reconcile loop •Savepoint history •Count and age based •Disposal of savepoints
  • 23. Zero Downtime Changes Initial con fi guration through helm chart •How to apply changes? Dynamic changes without control plane interruption •Clusters with many concurrent reconciliations •Reload operator con fi guration from con fi g map Namespaces to watch •List of namespaces + dynamic con fi g change
  • 24. Operator (System) Level •General operator wide con fi guration •Cannot be overridden on a per-resource basis Examples: •Timeout for the observer to wait the Flink REST client to return •Interval for the controller to reschedule the reconcile process •Maximum number of threads running the reconciliation loop https://nightlies.apache.org/ fl ink/ fl ink-kubernetes-operator-docs-main/docs/operations/con fi guration/#system-con fi guration
  • 25. Resource (User) Level •Settings that a ff ect a single deployment •“Extend” the CR (but don’t require CRD changes!) Examples: •Enable recovery of missing/deleted jobmanager deployments •Timeout for deployments to become ready/stable before being rolled back •Interval before a savepoint trigger attempt is marked as unsuccessful https://nightlies.apache.org/ fl ink/ fl ink-kubernetes-operator-docs-main/docs/operations/con fi guration/#resourceuser-con fi guration
  • 26. Flink Pod Templates CR with limited direct settings (like memory and cpu resource) Maximum fl exibility through fl inkCon fi guration and pod templates (init, sidecar, storage etc.) Common template with job/task manager override https://nightlies.apache.org/ fl ink/ fl ink-kubernetes-operator-docs-main/docs/custom-resource/pod-template/
  • 28. Metrics Flink metric reporters (default reporters shipped with image) •Prometheus (example)
 Metric scopes •Operator: JVM, k8s client metrics •Watched Namespace: Deployment count, Status count •Resource: Reconciliation count, .. (JOSDK metrics)
  • 29. Logging Con fi gured via Con fi gMap (default in helm chart)

  • 30. Kubernetes Events Important changes (and errors) recorded as events kubectl describe flinkdeployment basic-example Events can be forwarded to infrastructure speci fi c collectors
  • 31. When things go wrong
  • 32. Error Scenarios Typical causes •Operator service account access problem •Invalid Flink deployment con fi guration •Operator failure / bug Where to look •Operator log •Service accounts / roles / role bindings FlinkDeployment Not Created
  • 33. Error Scenarios Typical causes •Flink service account access problem •Flink image pull error •Pod template / other Kubernetes issues Where to look •FlinkDeployment (CR) events •Describe JobManager replicaset JobManager pod not created
  • 34. Error Scenarios Typical causes •Flink service account access problem •TM pod template problems •Insu ffi cient resources Where to look •JobManager pod logs •Describe pending task manager pod TaskManager pods not ready
  • 36. Roadmap •Version 1.1 •Dynamic change of watched namespaces •Pluggable Status and Event reporters (integration point for control planes) •Improved savepoint management & periodic triggering •Experimental auto-scaling using Horizontal Pod AutoScaler •Version 1.2 •Standalone deployment mode support (FLIP-225, support for older Flink versions) •Improved scaling and autoscaling support •Improved rollback mechanism •Roadmap documentation page
  • 37. Q&A