9. 9
@everpeace
● Platform keywords
○ Volcano (formerly kube-batch)
○ Kubernetes Batch Working Group
■ Kueue
● Not many case studies
○ HPC
■ Kubernetes as a Substrate for ATLAS Compute (Univ. of Texas, TU München)
■ KubeFlux: An HPC Scheduler Plugin for Kubernetes (IBM, LLNL)
○ Batch
■ Spark on Kubernetes: The Elastic Story (Apple)
■ Supporting Long-Lived Pods Using a Simple Kubernetes Webhook (Slack)
● Quite a few scheduler-extension sessions → covered in detail later in this deck
Batch/HPC on Kubernetes: Latest Trends
10. 10
● Platform
○ Volcano: Intro & Deep Dive (Huawei)
○ Introduction to the Kubernetes WG Batch (Google, Alibaba)
○ Kueue: A Kubernetes-native Job Queueing (Google)
● Case studies
○ Kubernetes as a Substrate for ATLAS Compute (CERN)
○ KubeFlux: An HPC Scheduler Plugin for Kubernetes (IBM, Lawrence Livermore National Laboratory)
Selected Sessions: Latest Trends in Batch/HPC on k8s
21. 21
Kubernetes as a Substrate for ATLAS Compute
ATLAS is a particle physics experiment at CERN's Large Hadron Collider
600 PB of data in total
700K+ vCPUs (partly in the cloud)
The MiniK8s Grid, started in 2020, now bursts into Google for a total of 100k vCPUs
22. 22
Kubernetes as a Substrate for ATLAS Compute
CernVMFS
PanDA (Production and Distributed Analysis)
The Jupyter + Dask part included a demo, so be sure to watch the video!
23. 23
KubeFlux: An HPC Scheduler Plugin for Kubernetes
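Lawrence Livermore National Laboratory's El Capitan (planned for 2023) is expected to exceed 2 exaFLOPS!!
(Fugaku: 442 PFLOPS)
* Current facilities were not mentioned
Most of the use cases presented were in the life sciences
Cloud usage is only about 10% so far, but it is expected to grow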
26. 26
● Sessions already listed under Batch/HPC
○ Working your Cluster: Smarter Scheduling Decisions for Your Workloads (Intel)
→ Telemetry Aware Scheduling (integrates with the Custom Metrics API)
○ Resource Orchestration of HPC on Kubernetes: Where We Are Now and the Journey Ahead! (Red Hat)
→ NUMA Aware Scheduling
○ KubeFlux: An HPC Scheduler Plugin for Kubernetes (IBM, LLNL)
→ Integration of an HPC scheduler with kube-scheduler
● Sessions purely about scheduler extensions
○ Network-aware Scheduling in Kubernetes (Ghent University)
→ Infrastructure Topology & Network Aware Scheduling
Selected Sessions: Latest Scheduler Extension Examples
27. 27
Telemetry Aware Scheduling
Working your Cluster: Smarter Scheduling Decisions for Your Workloads
Node metrics are exposed through the Custom Metrics API
Runs as a Scheduler Extender and enforces the TAS Policy attached to each Pod
TAS Policy CR (Telemetry Aware Scheduling Policy)
28. 28
Telemetry Aware Scheduling
Working your Cluster: Smarter Scheduling Decisions for Your Workloads
dontschedule strategy:
Pods are not scheduled onto Nodes whose health_metric metric is 1
scheduleonmetric strategy:
Pods are scheduled onto the Node with the lowest temperature metric
labeling strategy:
When the memory_used_card0 metric exceeds 100, the node label card0=true is applied
deschedule strategy:
Deschedule when the temperature metric exceeds 80
Deschedule when the freeRAM metric drops below 200
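The strategies on this slide can be sketched as plain rule evaluation over per-node metrics. The following Python is only an illustration of the idea, not the TAS implementation: the node names, metric values, the `violates` helper, and the tuple-based rule format are all assumptions made for the example. The labeling strategy is omitted for brevity.

```python
import operator

# Comparison operators named in the style of the policy rules (assumed names).
OPERATORS = {
    "Equals": operator.eq,
    "GreaterThan": operator.gt,
    "LessThan": operator.lt,
}

# Hypothetical per-node metrics, as they might be read from a metrics API.
node_metrics = {
    "node-a": {"health_metric": 0, "temperature": 55, "freeRAM": 900},
    "node-b": {"health_metric": 1, "temperature": 40, "freeRAM": 800},
    "node-c": {"health_metric": 0, "temperature": 85, "freeRAM": 150},
}


def violates(metrics, rules):
    """Return True if any (metric, operator, target) rule matches the node."""
    return any(OPERATORS[op](metrics[name], target) for name, op, target in rules)


dontschedule_rules = [("health_metric", "Equals", 1)]
deschedule_rules = [("temperature", "GreaterThan", 80), ("freeRAM", "LessThan", 200)]

# dontschedule: drop nodes that match a rule from the candidate list.
candidates = [n for n, m in node_metrics.items() if not violates(m, dontschedule_rules)]

# scheduleonmetric: among the candidates, prefer the lowest temperature.
ranked = sorted(candidates, key=lambda n: node_metrics[n]["temperature"])

# deschedule: nodes whose metrics say running Pods should be moved away.
to_deschedule = [n for n, m in node_metrics.items() if violates(m, deschedule_rules)]
```

With these made-up numbers, node-b is filtered out by dontschedule, node-a is preferred by scheduleonmetric, and node-c is flagged by deschedule.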
29. 29
NUMA Aware Scheduling
Resource Orchestration of HPC on Kubernetes: Where We Are Now and the Journey Ahead!
kube-scheduler does not know the NUMA utilization of each Node
→ With a strict Topology Manager Policy, Pods get scheduled but then fail with an error and never come up
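The mismatch described above comes down to simple arithmetic: a node-wide total can satisfy a request that no single NUMA zone can. A minimal Python sketch, with made-up numbers and hypothetical function names (neither is taken from the talk):

```python
# Hypothetical node: two NUMA zones with 4 free CPUs each.
free_cpus_per_numa = {"numa-0": 4, "numa-1": 4}
pod_cpu_request = 6


def fits_node_level(free_per_zone, request):
    """What a NUMA-unaware scheduler effectively checks: the node-wide total."""
    return sum(free_per_zone.values()) >= request


def fits_single_numa(free_per_zone, request):
    """What a strict, single-NUMA-style policy requires: one zone that fits."""
    return any(free >= request for free in free_per_zone.values())


scheduler_says_ok = fits_node_level(free_cpus_per_numa, pod_cpu_request)
node_admits_pod = fits_single_numa(free_cpus_per_numa, pod_cpu_request)
```

Here the node-level check passes (8 free CPUs in total) while the per-zone check fails, which is the situation where a Pod is scheduled but cannot start.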
30. 30
NUMA Aware Scheduling
Resource Orchestration of HPC on Kubernetes: Where We Are Now and the Journey Ahead!
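Resource assignment status is read from the kubelet's PodResource API and exposed in a NodeResourceTopology CR
A Scheduler Plugin reads the NodeResourceTopology CR to make scheduling decisions
NodeResourceTopology CR: generated per Node
zone: NUMA, socket, die, etc.
cost: a metric representing the distance between zones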
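To make the filtering step concrete, here is a rough Python sketch of a scheduler-side check over per-node zone data. The `Zone` and `NodeTopology` classes are simplified stand-ins loosely inspired by the zone/cost description above, not the actual CR schema, and the resource name `example.com/gpu` and all quantities are invented. The `costs` field is carried as data only and is not used by this filter.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    name: str
    type: str
    available: dict                              # resource name -> free quantity
    costs: dict = field(default_factory=dict)    # other zone name -> distance


@dataclass
class NodeTopology:
    node: str
    zones: list


def zone_fits(zone, request):
    """True if this single zone can satisfy every requested resource."""
    return all(zone.available.get(res, 0) >= qty for res, qty in request.items())


def filter_nodes(topologies, request):
    """Keep only nodes that have at least one zone able to host the whole request."""
    return [t.node for t in topologies if any(zone_fits(z, request) for z in t.zones)]


topologies = [
    NodeTopology("node-a", [
        Zone("numa-0", "Node", {"cpu": 4, "example.com/gpu": 0}, {"numa-0": 10, "numa-1": 21}),
        Zone("numa-1", "Node", {"cpu": 2, "example.com/gpu": 1}, {"numa-0": 21, "numa-1": 10}),
    ]),
    NodeTopology("node-b", [
        Zone("numa-0", "Node", {"cpu": 8, "example.com/gpu": 1}, {"numa-0": 10, "numa-1": 21}),
        Zone("numa-1", "Node", {"cpu": 8, "example.com/gpu": 0}, {"numa-0": 21, "numa-1": 10}),
    ]),
]

request = {"cpu": 4, "example.com/gpu": 1}
feasible = filter_nodes(topologies, request)
```

On node-a the CPUs and the GPU sit in different zones, so only node-b survives the filter even though both nodes have enough resources in aggregate.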
31. 31
HPC Scheduler & kube-scheduler連携
KubeFlux: An HPC Scheduler Plugin for Kubernetes
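The node-centric model used so far is
● Designed for homogeneous environments
● Inefficient in heterogeneous environments
● Resource containment relationships are represented as a graph
● Rich graph traversal/allocation API
● Complex scheduling can be achieved without changing code
* This seems somewhat different from Poseidon, which used to be a SIG-Scheduling subproject, but the details are unclear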
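The graph-based idea can be illustrated with a toy containment tree in Python. This is a sketch under assumptions, not the KubeFlux or Flux API: the `Vertex` class, the `find_matches` function, and the cluster layout are all hypothetical. The point it tries to show is that the request is expressed as data (`within`, `needs`), so a different placement constraint needs a different query rather than different scheduler code.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    type: str
    name: str
    children: list = field(default_factory=list)
    free: bool = True  # only meaningful for leaf resources


def count_free(v, leaf_type):
    """Count free leaves of leaf_type in the subtree rooted at v."""
    own = 1 if (v.type == leaf_type and v.free and not v.children) else 0
    return own + sum(count_free(c, leaf_type) for c in v.children)


def find_matches(root, within, needs):
    """Names of vertices of type `within` whose subtree satisfies `needs`."""
    matches, stack = [], [root]
    while stack:
        v = stack.pop()
        if v.type == within and all(count_free(v, t) >= n for t, n in needs.items()):
            matches.append(v.name)
        stack.extend(v.children)
    return sorted(matches)


def socket(name, n_cores, n_gpus):
    """Build a socket vertex that contains core and gpu leaves."""
    cores = [Vertex("core", f"{name}-core{i}") for i in range(n_cores)]
    gpus = [Vertex("gpu", f"{name}-gpu{i}") for i in range(n_gpus)]
    return Vertex("socket", name, cores + gpus)


cluster = Vertex("cluster", "cluster0", [
    Vertex("node", "node0", [socket("node0-s0", 2, 1), socket("node0-s1", 2, 0)]),
    Vertex("node", "node1", [socket("node1-s0", 4, 0), socket("node1-s1", 4, 2)]),
])

# Same traversal code, two different placement constraints expressed as data.
same_socket = find_matches(cluster, within="socket", needs={"core": 2, "gpu": 1})
same_node = find_matches(cluster, within="node", needs={"core": 6, "gpu": 1})
```

A node-centric model would only see per-node totals; the containment graph lets the same query mechanism ask for co-location at socket level or at node level.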
39. 39
● Batch/HPC platform
○ [Keynote] High Performance Computing on Google Kubernetes Engine (Google)
○ Kueue: A Kubernetes-native Job Queueing (Google)
○ Volcano – Cloud Native Batch System for AI, BigData and HPC (Huawei)
○ Fast Data on-Ramp with Apache Pulsar on K8 (StreamNative)
○ Efficient Deep Learning Training with Ludwig AutoML, Ray, and Nodeless Kubernetes (Elotl, Predibase)
● HPC case studies
○ [LT] How to Handle Fair Scheduling in a Private Academic K8s infrastructure (Masaryk University, CESNET)
● Scheduler
○ Resource Orchestration of HPC on Kubernetes: Where We Are Now and the Journey Ahead! (Red Hat)
○ Get More Computing Power by Helping the OS Scheduler (Intel)
○ Apache YuniKorn A Kubernetes Scheduler Plugin for Batch Workloads (Cloudera)
[Reference] Kubernetes Batch + HPC Day
40. 40
● Batch/HPC platform
○ Volcano: Intro & Deep Dive (Huawei)
○ Introduction to the Kubernetes WG Batch (Google, Alibaba)
○ Unlimited Data Science Libraries, One Container Image, No Installation! (Red Hat, Ghent Univ.)
○ [LT] Secure Multi User HPC Jobs in Kubernetes with Kyverno (Ohio Supercomputer Center)
● Batch case studies
○ Spark on Kubernetes: The Elastic Story (Apple)
○ Supporting Long-Lived Pods Using a Simple Kubernetes Webhook (Slack)
● HPC case studies
○ Kubernetes as a Substrate for ATLAS Compute (CERN)
○ KubeFlux: An HPC Scheduler Plugin for Kubernetes (IBM, LLNL)
● Scheduler
○ Working your Cluster: Smarter Scheduling Decisions for Your Workloads (Intel)
○ KubeFlux: An HPC Scheduler Plugin for Kubernetes (IBM, LLNL)
[Reference] KubeCon + CloudNativeCon (Batch/HPC-related sessions)