USENIX NSDI 2016
Session: Resource Sharing
2016-05-29 @oraccha
Co-located Events
• ACM Symposium on SDN Research 2016 (SOSR), March 13-17
• 2016 Open Networking Summit (ONS), March 14-17
• The 12th ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS'16), March 17-19
• The 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI'16)
• The USENIX Workshop on Cool Topics in Sustainable Data Centers (CoolDC'16), March 19
Session: Resource Sharing
• "Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics," Shivaram Venkataraman, Zongheng Yang, Michael Franklin, Benjamin Recht, and Ion Stoica, University of California, Berkeley
• "Cliffhanger: Scaling Performance Cliffs in Web Memory Caches," Asaf Cidon and Assaf Eisenman, Stanford University; Mohammad Alizadeh, MIT CSAIL; Sachin Katti, Stanford University
• "FairRide: Near-Optimal, Fair Cache Sharing," Qifan Pu and Haoyuan Li, University of California, Berkeley; Matei Zaharia, Massachusetts Institute of Technology; Ali Ghodsi and Ion Stoica, University of California, Berkeley
• "HUG: Multi-Resource Fairness for Correlated and Elastic Demands," Mosharaf Chowdhury, University of Michigan; Zhenhua Liu, Stony Brook University; Ali Ghodsi and Ion Stoica, University of California, Berkeley, and Databricks Inc.
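Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics
• Who?: A graduate student at UCB AMPLab, the lab known for Spark, Mesos, and other systems. Specializes in systems and algorithms for large-scale data analysis; has presented at SoCC12, EuroSys13, OSDI14, SIGMOD16, and others.
• What?: A framework for efficiently predicting the performance of data analysis workloads, such as machine learning and genome analysis, in cloud environments.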
[Figure: Do choices matter? Time (s) for Matrix Multiply (400K by 1K) and QR Factorization (1M by 1K) on 1 r3.8xlarge, 2 r3.4xlarge, 4 r3.2xlarge, 8 r3.xlarge, and 16 r3.large instances. Annotations: Network Bound, Mem Bandwidth Bound.]
[Figure: Do choices matter? Matrix Multiply, matrix size 400K by 1K, on 1 r3.8xlarge, 2 r3.4xlarge, 4 r3.2xlarge, and 8 r3.xlarge. Annotations: Cores = 16, Memory = 244 GB, Cost = $2.66/hr.]
[Figure: KeystoneML TIMIT pipeline. Raw Data → Cosine Transform → Normalization → Linear Solver (~100 iterations). Properties: iterative (each iteration many jobs), long running → expensive, numerically intensive.]
[Figure: Do choices matter? Actual vs. ideal time (s) against cores (0 to 600) on r3.4xlarge instances, QR Factorization 1M by 1K. Computation + Communication → Non-linear Scaling.]
• How?: Predict performance from the results of running small-scale training jobs. Use experiment design to reduce the number of training jobs.
[Figure: Optimal design of experiments. Candidate training runs combine input fraction (1%, 2%, 4%, 8%) with number of machines (1, 2, 4, 8). Use off-the-shelf solver (CVX).]
[Figure: Using Ernest. Job binary plus machines and input size → experiment design → training jobs → linear model. Use few iterations for training.]
[Figure: Time (0 to 1000) vs. machines (1, 30, 900).]
Ernest basic model:
$$\text{time} = x_1 + x_2 \cdot \frac{\text{input}}{\text{machines}} + x_3 \cdot \log(\text{machines}) + x_4 \cdot \text{machines}$$
– x_1: serial execution
– x_2 term: computation (linear)
– x_3 term: tree DAG
– x_4 term: all-to-one DAG
Collect training data → fit linear regression
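The basic model above is linear in its coefficients x1..x4, so fitting it reduces to least squares over the training runs. The following is a minimal sketch in plain Python, assuming ordinary least squares solved through the normal equations (the slides only say "fit linear regression"; a solver that keeps the coefficients non-negative would be a natural alternative). The function names, the seven training points, and the coefficient values are hypothetical and exist only to exercise the code.

```python
import math


def features(input_pct, machines):
    """Feature vector of the basic model: serial, computation, tree, all-to-one."""
    return [1.0, input_pct / machines, math.log(machines), float(machines)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(samples):
    """samples: list of (input_pct, machines, measured_time). Returns [x1..x4]."""
    rows = [features(s, k) for s, k, _ in samples]
    times = [t for _, _, t in samples]
    n = 4
    # Normal equations: (A^T A) x = A^T t
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(n)]
    return solve_linear(ata, atb)


def predict(coeffs, input_pct, machines):
    """Predicted running time for a given input size and cluster size."""
    return sum(c * f for c, f in zip(coeffs, features(input_pct, machines)))


if __name__ == "__main__":
    # Hypothetical "true" coefficients used to synthesize training times.
    true_coeffs = [5.0, 20.0, 1.5, 0.25]
    # Seven small training runs: (input %, machines).
    design = [(1, 1), (2, 2), (4, 4), (8, 8), (8, 1), (1, 8), (4, 2)]
    samples = [(s, k, predict(true_coeffs, s, k)) for s, k in design]
    coeffs = fit(samples)
    print("fitted coefficients:", coeffs)
    print("predicted time, 100% input on 42 machines:", predict(coeffs, 100, 42))
```

The design point is that only small inputs and few machines are ever run; the fitted model is then evaluated at the full input size and a larger cluster.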
• Results:
[Figure: Training time, KeystoneML TIMIT pipeline on r3.xlarge instances, 100 iterations. Experiment design: 7 data points, up to 16 machines, up to 10% data. Training time vs. running time (s, 0 to 6000) at 42 machines.]
[Figure: Is experiment design useful? Prediction error (%) for Regression, Classification, KMeans, PCA, and TIMIT, comparing experiment design with a cost-based scheme.]
Cliffhanger: Scaling Performance Cliffs in Web Memory Caches
• Who?: A Stanford CS alumnus, now CEO (co-founder) of the cloud security company Sookasa. Specializes in cloud storage; has presented at SIGCOMM12 and USENIX ATC13, 15.
• What?: An improvement to Memcached's dynamic cache allocation mechanism (slab allocator) that deals with performance cliffs.
[Figure: Hit-rate curve for Application 19, Slab 0. Hit rate (0 to 1) vs. number of items in the LRU queue (0 to 18000), shown together with its concave hull. The region below the hull is a performance cliff; see Talus [HPCA15].]
• +1% cache hit rate → +35% speedup
• The cache hit rate of Facebook's Memcached pool is 98.2% [SIGMETRICS12]
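The concave hull drawn over the hit-rate curve can be computed offline from sampled points with a standard upper-hull scan. The sketch below is only an illustration of that geometric step: the sample points and function names are hypothetical, and it assumes the full curve is known, whereas the slides describe Cliffhanger as estimating local gradients with shadow queues.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn at a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def concave_hull(points):
    """Upper hull of (queue_size, hit_rate) samples, scanned left to right."""
    hull = []
    for p in sorted(points):
        # Drop the previous vertex while it lies strictly below the chord to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) > 0:
            hull.pop()
        hull.append(p)
    return hull


def cliff_ranges(points):
    """Queue-size ranges where some samples fall strictly below the hull."""
    pts = sorted(points)
    position = {p: i for i, p in enumerate(pts)}
    hull = concave_hull(pts)
    return [(a[0], b[0]) for a, b in zip(hull, hull[1:])
            if position[b] - position[a] > 1]


if __name__ == "__main__":
    # Hypothetical curve: flat at first, then a sharp jump near 3000 items.
    curve = [(0, 0.0), (1000, 0.05), (2000, 0.08), (3000, 0.6), (4000, 0.65)]
    print(concave_hull(curve))
    print(cliff_ranges(curve))
```

A range returned by `cliff_ranges` is a stretch where adding a little memory buys almost nothing until the far end of the range is reached, which is the situation the slide calls a performance cliff.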
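• How?: shadow queues
– Hill climbing algorithm: move memory from the queue (slab) whose hit-rate curve has a small gradient to the queue whose gradient is large.
– Cliff scaling algorithm: find where a performance cliff (the convex section of the curve) begins and ends.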
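The hill climbing idea can be sketched with two LRU queues, each followed by a shadow queue that remembers only the keys of recently evicted items. A hit in the shadow queue means the queue would have hit with slightly more memory, so memory is moved to it from another queue. This is a simplified sketch under stated assumptions: the class and function names are hypothetical, memory is counted in item slots, the victim queue is picked at random, and the credit counters that batch resizes are left out so that every shadow hit moves memory immediately.

```python
import random
from collections import OrderedDict


class LRUQueue:
    """LRU queue with a key-only shadow queue behind it."""

    def __init__(self, capacity, shadow_size):
        self.capacity = capacity
        self.shadow_size = shadow_size
        self.physical = OrderedDict()  # cached keys, most recent last
        self.shadow = OrderedDict()    # keys recently evicted from physical

    def _trim(self):
        while len(self.physical) > self.capacity:
            key, _ = self.physical.popitem(last=False)
            self.shadow[key] = None
        while len(self.shadow) > self.shadow_size:
            self.shadow.popitem(last=False)

    def resize(self, capacity):
        self.capacity = capacity
        self._trim()

    def access(self, key):
        """Return 'hit', 'shadow', or 'miss' and update the LRU state."""
        if key in self.physical:
            self.physical.move_to_end(key)
            return "hit"
        outcome = "shadow" if key in self.shadow else "miss"
        self.shadow.pop(key, None)
        self.physical[key] = None
        self._trim()
        return outcome


def hill_climb(queues, requests, step=1, seed=0):
    """Replay (queue_index, key) requests; return the number of cache hits."""
    rng = random.Random(seed)
    hits = 0
    for qi, key in requests:
        outcome = queues[qi].access(key)
        if outcome == "hit":
            hits += 1
        elif outcome == "shadow":
            # Only queues that can give up `step` slots and keep at least one.
            donors = [q for j, q in enumerate(queues)
                      if j != qi and q.capacity > step]
            if donors:
                victim = rng.choice(donors)
                victim.resize(victim.capacity - step)
                queues[qi].resize(queues[qi].capacity + step)
    return hits


if __name__ == "__main__":
    queues = [LRUQueue(4, 4), LRUQueue(4, 4)]
    requests = []
    for i in range(60):
        requests.append((0, i % 6))  # queue 0 cycles over six keys
        requests.append((1, "x"))    # queue 1 needs a single slot
    print("hits:", hill_climb(queues, requests))
    print("capacities:", [q.capacity for q in queues])
```

In the demo, queue 0 starts one working set short of fitting, so its shadow queue keeps registering hits until enough slots have moved over from queue 1, which needs only one.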
Using shadow queues to estimate local gradient
[Figure: Queue 1 and Queue 2, each a physical queue followed by a shadow queue. Credits: Queue 1 = 2, Queue 2 = -2. Resize queues.]
Cliffhanger runs both algorithms in parallel
[Figure: The original queue is split into partitioned queues; pointers track left of pointer, right of pointer, and hill climbing for each partition.]
• Algorithm 1: incrementally optimize memory across queues
– Across slab classes
– Across applications
• Algorithm 2: scales performance cliffs
• The technique looks broadly applicable. Unlike FairRide in the next talk, it does not take fairness into account.
Cliffhanger reduces misses and can save memory
• Average misses reduced: 36.7%
• Average potential memory savings: 45%
Cliffhanger outperforms default and optimized schemes
• Average Cliffhanger hit rate increase: 1.2%
FairRide: Near-Optimal, Fair Cache Sharing
• Who?: A graduate student at UCB AMPLab. Has presented at MobiCom13 and SIGCOMM15.
• What?: A file cache policy that satisfies the isolation guarantee and strategy-proofness while keeping Pareto efficiency near-optimal.
[Figure: Statically allocated caches vs. a globally shared cache, each in front of the backend (storage/network).]
What we want: isolation, strategy-proof, higher utilization, share data

Properties
Policy               Isolation Guarantee   Strategy Proofness   Pareto Efficiency
max-min fairness     ✓                     ✗                    ✓
priority allocation  ✗                     ✓                    ✓
max-min rate         ✓                     ✗                    ✗
static allocation    ✓                     ✓                    ✗
FairRide             ✓                     ✓                    Near-optimal

SIP theorem: in file sharing, these three properties cannot all be satisfied at the same time.
• How?
– Add probabilistic blocking to the max-min policy to give users a disincentive to cheat.
– Implemented on top of Alluxio (Tachyon) [SoCC14].
[Figure 3: Example with 2 users, 3 files (A, B, C) and total cache size of 2. Numbers represent access frequencies. Panels: (a) max-min fairness; (b) the second user cheats; (c) free-riding accesses are blocked. Legend: true access, free-ride, cheat, blocked.]
[Paper excerpt, clipped at the right margin in the slide: a user's optimal strategy is to cache not the files with the highest access frequencies but the ones with the lowest cost/(hit/sec). In the example, a file shared by 2 users and accessed 10 times/sec costs 5MB/(hit/sec), while a 100MB file accessed only 8 times/sec costs 2.5MB/(hit/sec), so it is more economical overall to cache the second file.]
Probabilistic blocking
• FairRide blocks a user with probability p(n_j) = 1/(n_j + 1)
– n_j is the number of other users caching file j
– e.g., p(1) = 50%, p(4) = 20%
• The best you can do in a general case
– Less blocking does not prevent cheating
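The blocking rule is small enough to write down directly. The sketch below assumes a user who caches the file is never blocked and a user who does not is blocked with probability 1/(n_j + 1), where n_j counts the users who do cache it; the function names and the set-based bookkeeping are hypothetical.

```python
import random


def blocking_probability(n_other_cachers):
    """p(n_j) = 1 / (n_j + 1), with n_j other users caching file j."""
    return 1.0 / (n_other_cachers + 1)


def served_from_cache(user, cachers, rng=random):
    """True if the access is served from cache, False if it is blocked.

    `cachers` is the set of users currently caching the file.
    """
    if user in cachers:
        return True  # the user pays for the file, so no blocking applies
    # A free-riding access gets through only with probability 1 - p(n_j).
    # With no cachers at all, p(0) = 1 and the access is never served.
    return rng.random() >= blocking_probability(len(cachers))


if __name__ == "__main__":
    print(blocking_probability(1), blocking_probability(4))
    rng = random.Random(0)
    trials = 10000
    served = sum(served_from_cache("bob", {"alice"}, rng) for _ in range(trials))
    print("free-rider served fraction:", served / trials)
```

With a single cacher, a free rider gets through about half the time, which is what removes the benefit of dropping a shared file and relying on the other user to keep it cached.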
[Figure: Cheating under FairRide. Miss ratio (%) over time (0 to 1050 s) for user 1 and user 2, with the points where user 2 cheats and where user 1 cheats marked.]
FairRide disincentivizes users from cheating.
[Figure: Facebook experiments. Average response time (ms, 0 to 400).]
FairRide outperforms max-min fairness by 29%.
[Figure: Reduction in median job time (%) by bin (#tasks: 1-10, 11-50, 51-100, 101-500, 501-) for max-min and FairRide.]
HUG: Multi-Resource Fairness for Correlated and Elastic Demands
• Who?: An assistant professor at the University of Michigan, formerly of UCB AMPLab. Specializes in networking (coflow-based networking, multi-resource allocation in datacenters, compute and storage for big data, network virtualization) and presents at SIGCOMM almost every year. This work builds on DRF [NSDI11] and FairCloud [SIGCOMM12].
• What?: The optimization problem of allocating network bandwidth.
[Figure: Machines M1, M2, M3, ..., MN attached to a congestion-less core through links L1 ... LN and LN+1 ... L2N, hosting Tenant-A's VMs and Tenant-B's VMs.]
How to share the links between multiple tenants to (1) provide optimal performance guarantees and (2) maximize utilization?
• Highest Utilization with the Optimal Isolation Guarantee
[Figure: HUG in the cooperative setting. Design space with isolation guarantee (low to optimal) on one axis and utilization (low to work-conserving) on the other, placing PS-P, DRF, per-flow fairness, and HUG. HUG offers (1) optimal isolation guarantee and (2) work conservation.]
[Figure: HUG in the non-cooperative setting. Same axes and schemes. HUG offers (1) optimal isolation guarantee, (2) highest utilization, and (3) strategy-proofness.]
Intuitively, we want to maximize the minimum progress over all tenants, i.e., maximize min_k M_k, where min_k M_k corresponds to the isolation guarantee of an allocation algorithm. We make three observations. First, when there is a single link in the system, this model trivially reduces to max-min fairness. Second, getting more aggregate bandwidth is not always better. For tenant-A in the example, ⟨50Mbps, 100Mbps⟩ is better than ⟨90Mbps, 90Mbps⟩ or ⟨25Mbps, 200Mbps⟩, even though the latter ones have more bandwidth in total. Third, simply applying max-min fairness to individual links is not enough. In our example, max-min fairness allocates equal resources to both tenants on both links, resulting in allocations ⟨1/2, 1/2⟩ on both links (Figure 1b). Corresponding progress (M_A = M_B = 1/2) result in a suboptimal isolation guarantee (min{M_A, M_B} = 1/2). Dominant Resource Fairness (DRF) [33] extends max-min fairness to multiple resources and prevents such sub- […]
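The progress measure in the excerpt above can be checked numerically: a tenant's progress is limited by the link on which its allocation is smallest relative to its demand. The sketch below assumes tenant A's demand across the two links is in the ratio 1/2 : 1 and tenant B's is 1 : 1/6. Neither vector appears in the excerpt; they are illustrative values chosen to be consistent with the numbers it quotes, and the function names are hypothetical.

```python
def progress(allocation, demand):
    """Progress M_k: the smallest allocation-to-demand ratio over all links."""
    return min(a / d for a, d in zip(allocation, demand) if d > 0)


def isolation_guarantee(allocations, demands):
    """Minimum progress over all tenants, min_k M_k."""
    return min(progress(allocations[k], demands[k]) for k in allocations)


# Assumed correlated demands per link (not given in the excerpt).
DEMAND_A = (0.5, 1.0)
DEMAND_B = (1.0, 1.0 / 6.0)

if __name__ == "__main__":
    # More aggregate bandwidth is not always better for tenant A.
    for alloc in [(50, 100), (90, 90), (25, 200)]:
        print(alloc, "total:", sum(alloc), "progress:", progress(alloc, DEMAND_A))

    # Per-link max-min fairness: both tenants get half of each link.
    per_link = {"A": (0.5, 0.5), "B": (0.5, 0.5)}
    demands = {"A": DEMAND_A, "B": DEMAND_B}
    print("isolation guarantee:", isolation_guarantee(per_link, demands))
```

Under these assumed demands, ⟨50Mbps, 100Mbps⟩ gives tenant A the highest progress of the three allocations despite having the smallest total, which matches the ordering stated in the excerpt.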
[Figure: Taxonomy of cloud network sharing.]
Cloud Network Sharing
– Reservation (SecondNet, Oktopus, Pulsar, Silo): uses admission control
– Dynamic Sharing
  – Flow-Level (Per-Flow Fairness): no isolation guarantee
  – VM-Level (Seawall, GateKeeper): no isolation guarantee
  – Tenant-/Network-Level
    – Non-Cooperative Environments: require strategy-proofness
      – Low Utilization (DRF): optimal isolation guarantee
      – Highest Utilization for Optimal Isolation Guarantee (HUG)
    – Cooperative Environments: do not require strategy-proofness
      – Work-Conserving Optimal Isolation Guarantee (HUG)
      – Suboptimal Isolation Guarantee (PS-P, EyeQ, NetShare): work-conserving
• Experiments on 100 EC2 instances.
• Three tenants
– Tenants A and C: pairwise one-to-one communication
– Tenant B: all-to-all communication
[Figure 10: [EC2] Bandwidth consumptions of three tenants arriving over time in a 100-machine EC2 cluster. Total allocation (Gbps, 0 to 100) vs. time (0 to 540 seconds) for Tenant A, Tenant B, and Tenant C under (a) per-flow fairness (TCP) and (b) HUG. Each tenant has 100 VMs, but each uses a different communication pattern (§5.1.1). We observe that (a) using TCP, tenant-B dominates the network by creating more flows; (b) HUG isolates tenants A and C from tenant B.]
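Impressions
• This session was about resource management inside the datacenter.
• None of the papers rests on a revolutionary idea, but many are model examples of research: formulate the problem properly, then build a practical system on that formulation. That is NSDI for you.
• It is nice to be able to hear every talk in a single-track program, but 20 minutes per talk is short (some parts are hard to follow from the slides alone).
• UCB AMPLab is strong.
• I want the Facebook trace data.
All figures used in this document are taken from the proceedings and slides on the NSDI 2016 website.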

Más contenido relacionado

La actualidad más candente

Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research
Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical ResearchBruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research
Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical ResearchDanny Abukalam
 
HPC Cloud: Clouds on supercomputers for HPC
HPC Cloud: Clouds on supercomputers for HPCHPC Cloud: Clouds on supercomputers for HPC
HPC Cloud: Clouds on supercomputers for HPCRyousei Takano
 
Hands on MapR -- Viadea
Hands on MapR -- ViadeaHands on MapR -- Viadea
Hands on MapR -- Viadeaviadea
 
On heap cache vs off-heap cache
On heap cache vs off-heap cacheOn heap cache vs off-heap cache
On heap cache vs off-heap cachergrebski
 
Stig Telfer - OpenStack and the Software-Defined SuperComputer
Stig Telfer - OpenStack and the Software-Defined SuperComputerStig Telfer - OpenStack and the Software-Defined SuperComputer
Stig Telfer - OpenStack and the Software-Defined SuperComputerDanny Abukalam
 
MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformGanesan Narayanasamy
 
Scale-out AI Training on Massive Core System from HPC to Fabric-based SOC
Scale-out AI Training on Massive Core System from HPC to Fabric-based SOCScale-out AI Training on Massive Core System from HPC to Fabric-based SOC
Scale-out AI Training on Massive Core System from HPC to Fabric-based SOCinside-BigData.com
 
Evolving Virtual Networking with IO Visor
Evolving Virtual Networking with IO VisorEvolving Virtual Networking with IO Visor
Evolving Virtual Networking with IO VisorLarry Lang
 
유연하고 확장성 있는 빅데이터 처리
유연하고 확장성 있는 빅데이터 처리유연하고 확장성 있는 빅데이터 처리
유연하고 확장성 있는 빅데이터 처리NAVER D2
 
dCUDA: Distributed GPU Computing with Hardware Overlap
 dCUDA: Distributed GPU Computing with Hardware Overlap dCUDA: Distributed GPU Computing with Hardware Overlap
dCUDA: Distributed GPU Computing with Hardware Overlapinside-BigData.com
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)Kohei KaiGai
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Kohei KaiGai
 
SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)Kohei KaiGai
 
Networking, QoS, Liberty, Mitaka and Newton - Livnat Peer - OpenStack Day Isr...
Networking, QoS, Liberty, Mitaka and Newton - Livnat Peer - OpenStack Day Isr...Networking, QoS, Liberty, Mitaka and Newton - Livnat Peer - OpenStack Day Isr...
Networking, QoS, Liberty, Mitaka and Newton - Livnat Peer - OpenStack Day Isr...Cloud Native Day Tel Aviv
 
クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価Ryousei Takano
 
Slides for In-Datacenter Performance Analysis of a Tensor Processing Unit
Slides for In-Datacenter Performance Analysis of a Tensor Processing UnitSlides for In-Datacenter Performance Analysis of a Tensor Processing Unit
Slides for In-Datacenter Performance Analysis of a Tensor Processing UnitCarlo C. del Mundo
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015Kohei KaiGai
 
Inside the Volta GPU Architecture and CUDA 9
Inside the Volta GPU Architecture and CUDA 9Inside the Volta GPU Architecture and CUDA 9
Inside the Volta GPU Architecture and CUDA 9inside-BigData.com
 

La actualidad más candente (20)

Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research
Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical ResearchBruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research
Bruno Silva - eMedLab: Merging HPC and Cloud for Biomedical Research
 
HPC Cloud: Clouds on supercomputers for HPC
HPC Cloud: Clouds on supercomputers for HPCHPC Cloud: Clouds on supercomputers for HPC
HPC Cloud: Clouds on supercomputers for HPC
 
Hands on MapR -- Viadea
Hands on MapR -- ViadeaHands on MapR -- Viadea
Hands on MapR -- Viadea
 
On heap cache vs off-heap cache
On heap cache vs off-heap cacheOn heap cache vs off-heap cache
On heap cache vs off-heap cache
 
Stig Telfer - OpenStack and the Software-Defined SuperComputer
Stig Telfer - OpenStack and the Software-Defined SuperComputerStig Telfer - OpenStack and the Software-Defined SuperComputer
Stig Telfer - OpenStack and the Software-Defined SuperComputer
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platform
 
Cassandra at teads
Cassandra at teadsCassandra at teads
Cassandra at teads
 
Scale-out AI Training on Massive Core System from HPC to Fabric-based SOC
Scale-out AI Training on Massive Core System from HPC to Fabric-based SOCScale-out AI Training on Massive Core System from HPC to Fabric-based SOC
Scale-out AI Training on Massive Core System from HPC to Fabric-based SOC
 
Evolving Virtual Networking with IO Visor
Evolving Virtual Networking with IO VisorEvolving Virtual Networking with IO Visor
Evolving Virtual Networking with IO Visor
 
유연하고 확장성 있는 빅데이터 처리
유연하고 확장성 있는 빅데이터 처리유연하고 확장성 있는 빅데이터 처리
유연하고 확장성 있는 빅데이터 처리
 
dCUDA: Distributed GPU Computing with Hardware Overlap
 dCUDA: Distributed GPU Computing with Hardware Overlap dCUDA: Distributed GPU Computing with Hardware Overlap
dCUDA: Distributed GPU Computing with Hardware Overlap
 
GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)
 
Networking, QoS, Liberty, Mitaka and Newton - Livnat Peer - OpenStack Day Isr...
Networking, QoS, Liberty, Mitaka and Newton - Livnat Peer - OpenStack Day Isr...Networking, QoS, Liberty, Mitaka and Newton - Livnat Peer - OpenStack Day Isr...
Networking, QoS, Liberty, Mitaka and Newton - Livnat Peer - OpenStack Day Isr...
 
クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価
 
Slides for In-Datacenter Performance Analysis of a Tensor Processing Unit
Slides for In-Datacenter Performance Analysis of a Tensor Processing UnitSlides for In-Datacenter Performance Analysis of a Tensor Processing Unit
Slides for In-Datacenter Performance Analysis of a Tensor Processing Unit
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
 
Inside the Volta GPU Architecture and CUDA 9
Inside the Volta GPU Architecture and CUDA 9Inside the Volta GPU Architecture and CUDA 9
Inside the Volta GPU Architecture and CUDA 9
 

Similar a USENIX NSDI 2016 (Session: Resource Sharing)

Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...DataStax
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesAmazon Web Services
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model CompressionApache MXNet
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaAlluxio, Inc.
 
SoC Solutions Enabling Server-Based Networking
SoC Solutions Enabling Server-Based NetworkingSoC Solutions Enabling Server-Based Networking
SoC Solutions Enabling Server-Based NetworkingNetronome
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...DataStax
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...Ryousei Takano
 
Aerospike for machine learning
Aerospike for machine learningAerospike for machine learning
Aerospike for machine learningAerospike
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
IBM Special Announcement session Intel #IDF2013 September 10, 2013
IBM Special Announcement session Intel #IDF2013 September 10, 2013IBM Special Announcement session Intel #IDF2013 September 10, 2013
IBM Special Announcement session Intel #IDF2013 September 10, 2013Cliff Kinard
 
Security TechTalk | AWS Public Sector Summit 2016
Security TechTalk | AWS Public Sector Summit 2016Security TechTalk | AWS Public Sector Summit 2016
Security TechTalk | AWS Public Sector Summit 2016Amazon Web Services
 
StorPool Presents at Cloud Field Day 9
StorPool Presents at Cloud Field Day 9StorPool Presents at Cloud Field Day 9
StorPool Presents at Cloud Field Day 9StorPool Storage
 
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed_Hat_Storage
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrationsinside-BigData.com
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
Predictable Big Data Performance in Real-time
Predictable Big Data Performance in Real-timePredictable Big Data Performance in Real-time
Predictable Big Data Performance in Real-timeAerospike, Inc.
 
Dynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using ContainersDynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using ContainersIRJET Journal
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloadsinside-BigData.com
 

Similar a USENIX NSDI 2016 (Session: Resource Sharing) (20)

Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model Compression
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
 
SoC Solutions Enabling Server-Based Networking
SoC Solutions Enabling Server-Based NetworkingSoC Solutions Enabling Server-Based Networking
SoC Solutions Enabling Server-Based Networking
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
Aerospike for machine learning
Aerospike for machine learningAerospike for machine learning
Aerospike for machine learning
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
IBM Special Announcement session Intel #IDF2013 September 10, 2013
IBM Special Announcement session Intel #IDF2013 September 10, 2013IBM Special Announcement session Intel #IDF2013 September 10, 2013
IBM Special Announcement session Intel #IDF2013 September 10, 2013
 
Security TechTalk | AWS Public Sector Summit 2016
Security TechTalk | AWS Public Sector Summit 2016Security TechTalk | AWS Public Sector Summit 2016
Security TechTalk | AWS Public Sector Summit 2016
 
StorPool Presents at Cloud Field Day 9
StorPool Presents at Cloud Field Day 9StorPool Presents at Cloud Field Day 9
StorPool Presents at Cloud Field Day 9
 
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
 
Deep Dive on Amazon EC2
Deep Dive on Amazon EC2Deep Dive on Amazon EC2
Deep Dive on Amazon EC2
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Predictable Big Data Performance in Real-time
Predictable Big Data Performance in Real-timePredictable Big Data Performance in Real-time
Predictable Big Data Performance in Real-time
 
Dynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using ContainersDynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using Containers
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloads
 

Más de Ryousei Takano

Error Permissive Computing
Error Permissive ComputingError Permissive Computing
Error Permissive ComputingRyousei Takano
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIRyousei Takano
 
ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and DeploymentABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and DeploymentRyousei Takano
 
A Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center NetworksA Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center NetworksRyousei Takano
 
不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何か不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何かRyousei Takano
 
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
USENIX NSDI 2016 (Session: Resource Sharing)

  • 1. USENIX NSDI 2016, Session: Resource Sharing. 2016-05-29, @oraccha
  • 2. Co-located Events
      • ACM Symposium on SDN Research 2016 (SOSR), March 13-17
      • 2016 Open Networking Summit (ONS), March 14-17
      • The 12th ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS'16), March 17-19
      • The 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI'16)
      • The USENIX Workshop on Cool Topics in Sustainable Data Centers (CoolDC'16), March 19
  • 3. Session: Resource Sharing
      • "Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics," Shivaram Venkataraman, Zongheng Yang, Michael Franklin, Benjamin Recht, and Ion Stoica, University of California, Berkeley
      • "Cliffhanger: Scaling Performance Cliffs in Web Memory Caches," Asaf Cidon and Assaf Eisenman, Stanford University; Mohammad Alizadeh, MIT CSAIL; Sachin Katti, Stanford University
      • "FairRide: Near-Optimal, Fair Cache Sharing," Qifan Pu and Haoyuan Li, University of California, Berkeley; Matei Zaharia, Massachusetts Institute of Technology; Ali Ghodsi and Ion Stoica, University of California, Berkeley
      • "HUG: Multi-Resource Fairness for Correlated and Elastic Demands," Mosharaf Chowdhury, University of Michigan; Zhenhua Liu, Stony Brook University; Ali Ghodsi and Ion Stoica, University of California, Berkeley, and Databricks Inc.
  • 4. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics
      • Who?: A graduate student at UCB AMPLab, the lab known for Spark, Mesos, and other systems. Specializes in systems and algorithms for large-scale data analytics, with papers at SoCC'12, EuroSys'13, OSDI'14, SIGMOD'16, and elsewhere.
      • What?: A framework that efficiently predicts the performance of data analytics workloads, such as machine learning and genome analysis, in cloud environments.
      [Figure: "Do choices matter?" Running time (s) on 1 r3.8xlarge, 2 r3.4xlarge, 4 r3.2xlarge, 8 r3.xlarge, and 16 r3.large instances for Matrix Multiply (400K by 1K) and QR Factorization (1M by 1K). Annotations: Network Bound; Mem Bandwidth Bound; Cores = 16, Memory = 244 GB, Cost = $2.66/hr.]
      [Figure: Keystone-ML TIMIT pipeline: Raw Data → Cosine Transform → Normalization → Linear Solver (~100 iterations). Properties: iterative (each iteration runs many jobs); long running → expensive; numerically intensive.]
      [Figure: Time (s) versus Cores, Actual versus Ideal, for QR Factorization (1M by 1K) on r3.4xlarge instances. Computation + Communication → Non-linear Scaling.]
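  • 5. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics
      • How?: Predicts performance from the measured results of small-scale training jobs, and uses experiment design to reduce the number of training jobs.
      • Using Ernest: Job Binary plus (Machines, Input Size) → Experiment Design → Training Jobs → Linear Model. Use few iterations for training.
      • Optimal design of experiments: candidate Input fractions 1%, 2%, 4%, 8% and Machines 1, 2, 4, 8; use an off-the-shelf solver (CVX).
      • Ernest basic model: $\text{time} = x_1 + x_2 \cdot \frac{\text{input}}{\text{machines}} + x_3 \cdot \log(\text{machines}) + x_4 \cdot \text{machines}$
          – $x_1$: Serial Execution
          – $x_2 \cdot \frac{\text{input}}{\text{machines}}$: Computation (linear)
          – $x_3 \cdot \log(\text{machines})$: Tree DAG
          – $x_4 \cdot \text{machines}$: All-to-One DAG
      • Procedure: Collect Training Data, then Fit Linear Regression.
      [Figure: Time versus Machines (1, 30, 900).]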
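The "collect training data, fit linear regression" step on slide 5 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses ordinary least squares on synthetic timings, and the function names (`features`, `fit`, `predict`), the coefficient values, and the training configurations are all assumptions made for the example.

```python
import math

def features(input_size, machines):
    # One design-matrix row: the four terms of the slide's model.
    return [1.0, input_size / machines, math.log(machines), float(machines)]

def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit(samples):
    # samples: (input_size, machines, measured_time) from small training jobs.
    rows = [features(s, m) for s, m, _ in samples]
    times = [t for _, _, t in samples]
    k = len(rows[0])
    # Normal equations: (X^T X) x = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(k)]
    return solve(xtx, xty)

def predict(coeffs, input_size, machines):
    return sum(c * f for c, f in zip(coeffs, features(input_size, machines)))

# Hypothetical ground truth, used only to synthesize training timings.
TRUE_COEFFS = [5.0, 40.0, 2.0, 0.5]
# Small configurations: (input in percent of the full data set, machines).
CONFIGS = [(1, 1), (2, 2), (4, 4), (8, 8), (1, 4), (8, 2), (4, 1), (2, 8)]
training = [(s, m, predict(TRUE_COEFFS, s, m)) for s, m in CONFIGS]

coeffs = fit(training)
# Extrapolate to the full input (100%) on a larger cluster (64 machines).
full_run_estimate = predict(coeffs, 100, 64)
```

Because the synthetic timings are noise-free, the fit recovers the assumed coefficients; with real measurements the residual error is what the prediction-error plots on the next slide quantify.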
  • 6. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics
      • Results:
      [Figure: Training time for the Keystone-ML TIMIT pipeline on r3.xlarge instances, 100 iterations. Experiment design: 7 data points, up to 16 machines, up to 10% of the data. Bars compare Training Time with Running Time (s) at 42 machines.]
      [Figure: "Is Experiment Design useful?" Prediction Error (%) for Regression, Classification, KMeans, PCA, and TIMIT, Experiment Design versus Cost-based.]
  • 7. Cliffhanger: Scaling Performance Cliffs in Web Memory Caches
      • Who?: A Stanford CS alumnus, currently CEO and co-founder of the cloud security company Sookasa. Specializes in cloud storage, with papers at SIGCOMM'12 and USENIX ATC'13 and ATC'15.
      • What?: An improvement to Memcached's dynamic cache allocation mechanism (the slab allocator) that addresses performance cliffs.
      • The cache hit rate of Facebook's Memcached pool is 98.2% [SIGMETRICS12]; +1% cache hit rate → +35% speedup.
      [Figure: Hit-rate curve for Application 19, Slab 0: Hitrate versus Number of Items in LRU Queue, with its Concave Hull. Annotation: Performance Cliff, Talus [HPCA15].]
  • 8. Cliffhanger: Scaling Performance Cliffs in Web Memory Caches
      • How?: shadow queues
          – Hill climbing algorithm: moves memory from queues (slabs) where the gradient of the hit-rate curve is small to queues where it is large.
          – Cliff scaling algorithm: finds the start and the end of a performance cliff (the dented interval of the hit-rate curve).
      • Using Shadow Queues to Estimate Local Gradient
      [Figure: Queue 1 and Queue 2, each a Physical Queue followed by a Shadow Queue; per-queue Credits; Resize Queues.]
      • Cliffhanger Runs Both Algorithms in Parallel
          – Algorithm 1: incrementally optimize memory across queues (across slab classes, across applications)
          – Algorithm 2: scales performance cliffs
      [Figure: Original Queue and Partitioned Queues, with pointers labeled "Track left of pointer", "Track right of pointer", and "Track hill climbing".]
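The hill-climbing idea on slide 8 (shadow queues that record recently evicted keys, credits awarded on shadow hits, and queue resizing) can be sketched as below. This is a simplified illustration under stated assumptions, not Cliffhanger's actual code: the class `ShadowedLRU`, the function `hill_climb`, the credit threshold of 2, the one-slot transfer size, and the deterministic choice of the donor queue are all hypothetical choices made for the example.

```python
from collections import OrderedDict

class ShadowedLRU:
    """LRU queue with a shadow queue holding only recently evicted keys."""

    def __init__(self, capacity, shadow_capacity):
        self.capacity = capacity
        self.shadow_capacity = shadow_capacity
        self.items = OrderedDict()   # physical queue (keys stand in for values)
        self.shadow = OrderedDict()  # shadow queue (keys only)

    def access(self, key):
        # Returns "hit", "shadow_hit", or "miss" and updates both queues.
        if key in self.items:
            self.items.move_to_end(key)
            return "hit"
        result = "shadow_hit" if key in self.shadow else "miss"
        self.shadow.pop(key, None)
        self.items[key] = True
        self._shrink()
        return result

    def resize(self, delta):
        self.capacity += delta
        self._shrink()

    def _shrink(self):
        # Evict from the physical queue into the shadow queue, oldest first.
        while len(self.items) > self.capacity:
            evicted, _ = self.items.popitem(last=False)
            self.shadow[evicted] = True
        while len(self.shadow) > self.shadow_capacity:
            self.shadow.popitem(last=False)

def hill_climb(queues, trace, threshold=2):
    # A shadow hit means the queue would have hit with slightly more memory,
    # so it earns a credit at the expense of a donor queue.
    credits = {name: 0 for name in queues}
    for name, key in trace:
        if queues[name].access(key) != "shadow_hit":
            continue
        # Assumption: pick the donor deterministically (lowest credits).
        donor = min((n for n in queues if n != name),
                    key=lambda n: (credits[n], n))
        credits[name] += 1
        credits[donor] -= 1
        if credits[name] >= threshold and queues[donor].capacity > 1:
            queues[donor].resize(-1)
            queues[name].resize(+1)
            credits[name] -= threshold
            credits[donor] += threshold
    return credits

# Queue "a" cycles over 3 keys but holds only 2; queue "b" needs 1 of its 2.
queues = {"a": ShadowedLRU(2, 2), "b": ShadowedLRU(2, 2)}
trace = []
for i in range(12):
    trace.append(("a", "k%d" % (i % 3)))
    trace.append(("b", "x"))
final_credits = hill_climb(queues, trace)
```

In this toy trace the shadow hits on queue "a" pull one slot away from queue "b", after which the three-key working set fits and further accesses to "a" hit in the physical queue.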
  • 9. Cliffhanger: Scaling Performance Cliffs in Web Memory Caches
      • The technique looks generally applicable. Unlike FairRide, the next talk, it does not take fairness into account.
      • Cliffhanger Reduces Misses and Can Save Memory
          – Average misses reduced: 36.7%
          – Average potential memory savings: 45%
      • Cliffhanger Outperforms Default and Optimized Schemes
          – Average Cliffhanger hit rate increase: 1.2%
  • 10. FairRide: Near-Optimal, Fair Cache Sharing
      • Who?: A graduate student at UCB AMPLab, with papers at MobiCom'13 and SIGCOMM'15.
      • What?: A file cache policy that satisfies the isolation guarantee and strategy-proofness while keeping Pareto efficiency near-optimal.
      • SIP theorem: in file sharing, the following three properties cannot all be satisfied at the same time: Isolation Guarantee, Strategy Proofness, Pareto Efficiency.
      [Figure: Statically allocated caches versus a globally shared cache, each in front of a Backend (storage/network). What we want: isolation, strategy-proofness, higher utilization, shared data.]
      Properties: Isolation Guarantee | Strategy Proofness | Pareto Efficiency
      static allocation: ✓ | ✓ | ✗
      FairRide: ✓ | ✓ | Near-optimal
      max-min fairness, priority allocation, and max-min rate are compared in the same table; each fails at least one of the three properties.
  • 11. FairRide: Near-Optimal, Fair Cache Sharing
      • How?
          – Introduces probabilistic blocking into the max-min policy, giving users a disincentive against cheating.
          – Implemented on top of Alluxio (Tachyon) [SoCC14].
      [Figure 3: Example with 2 users, 3 files (A, B, C) and total cache size of 2. Numbers represent access frequencies. Legend: true access, free-ride, cheat, blocked. Panels: (a) max-min fairness; (b) the second user cheats; (c) blocking free-riding access.]
      [Cropped excerpt from the paper: to maximize utility (the total hit rate), a user's optimal strategy is to cache not the files with the highest access frequencies but the files with the lowest cost/(hit/sec). A file accessed 10 times/sec can be less economical to cache than a file accessed 8 times/sec whose cost is shared among more users (5MB/(hit/sec) versus 2.5MB/(hit/sec)).]
      • Probabilistic blocking
          – FairRide blocks a user with probability $p(n_j) = 1/(n_j + 1)$, where $n_j$ is the number of other users caching file j; e.g., p(1) = 50%, p(4) = 20%.
          – This is the best you can do in the general case: less blocking does not prevent cheating.
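The blocking rule on slide 11 is small enough to write down directly. The sketch below is only an illustration of $p(n_j) = 1/(n_j + 1)$: the function names `block_probability` and `access`, the representation of cachers as a set of user names, and the seeded simulation are assumptions for the example, not FairRide's or Alluxio's API.

```python
import random

def block_probability(n_other_cachers):
    # p(n_j) = 1 / (n_j + 1), with n_j the number of other users caching file j.
    return 1.0 / (n_other_cachers + 1)

def access(user, cachers, rng):
    """Classify one access to a file cached on behalf of the users in `cachers`."""
    if not cachers:
        return "miss"      # nobody caches the file, so it is not in memory
    if user in cachers:
        return "hit"       # users who pay for the file are never blocked
    # A free-rider is blocked with probability 1 / (n_j + 1).
    if rng.random() < block_probability(len(cachers)):
        return "blocked"
    return "hit"

# Empirical check with a fixed seed: one paying user, one free-rider.
rng = random.Random(0)
outcomes = [access("free_rider", {"payer"}, rng) for _ in range(10000)]
blocked_fraction = outcomes.count("blocked") / len(outcomes)
```

With a single paying user, roughly half of the free-rider's accesses are blocked, matching p(1) = 50% on the slide; with four paying users the rate drops to p(4) = 20%.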
  • 12. FairRide: Near-Optimal, Fair Cache Sharing
      [Figure: Cheating under FairRide. Miss ratio (%) versus Time (s) for user 1 and user 2, with the points where user 2 cheats and where user 1 cheats marked. FairRide dis-incentives users from cheating.]
      [Figure: Facebook experiments. Avg. response (ms), and Reduction in Median Job Time (%) per bin (#Tasks: 1-10, 11-50, 51-100, 101-500, 501-), max-min versus FairRide. FairRide outperforms max-min fairness by 29%.]
  • 13. HUG: Multi-Resource Fairness for Correlated and Elastic Demands
      • Who?: An assistant professor at the University of Michigan and a UCB AMPLab alumnus. Specializes in networking (coflow-based networking, multi-resource allocation in datacenters, compute and storage for big data, network virtualization) and presents at SIGCOMM almost every year. This work builds on DRF [NSDI11] and FairCloud [SIGCOMM12].
      • What?: An optimization problem for allocating network bandwidth.
      • How to share the links between multiple tenants to provide 1. optimal performance guarantees and 2. maximize utilization?
      [Figure: Machines M1, M2, M3, ..., MN attached to a Congestion-Less Core through links L1 ... LN and LN+1 ... L2N, hosting Tenant-A's VMs and Tenant-B's VMs.]
  • 14. HUG: Multi-Resource Fairness for Correlated and Elastic Demands
      • Highest Utilization with the Optimal Isolation Guarantee
      • HUG in Cooperative Setting: 1. Optimal Isolation Guarantee; 2. Work Conservation
      • HUG in Non-Cooperative Setting: 1. Optimal Isolation Guarantee; 2. Highest Utilization; 3. Strategyproof
      [Figure: Design space with axes Isolation Guarantee (Low to Optimal) and Utilization (Low to Work-Conserving), placing PS-P, DRF, Per-Flow Fairness, and HUG.]
      [Excerpt from the paper: Intuitively, we want to maximize the minimum progress over all tenants, i.e., maximize $\min_k M_k$, where $\min_k M_k$ corresponds to the isolation guarantee of an allocation algorithm. We make three observations. First, when there is a single link in the system, this model trivially reduces to max-min fairness. Second, getting more aggregate bandwidth is not always better. For tenant-A in the example, ⟨50Mbps, 100Mbps⟩ is better than ⟨90Mbps, 90Mbps⟩ or ⟨25Mbps, 200Mbps⟩, even though the latter ones have more bandwidth in total. Third, simply applying max-min fairness to individual links is not enough. In our example, max-min fairness allocates equal resources to both tenants on both links, resulting in allocations ⟨1/2, 1/2⟩ on both links (Figure 1b). Corresponding progress ($M_A = M_B = 1/2$) result in a suboptimal isolation guarantee ($\min\{M_A, M_B\} = 1/2$). Dominant Resource Fairness (DRF) [33] extends max-min fairness to multiple resources and prevents such sub…]
      • Cloud Network Sharing (taxonomy)
          – Reservation (SecondNet, Oktopus, Pulsar, Silo): uses admission control
          – Dynamic Sharing, Flow-Level (Per-Flow Fairness): no isolation guarantee
          – Dynamic Sharing, VM-Level (Seawall, GateKeeper): no isolation guarantee
          – Dynamic Sharing, Tenant-/Network-Level, Non-Cooperative Environments (require strategy-proofness): Low Utilization (DRF), with optimal isolation guarantee; Highest Utilization for Optimal Isolation Guarantee (HUG)
          – Dynamic Sharing, Tenant-/Network-Level, Cooperative Environments (do not require strategy-proofness): Suboptimal Isolation Guarantee (PS-P, EyeQ, NetShare), work-conserving; Work-Conserving Optimal Isolation Guarantee (HUG)
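The "more aggregate bandwidth is not always better" observation in the excerpt can be checked with a few lines of arithmetic. The sketch below assumes that a tenant's progress is the minimum, over links, of allocation divided by demand, and that tenant-A needs 2 Mbps on the second link for every 1 Mbps on the first, i.e. a demand vector of ⟨1/2, 1⟩. That vector is inferred from the example's numbers rather than stated on the slide, and the names `progress` and `isolation_guarantee` are hypothetical.

```python
def progress(allocation, demand):
    # Progress is limited by the most constrained link: min_i a_i / d_i.
    return min(a / d for a, d in zip(allocation, demand) if d > 0)

def isolation_guarantee(progress_by_tenant):
    # The isolation guarantee of an allocation is the minimum progress, min_k M_k.
    return min(progress_by_tenant.values())

# Assumed correlated demand of tenant-A across the two links.
DEMAND_A = (0.5, 1.0)

candidates = {
    "<50, 100>": (50, 100),   # 150 Mbps in total
    "<90, 90>": (90, 90),     # 180 Mbps in total
    "<25, 200>": (25, 200),   # 225 Mbps in total
}
progress_a = {name: progress(alloc, DEMAND_A) for name, alloc in candidates.items()}
best = max(progress_a, key=progress_a.get)

# Per-link max-min fairness from the excerpt: both tenants progress at 1/2.
guarantee = isolation_guarantee({"A": 0.5, "B": 0.5})
```

Under these assumptions ⟨50, 100⟩ yields the highest progress even though it has the smallest total bandwidth, because the other two allocations leave one link over-provisioned relative to the correlated demand.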
  • 15. HUG: Multi-Resource Fairness for Correlated and Elastic Demands
      • Experiments run on 100 EC2 instances.
      • Three tenants
          – Tenants A and C: pairwise one-to-one communication
          – Tenant B: all-to-all communication
      [Figure 10: [EC2] Bandwidth consumptions of three tenants arriving over time in a 100-machine EC2 cluster; Total Alloc (Gbps) versus Time (Seconds) under (a) Per-flow Fairness (TCP) and (b) HUG. Each tenant has 100 VMs, but each uses a different communication pattern (§5.1.1). We observe that (a) using TCP, tenant-B dominates the network by creating more flows; (b) HUG isolates tenants A and C from tenant B.]
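  • 16. Impressions
      • This session covers resource management inside data centers.
      • The papers do not necessarily present revolutionary ideas, but many are model examples of research: formulate the problem rigorously, then build a practical system on that formulation. That is NSDI for you.
      • It is nice that the single-track format lets you hear every talk, but 20 minutes per talk is short (some parts are hard to follow from the slides alone).
      • UCB AMPLab is strong.
      • I want Facebook trace data.
      All figures used in this material are taken from the proceedings and slides on the NSDI 2016 website.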