Large models such as GPT-3 & ChatGPT have transformed deep learning (DL), powering applications that have captured the public’s imagination. These models are rapidly being adopted across domains for analytics on various modalities, often by finetuning pre-trained base models. Such models need multiple GPUs due to both their size and computational load, driving the development of a bevy of “model parallelism” techniques & tools. Navigating such parallelism choices, however, is a new burden for end users of DL such as data scientists, domain scientists, etc. who may lack the necessary systems know-how. The need for model selection, which leads to many models to train due to hyper-parameter tuning or layer-wise finetuning, compounds the situation with two more burdens: resource apportioning and scheduling. In this work, we tackle these three burdens for DL users in a unified manner by formalizing them as a joint problem that we call SPASE: Select a Parallelism, Allocate resources, and Schedule. We propose a new information system architecture to tackle the SPASE problem holistically, representing a key step toward enabling wider adoption of large DL models. We devise an extensible template for existing parallelism schemes and combine it with an automated empirical profiler for runtime estimation. We then formulate SPASE as an MILP. We find that direct use of an MILP solver is significantly more effective than several baseline heuristics. We optimize the system runtime further with an introspective scheduling approach. We implement all these techniques into a new data system we call Saturn. Experiments with benchmark DL workloads show that Saturn achieves 39-49% lower model selection runtimes than typical current DL practice.
5. Fine-Tuning & Applications
• Off-the-shelf models have to be fine-tuned and adapted
• Model is big…data might not be
• Model Selection is critical - motivating multi-model
• Democratizing fine-tuning for domain scientists & practitioners
6. Critical Challenges - Parallelism
• Parallelism has become essential but complex
• Model Parallel?
• Pipelining?
• Offloading?
• Data Parallel / Sharded Data Parallel?
• Hybrids?
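To make the trade-offs among these techniques concrete, here is a toy per-GPU memory cost model. All numbers and cost formulas are illustrative assumptions for this sketch, not measurements from Saturn or any real framework:

```python
# Toy per-GPU memory model contrasting parallelism techniques.
# The cost formulas and the 16 GB capacity are illustrative assumptions.

def per_gpu_memory_gb(technique, model_gb, n_gpus):
    """Rough GPU memory needed per device for a model of `model_gb` GB."""
    if technique == "data_parallel":          # full replica on every GPU
        return model_gb
    if technique == "sharded_data_parallel":  # parameters sharded (FSDP-style)
        return model_gb / n_gpus
    if technique == "pipeline":               # layers split across GPUs
        return model_gb / n_gpus
    if technique == "offloading":             # bulk of state held in CPU RAM
        return model_gb * 0.1
    raise ValueError(technique)

def feasible(technique, model_gb, n_gpus, gpu_capacity_gb=16):
    return per_gpu_memory_gb(technique, model_gb, n_gpus) <= gpu_capacity_gb

# A 40 GB model does not fit under plain data parallelism on 16 GB GPUs,
# but sharding it across 4 GPUs does.
print(feasible("data_parallel", 40, 4))          # False
print(feasible("sharded_data_parallel", 40, 4))  # True
```

The point: which technique is even *feasible* depends jointly on model size, GPU count, and GPU memory, which is why the choice is hard to make by hand.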
7. Critical Challenges - Resource Allocation
• Non-Linear Scaling Complicates Resource Apportioning
• In a multi-job, how should GPUs be distributed?
• How does each model’s performance scale?
• Local performance vs global throughput
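A small sketch of why non-linear scaling complicates apportioning. The diminishing-returns scaling curve below is made up for illustration; a real system (like Saturn's profiler) would measure each model's curve empirically:

```python
# Sketch of GPU apportioning under non-linear scaling. The scaling model
# (each extra GPU is only 80% as useful as the last) is an assumption.

def runtime_hours(base_hours, n_gpus, efficiency=0.8):
    """Diminishing-returns model of runtime vs. GPU count."""
    speedup = sum(efficiency ** k for k in range(n_gpus))
    return base_hours / speedup

def greedy_allocate(base_times, total_gpus):
    """Give each job 1 GPU, then repeatedly add a GPU wherever it most
    reduces total runtime."""
    alloc = {job: 1 for job in base_times}
    for _ in range(total_gpus - len(base_times)):
        best = max(
            base_times,
            key=lambda j: runtime_hours(base_times[j], alloc[j])
                          - runtime_hours(base_times[j], alloc[j] + 1),
        )
        alloc[best] += 1
    return alloc

# Two jobs, 8 GPUs: the longer job receives more GPUs, but not all of them.
print(greedy_allocate({"vit": 30.0, "gpt": 10.0}, total_gpus=8))
```

Note the outcome is an uneven split driven by marginal gains, not a naive 4/4 division: local per-job speedup and global throughput pull in different directions.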
8. Critical Challenges - Scheduling
• Scheduling requires both local & global understanding
• What’s the estimated runtime of each job?
• How can I most effectively utilize my GPUs to minimize makespan?
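As a baseline intuition for makespan minimization, here is the classic longest-processing-time-first (LPT) greedy heuristic. Saturn solves the joint problem with an MILP instead; LPT is shown only as the textbook greedy comparison point:

```python
import heapq

# Minimal LPT scheduling sketch: assign each job (longest first) to the
# earliest-free worker group; return the resulting makespan.

def lpt_makespan(job_runtimes, n_workers):
    workers = [0.0] * n_workers          # current finish time per worker
    heapq.heapify(workers)
    for rt in sorted(job_runtimes, reverse=True):
        earliest = heapq.heappop(workers)
        heapq.heappush(workers, earliest + rt)
    return max(workers)

print(lpt_makespan([8, 7, 6, 5, 4], n_workers=2))  # 17.0
```

On this instance LPT yields a makespan of 17, while the optimum is 15 (8+7 vs. 6+5+4), which illustrates why an exact MILP solve can beat greedy heuristics.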
11. SPASE: A New Optimization Problem
Given a multi-job of large models, we have to…
• Select a parallelism: pipeline parallel or data parallel?
• Allocate resources: how many GPUs per job?
• Schedule jobs: A before B, or B before A?
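The three decisions interact, which is what makes SPASE a joint problem. A toy exhaustive search over a hypothetical 2-job, 4-GPU instance makes this concrete (the runtime table is invented; Saturn instead profiles real runtimes and solves an MILP):

```python
from itertools import product

# runtime[job][(parallelism, n_gpus)] -> hours (hypothetical numbers)
RUNTIME = {
    "A": {("ddp", 1): 12, ("ddp", 2): 7, ("ddp", 3): 5,
          ("pipeline", 1): 14, ("pipeline", 2): 8, ("pipeline", 3): 6},
    "B": {("ddp", 1): 6, ("ddp", 2): 4, ("ddp", 3): 3,
          ("pipeline", 1): 5, ("pipeline", 2): 3, ("pipeline", 3): 2.5},
}

def solve(total_gpus=4):
    """Brute-force the joint choice of parallelism + GPUs for both jobs,
    assuming they run concurrently, to minimize the makespan."""
    best = None
    for (pa, ga), (pb, gb) in product(RUNTIME["A"], RUNTIME["B"]):
        if ga + gb != total_gpus:
            continue  # concurrent jobs: allocations must sum to the pool
        makespan = max(RUNTIME["A"][(pa, ga)], RUNTIME["B"][(pb, gb)])
        if best is None or makespan < best[0]:
            best = (makespan, (pa, ga), (pb, gb))
    return best

print(solve())
```

Even on this tiny instance, the best plan mixes techniques: job A uses DDP on 3 GPUs while job B uses pipelining on 1 GPU. Picking each job's best parallelism in isolation would not find this.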
12. Saturn - A SPASE System
Components:
1. Library
2. Profiler
3. Joint Optimizer
4. Executor
Users interact with Saturn via parallelism registration and job submission.
13. Saturn - A SPASE System
Library: register & retrieve parallelism techniques
Already supports popular techniques such as pipelining, DDP, FSDP, and more!
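As a rough sketch of what registering and retrieving a technique could look like (the class and method names below are hypothetical illustrations, not Saturn's actual API):

```python
# Hypothetical parallelism-technique registry sketch.

class ParallelismRegistry:
    def __init__(self):
        self._techniques = {}

    def register(self, name):
        """Decorator that stores an execution routine under `name`."""
        def wrap(fn):
            self._techniques[name] = fn
            return fn
        return wrap

    def retrieve(self, name):
        return self._techniques[name]

registry = ParallelismRegistry()

@registry.register("ddp")
def run_ddp(model, gpus):
    return f"running {model} with DDP on {gpus} GPUs"

print(registry.retrieve("ddp")("bert", 4))
```

The extensibility point is that new parallelism schemes plug in through one uniform template rather than bespoke integration code.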
14. Saturn - A SPASE System
Profiler: performance estimates for each model under each parallelism & possible apportionment
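The core idea of empirical profiling can be sketched as timing a few trial batches under each (parallelism, GPU count) configuration and extrapolating. The trial function below is a stand-in for a real short training run:

```python
import time

def profile(trial_fns, gpu_counts, n_batches=3):
    """Return estimated seconds-per-batch for each configuration."""
    estimates = {}
    for name, fn in trial_fns.items():
        for g in gpu_counts:
            start = time.perf_counter()
            for _ in range(n_batches):
                fn(g)                      # run one (simulated) batch
            elapsed = time.perf_counter() - start
            estimates[(name, g)] = elapsed / n_batches
    return estimates

# Stand-in "batch" that gets cheaper with more GPUs.
def fake_batch(gpus):
    time.sleep(0.01 / gpus)

print(profile({"ddp": fake_batch}, gpu_counts=[1, 2]))
```

Profiling only short trial runs keeps the estimation cost small relative to full training, while still capturing each model's actual scaling behavior.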
15. Saturn - A SPASE System
Introspective Solver: MILP-solving tool that takes profiler results and hardware information and produces, for each model: a parallelism selection, a GPU allocation, and a start time.
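A simplified sketch of an MILP in the spirit of SPASE, with binary variable $x_{j,p,g}$ choosing parallelism $p$ and $g$ GPUs for job $j$, profiled runtimes $T_{j,p,g}$, GPU pool size $G$, and makespan $M$ (the paper's actual formulation also encodes start times and time-varying GPU usage, which are omitted here):

```latex
\begin{align*}
\min\; & M \\
\text{s.t. } & \textstyle\sum_{p,g} x_{j,p,g} = 1 \quad \forall j
  && \text{(one parallelism \& allocation per job)} \\
& \textstyle\sum_{j,p,g} g\, x_{j,p,g} \le G
  && \text{(GPU pool respected)} \\
& \textstyle\sum_{p,g} T_{j,p,g}\, x_{j,p,g} \le M \quad \forall j
  && \text{(makespan bounds every job)} \\
& x_{j,p,g} \in \{0,1\}
\end{align*}
```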
17. Evaluations: Single-Node, 8-GPU
Baseline ("Standard Practice"): 8 GPUs per model, run in sequence.
• ViT workload: Standard Practice 30.6 hours → Saturn 17.4 hours (1.76X speedup!)
• GPT workload: Standard Practice 19.05 hours → Saturn 10.75 hours (1.77X speedup!)
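The reported speedups follow directly from the reported runtimes:

```python
# Speedup = Standard Practice hours / Saturn hours (single-node, 8-GPU).
vit = 30.6 / 17.4
gpt = 19.05 / 10.75
print(round(vit, 2), round(gpt, 2))  # 1.76 1.77
```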
18. Evaluations: Two-Node, 16-GPU
Baseline ("Standard Practice"): 8 GPUs per model, run in sequence.
• ViT workload: Standard Practice 14.57 hours → Saturn 8.23 hours (1.77X speedup!)
• GPT workload: Standard Practice 10.15 hours → Saturn 5.17 hours (1.96X speedup!)
20. Conclusion
• Modern DL scale challenges motivate automated, easy-to-use, and resource-efficient training systems
• We should consider DL efficiency holistically
• Saturn, the first work to tackle the new joint problem of parallelism selection, resource allocation, and scheduling, demonstrates 40-50% runtime reductions