PaLM Scaling Language Modeling with Pathways - 230219 (1).pdf

  1. PaLM: Scaling Language Modeling with Pathways Chowdhery, Aakanksha, et al. arXiv preprint arXiv:2204.02311 2023. 02. 19 허정원, 조해창, 박산희 1
  2. Contents • • • • • 2
  3. 1. Introduction 3 Prior large language models and their parameter counts: GPT-3 (175B), GLaM (1.2T), LaMDA (137B), Gopher (280B), MT-NLG (530B)
  4. 1. Introduction 4 PaLM: 540B parameters trained on 780B tokens, achieved through the use of Pathways
  5. The key takeaways • • • • • • 5
  6. Model Architecture 6
  7. 2. Model Architecture • • • • • • • 7
  8. 8
  9. 9 • SwiGLU: SwiGLU(x) = Swish(xW) ⊗ xV, where Swish(x) = x·sigmoid(βx). An improvement in quality in compute-equivalent experiments.
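A minimal NumPy sketch of the SwiGLU feed-forward unit described above; the toy shapes and the β = 1.0 default are illustrative assumptions, not the paper's configuration:

import numpy as np

def swish(x, beta=1.0):
    # Swish(x) = x * sigmoid(beta * x); beta = 1.0 is an illustrative default
    return x / (1.0 + np.exp(-beta * x))

def swiglu_ffn(x, W, V, W_out):
    # SwiGLU(x, W, V) = Swish(xW) ⊙ xV, followed by the usual output projection
    gated = swish(x @ W) * (x @ V)
    return gated @ W_out

# Toy shapes: d_model = 8, d_ff = 16 (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))             # (tokens, d_model)
W = rng.normal(size=(8, 16))
V = rng.normal(size=(8, 16))
W_out = rng.normal(size=(16, 8))
print(swiglu_ffn(x, W, V, W_out).shape)  # (4, 8)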
  10. 10 • Parallel Layers: the parallel formulation results in roughly 15% faster training speed at large scales, since the MLP and Attention input matrix multiplications can be fused.
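A minimal sketch contrasting the standard serial transformer block with the parallel formulation; attention, mlp, and layernorm are hypothetical stand-ins for the actual sublayers:

def serial_block(x, attention, mlp, layernorm):
    # Standard formulation: y = x + MLP(LayerNorm(x + Attention(LayerNorm(x))))
    h = x + attention(layernorm(x))
    return h + mlp(layernorm(h))

def parallel_block(x, attention, mlp, layernorm):
    # Parallel formulation: y = x + Attention(LayerNorm(x)) + MLP(LayerNorm(x)).
    # One LayerNorm output feeds both sublayers, so their input matrix
    # multiplications can be fused into a single larger matmul.
    h = layernorm(x)
    return x + attention(h) + mlp(h)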
  11. 11 • Multi-Query Attention
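A minimal NumPy sketch of multi-query attention, in which each head keeps its own query projection while all heads share a single key/value head; the shapes are illustrative assumptions:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, Wq, Wk, Wv, n_heads, d_head):
    # x: (seq, d_model). Queries are projected per head; keys/values are
    # projected once and shared across heads, which shrinks the KV cache
    # and speeds up autoregressive decoding.
    seq, _ = x.shape
    q = (x @ Wq).reshape(seq, n_heads, d_head)       # (seq, heads, d_head)
    k = x @ Wk                                       # (seq, d_head), shared
    v = x @ Wv                                       # (seq, d_head), shared
    scores = np.einsum("qhd,kd->hqk", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    out = np.einsum("hqk,kd->qhd", weights, v)       # (seq, heads, d_head)
    return out.reshape(seq, n_heads * d_head)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                          # toy: seq = 5, d_model = 8
out = multi_query_attention(x, rng.normal(size=(8, 8)),
                            rng.normal(size=(8, 4)),
                            rng.normal(size=(8, 4)), n_heads=2, d_head=4)
print(out.shape)                                     # (5, 8)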
  12. 12 • RoPE Embeddings: f_q(x_m) := W_q x_m, f_k(x_n, n) := W_k(x_n + p̃_r^k), f_v(x_n, n) := W_v(x_n + p̃_r^v)
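A minimal NumPy sketch of applying rotary position embeddings to a query or key matrix; the channel pairing and base = 10000.0 follow the common RoPE convention and are assumptions here, not details taken from the slides:

import numpy as np

def rotary_embed(x, base=10000.0):
    # Rotate pairs of channels by a position-dependent angle so that the
    # query/key dot product depends only on relative position.
    # x: (seq, d) with d even.
    seq, d = x.shape
    pos = np.arange(seq)[:, None]                    # (seq, 1)
    freq = base ** (-np.arange(0, d, 2) / d)         # (d/2,)
    angles = pos * freq                              # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # even/odd channels
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q = rotary_embed(rng.normal(size=(6, 8)))            # toy: seq = 6, d = 8
print(q.shape)                                       # (6, 8)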
  13. 13 • Vocabulary A SentencePiece vocabulary with 256k tokens, which was chosen to support the large number of languages in the training corpus without excess tokenization. The vocabulary is completely lossless and reversible.
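A toy, self-contained illustration of why a byte-level fallback makes a vocabulary lossless and reversible; this is not the SentencePiece implementation or the paper's 256k-token vocabulary, only a sketch of the idea:

# Toy vocabulary with byte fallback: anything outside the learned pieces is
# emitted as raw UTF-8 bytes, so encode/decode round-trips exactly.
VOCAB = ["▁the", "▁language", "▁modeling"]           # toy learned pieces
BYTES = [f"<0x{b:02X}>" for b in range(256)]          # byte fallback pieces
PIECE_TO_ID = {p: i for i, p in enumerate(VOCAB + BYTES)}
ID_TO_PIECE = {i: p for p, i in PIECE_TO_ID.items()}

def encode(text):
    ids = []
    for word in text.split(" "):
        piece = "▁" + word                            # mark word boundary
        if piece in PIECE_TO_ID:
            ids.append(PIECE_TO_ID[piece])
        else:                                         # unknown word: emit bytes
            ids.extend(PIECE_TO_ID[f"<0x{b:02X}>"] for b in piece.encode("utf-8"))
    return ids

def decode(ids):
    out = bytearray()
    for i in ids:
        piece = ID_TO_PIECE[i]
        if piece.startswith("<0x"):
            out.append(int(piece[3:5], 16))
        else:
            out.extend(piece.encode("utf-8"))
    return out.decode("utf-8").replace("▁", " ").lstrip(" ")

text = "the 多言語 language modeling"
print(decode(encode(text)) == text)                   # True: lossless round-trip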
  14. 2. Model Architecture • • • cost savings • • • • 14
  15. 2.1 Model Scale Hyperparameters 15
  16. Model Architecture 16
  17. Training 17
  18. 3 Training Dataset 18
  19. 4 Training Infrastructure 19
  20. 4.1 Training Efficiency 20
  21. 5 Training Setup • • • • • • • • 21
  22. 5 Training Setup • 22
  23. 5 Training Setup • 23
  24. 5 Training Setup • 24
  25. 5 Training Setup • 25
  26. 5 Training Setup • 26
  27. 5 Training Setup • • • 27
  28. 5.1 Training Instability 28
  29. Training 29
  30. Evaluation 30
  31. 31 6.1 English NLP tasks
  32. 6.2 BIG-bench 32
  33. 6.3 Reasoning 33
  34. 6.4 Code Tasks 34
  35. 6.5 Translation • • • 35
  36. 6.6 Multilingual Natural Language Generation • • • • 36
  37. 6.7 Multilingual Question Answering 37
  38. 6.8 Analysis 38
  39. Discussions 39
  40. 7 Memorization • • • 40
  41. 8 Dataset Contamination 41
  42. 9 Exploring Explanations • • • 42
  43. 10 Representational Bias Analysis 43
  44. 13 Open Questions in Scaling 44
  45. 14 Conclusion • • • 45
  46. Q & A 46