U-SQL Query Execution and Performance Basics (SQLBits 2016)

U-SQL Query Execution and Performance Basics (SQLBits 2016 ADL/U-SQL Pre-Conference)

Published in: Data & Analytics

  1. Michael Rys, Principal Program Manager, Big Data @ Microsoft. @MikeDoesBigData, {mrys, usql}@microsoft.com. U-SQL Query Execution and Performance Basics
  2. Simplified U-SQL Job Workflow. Diagram labels: Job Front End, Job Scheduler, Compiler Service, Job Queue, Job Manager, U-SQL Catalog, YARN, job submission, job execution, U-SQL Runtime, vertex execution.
  3. U-SQL Compilation Process. The Compiler & Optimizer, working with the U-SQL Metadata Service, writes the compilation output to the job folder: C# (managed DLL), C++ (unmanaged DLL), Algebra, and other files (system files, deployed resources). The output is deployed to the vertices.
  4. Job Status in Visual Studio
  5. UX states: Preparing, Queued, Running, Finalizing, Ended (Succeeded, Failed, Cancelled). Job states: New, Compiling, Queued, Scheduling, Starting, Running, Ended. Preparing: the script is being compiled by the Compiler Service. Queued: all jobs enter the queue; if there are enough ADLAUs to start the job, those ADLAUs are allocated to it. Running/Finalizing: the U-SQL runtime is executing the code on 1 or more ADLAUs or finalizing the outputs. Ended: the job has concluded.
  6. The Job Queue. The queue is ordered by job priority: lower numbers mean higher priority, and 1 is the highest. When a job reaches the top of the queue, it starts running. Defaults: Max Running Jobs = 3, Max Tokens per Job = 20, Max Queue Size = 200.
  7. Priority Doesn’t Preempt Running Jobs. Jobs A, B, and C are all running with very low priority (pri=1000). Job X has pri=1, but X will NOT preempt the running jobs; X has to wait.
  8. Resources
  9. Blue items: the output of the compiler. Grey items: U-SQL runtime bits. Options: download all the resources, or download a specific resource.
  10. The Job Folder. Inside the default ADL Store: /system/jobservice/jobs/Usql/YYYY/MM/DD/hh/mm/JOBID, for example /system/jobservice/jobs/Usql/2016/01/20/00/00/17972fc2-4737-48f7-81fb-49af9a784f64
  11. Query Execution Plans, Vertices, Stages, Parallelism, ADLAUs
  12. Query Life. Diagram labels: Job Scheduler & Queue, Front-End Service, Optimizer, Vertex Scheduling, Compiler, Runtime, Visual Studio, Portal / API.
  13. Parallelism: 100 ADLAUs. Work composed of 12K vertices.
  14. U-SQL Script -> Job Graph (Logical -> Physical Plan). Each square is a vertex and represents a fraction of the total work. The vertices in each SuperVertex (aka “Stage”) perform the same operation on a different part of the same data. The plan is visualized as a “Job Graph”.
  15. ADLAU = Azure Data Lake Analytics Unit. Parallelism N = N ADLAUs. 1 ADLAU ~= a VM with 2 cores and 6 GB of memory.
  16. Execution with Requested Parallelism. Requested Parallelism = 1: reserve enough to do 1 vertex at a time. Requested Parallelism = 4: reserve enough to do 4 vertices at a time.
  17. Notes: The next stage can start before the previous one has finished. It may not be possible to use all the reserved parallelism during a stage.
  18. Notes: The job resources are copied to each vertex.
  19. Stage Details (Super Vertex = Stage): 252 pieces of work, average vertex execution time, 4.3 billion rows, data read and written.
  20. Automatic Vertex Retry. Orange: a vertex failed but was retried automatically. The overall stage completed successfully.
  21. Vertex Execution View
  22. All the vertices. Filter which vertices to see.
  23. The Critical Path
  24. Vertex Relationships. The vertex on the bottom depends on the output of the vertex on the top.
  25. Critical Path: the dependency chain of vertices that kept the job running to the very end.
  26. Efficiency: Cost vs. Latency
  27. $\mathit{JobCost} = 5\text{c} + \mathit{minutes} \times \mathit{ADLAUs} \times \mathit{ADLAUcostpermin}$
  28. Allocation. Allocating 10 ADLAUs for a 10-minute job: Cost = 10 min * 10 ADLAUs = 100 ADLAU-minutes. Blue line over time: allocated.
  29. Over-Allocation. You are paying for the area under the blue line, but you are only using the area under the red line. Consider using fewer ADLAUs.
  30. Profile isn’t loaded
  31. Profile is loaded now. Click “Resource usage”.
  32. Blue: allocation. Red: actual running.
  33. Dips down to 1 active vertex at these times
  34. Smallest estimated time, when given 2425 ADLAUs: 1410 seconds = 23.5 minutes
  35. Model with 100 ADLAUs: 8709 seconds ≈ 145 minutes
  36. http://aka.ms/AzureDataLake
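
Slides 27 to 29 price a job by the area under its allocation line and compare it with the area actually used. A minimal Python sketch of that arithmetic follows. It assumes the "5c" term is a flat 5-cent fee per job and that one ADLAU runs one vertex at a time. The function names, the rate of 2 cents per ADLAU-minute, and the sample vertex counts are hypothetical values chosen for illustration, not Azure prices or data from the talk.

```python
def adlau_minutes(adlaus, minutes):
    """Billed area under the allocation line (slide 28)."""
    return adlaus * minutes


def job_cost_cents(adlaus, minutes, cents_per_adlau_minute, flat_fee_cents=5):
    """Slide 27: JobCost = 5c + minutes x ADLAUs x cost per ADLAU-minute.

    Assumption: "5c" is read as a flat fee of 5 cents per job.
    """
    return flat_fee_cents + adlau_minutes(adlaus, minutes) * cents_per_adlau_minute


def utilization(active_vertices_per_minute, allocated_adlaus):
    """Fraction of the paid allocation that was actually used (slide 29).

    active_vertices_per_minute holds one sample per minute of the job.
    Assumption: each running vertex occupies one ADLAU.
    """
    used = sum(min(active, allocated_adlaus) for active in active_vertices_per_minute)
    paid = adlau_minutes(allocated_adlaus, len(active_vertices_per_minute))
    return used / paid


# Slide 28: 10 ADLAUs for a 10-minute job.
print(adlau_minutes(10, 10))                              # 100 ADLAU-minutes
print(job_cost_cents(10, 10, cents_per_adlau_minute=2))   # 205 (hypothetical rate)

# Hypothetical profile with dips in active vertices, as on slide 33.
profile = [10, 8, 4, 1, 1, 1, 2, 6, 3, 1]
print(utilization(profile, allocated_adlaus=10))
```

A low utilization value is the numeric form of slide 29's advice: the job pays for the whole allocation, so a smaller allocation may cost less at the price of a longer run.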

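Slides 24 and 25 describe the critical path as the dependency chain of vertices that kept the job running to the very end. One way to sketch this in Python is as the longest path through a dependency graph. This is an illustration only, not how the Visual Studio tooling computes it: it assumes unlimited parallelism, so a vertex starts as soon as all of its inputs are ready, and the vertex names and durations are hypothetical.

```python
def critical_path(durations, depends_on):
    """Return (path, total_time) for the longest dependency chain.

    durations:  vertex name -> execution time in seconds
    depends_on: vertex name -> list of vertices whose output it needs
    Assumption: the graph is acyclic and small enough for recursion.
    """
    finish = {}

    def finish_time(vertex):
        # A vertex finishes after its slowest input, plus its own run time.
        if vertex not in finish:
            inputs = depends_on.get(vertex, [])
            start = max((finish_time(dep) for dep in inputs), default=0)
            finish[vertex] = start + durations[vertex]
        return finish[vertex]

    # The vertex that finishes last ends the job.
    last = max(durations, key=finish_time)

    # Walk back through the slowest input at each step.
    path = [last]
    while depends_on.get(path[-1]):
        path.append(max(depends_on[path[-1]], key=finish_time))
    path.reverse()
    return path, finish[last]


# Hypothetical four-vertex job: two extracts feed a join, which feeds an output.
durations = {"extract_a": 30, "extract_b": 50, "join": 20, "output": 5}
depends_on = {"join": ["extract_a", "extract_b"], "output": ["join"]}

path, total = critical_path(durations, depends_on)
print(path, total)  # ['extract_b', 'join', 'output'] 75
```

In this example, speeding up `extract_a` would not shorten the job, because it is not on the critical path.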
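
Slides 6 and 7 describe a queue ordered by priority in which running jobs are never preempted. The Python sketch below models only that rule. It is a simplification: it ignores ADLAU availability (slide 5), assumes the queue size limit counts waiting jobs only, and breaks priority ties by submission order. The `JobQueue` class and its methods are hypothetical names, not part of any Azure API.

```python
import heapq
from itertools import count


class JobQueue:
    """Toy model of the job queue; defaults taken from slide 6."""

    def __init__(self, max_running=3, max_queue_size=200):
        self.max_running = max_running
        self.max_queue_size = max_queue_size
        self.running = []          # names of running jobs, in start order
        self._waiting = []         # heap of (priority, submit order, name)
        self._order = count()      # tie-breaker: earlier submission first

    def submit(self, name, priority):
        if len(self._waiting) >= self.max_queue_size:
            raise RuntimeError("queue is full")
        # Lower number = higher priority, so a min-heap gives the right order.
        heapq.heappush(self._waiting, (priority, next(self._order), name))
        self._start_waiting_jobs()

    def finish(self, name):
        self.running.remove(name)
        self._start_waiting_jobs()

    def waiting(self):
        return [name for _, _, name in sorted(self._waiting)]

    def _start_waiting_jobs(self):
        # Jobs start only when a slot is free; running jobs are never evicted.
        while self._waiting and len(self.running) < self.max_running:
            _, _, name = heapq.heappop(self._waiting)
            self.running.append(name)


queue = JobQueue()
for job in ("A", "B", "C"):
    queue.submit(job, priority=1000)
queue.submit("Y", priority=500)
queue.submit("X", priority=1)

print(queue.running)    # ['A', 'B', 'C']  X waits despite pri=1
print(queue.waiting())  # ['X', 'Y']
queue.finish("B")
print(queue.running)    # ['A', 'C', 'X']
```

Priority decides only which waiting job starts next, which is why X overtakes Y in the queue but not the low-priority jobs that are already running.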