Multicore Programming

Prepared by: Yan Druglaya
E-mail: ydrugalya@gmail.com
Twitter: @ydrugalya
Agenda
 Part 1 - Current state of affairs
 Part 2 - Multithreaded algorithms
 Part 3 – Task Parallel Library
Multicore Programming
Part 1: Current state of affairs
Why Moore's law is not working anymore
 Power consumption
 Wire delays
 DRAM access latency
 Diminishing returns of more instruction-level parallelism
Power consumption

[Figure: power density (W/cm²) of processors, 1970 to 2010. From the 8080, 386, and 486 up through the Pentium® processors, power density climbs past that of a hot plate toward a nuclear reactor, a rocket nozzle, and ultimately the Sun's surface.]
Intel Developer Forum, Spring 2004 - Pat Gelsinger
Wire delays
DRAM access latency
Diminishing returns
 80‟s
   10 CPI  1 CPI
 90
   1 CPI  0.5CPI
 00‟s: multicore
The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software

                        Herb Sutter
Survival
 To scale performance, put many processing cores on the microprocessor chip
 The new edition of Moore's law is about doubling the number of cores
Quotations
 "No matter how fast processors get, software consistently finds new ways to eat up the extra speed."
 "If you haven't done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency."
        -- Herb Sutter, C++ Architect at Microsoft (March 2005)
 "After decades of single core processors, the high volume processor industry has gone from single to dual to quad-core in just the last two years. Moore's Law scaling should easily let us hit the 80-core mark in mainstream processors within the next ten years and quite possibly even less."
               -- Justin Rattner, CTO, Intel (February 2007)
What keeps us away from multicore
 A sequential way of thinking
 The belief that parallel programming is difficult and error-prone
 Unwillingness to accept that the sequential era is over
 Neglecting performance
What has been done
 Many frameworks have been created that bring parallelism to the application level
 Vendors are trying hard to teach the programming community how to write parallel programs
 MIT and other education centers have done a great deal of research in this area
Multicore Programming
Part 2: Multithreaded algorithms
Chapter 27: Multithreaded Algorithms
Multithreaded algorithms
 No single parallel computer architecture → no single, widely accepted model of parallel computing
 We rely on a parallel shared-memory computer
Dynamic multithreaded model (DMM)
 Allows the programmer to express "logical parallelism" without worrying about the details of static thread programming
 Two main features:
   Nested parallelism (the parent can proceed while a spawned child is computing its result)
   Parallel loops (iterations of the loop can execute concurrently)
DMM - advantages
 A simple extension of the serial model: only three new keywords, parallel, spawn, and sync
 Provides a theoretically clean way to quantify parallelism, based on the notions of "work" and "span"
 Many multithreaded algorithms based on nested parallelism follow naturally from the divide-and-conquer approach
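The three keywords are easiest to see in the classic CLRS Fibonacci routine, the same computation the "Example: fib(4)" slide illustrates: spawn lets the parent keep running while the child computes, and sync waits for all spawned children. A sketch in the book's pseudocode style:

P-FIB(n)
1. if n <= 1
2.     return n
3. x = spawn P-FIB(n - 1)   // child runs concurrently with the parent
4. y = P-FIB(n - 2)         // the parent computes this in the meantime
5. sync                      // wait for the spawned child to finish
6. return x + y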
Multithreaded execution model
Work

Span

Speedup

Parallelism

Performance summary
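The bodies of the Work, Span, Speedup, and Parallelism slides above were images; as a summary (not the original slide content), the quantities in the CLRS notation the deck follows are:

```latex
T_1      = \text{work: total running time on one processor}
T_\infty = \text{span: length of the longest path in the computation DAG}
\text{Speedup on } P \text{ processors} = T_1 / T_P,
\qquad T_P \ge T_1 / P \ \text{(work law)},
\qquad T_P \ge T_\infty \ \text{(span law)}
\text{Parallelism} = T_1 / T_\infty
```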

Example: fib(4)
Scheduler role

Analyzing MT algorithms: Matrix
multiplication
P-SQUARE-MATRIX-MULTIPLY(A, B)
1. n = A.rows
2. let C be a new n × n matrix
3. parallel for i = 1 to n
4.     parallel for j = 1 to n
5.         c[i,j] = 0
6.         for k = 1 to n
7.             c[i,j] = c[i,j] + a[i,k] * b[k,j]
8. return C
Analyzing MT algorithms: Matrix
multiplication
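This slide's body was a figure; the standard CLRS analysis of P-SQUARE-MATRIX-MULTIPLY, restated here, is:

```latex
T_1(n) = \Theta(n^3) \quad \text{(work: same as the serial triply nested loop)}
T_\infty(n) = \Theta(\lg n) + \Theta(\lg n) + \Theta(n) = \Theta(n)
\quad \text{(each parallel loop adds } \Theta(\lg n) \text{; the serial inner loop adds } \Theta(n)\text{)}
\text{Parallelism} = T_1 / T_\infty = \Theta(n^2)
```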

Chess Lesson

Multicore Programming
Part 3: Task Parallel Library
TPL building blocks
 Consists of:
  - Tasks
  - Thread-safe scalable collections
  - Phases and work exchange
  - Partitioning
  - Looping
  - Control
  - Breaking
  - Exceptions
  - Results
Data parallelism




Parallel.ForEach(letters, ch => Capitalize(ch));
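A self-contained sketch of the data-parallel call above. Capitalize and letters are the slide's placeholder names, filled in minimally here for illustration:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class DataParallelismDemo
{
    public static string Capitalize(string s) => s.ToUpperInvariant();

    static void Main()
    {
        var letters = new[] { "a", "b", "c", "d" };
        var results = new ConcurrentBag<string>();

        // Each element may be processed on a different thread-pool thread,
        // so results are collected in a thread-safe bag.
        Parallel.ForEach(letters, ch => results.Add(Capitalize(ch)));

        Console.WriteLine(string.Join(",", results)); // order is nondeterministic
    }
}
```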
Task parallelism




Parallel.Invoke(() => Average(), () => Minimum(), …);
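A runnable sketch of the task-parallel call above. Average and Minimum are the slide's placeholder names, given minimal bodies here:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class TaskParallelismDemo
{
    static readonly int[] Data = { 5, 1, 4, 2, 3 };

    public static double Average() => Data.Average();
    public static int Minimum() => Data.Min();

    static void Main()
    {
        double avg = 0;
        int min = 0;

        // Each delegate runs as an independent task;
        // Invoke blocks until all of them have finished.
        Parallel.Invoke(
            () => avg = Average(),
            () => min = Minimum());

        Console.WriteLine($"avg={avg} min={min}");
    }
}
```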
Thread Pool in .NET 3.5
Thread Pool in .NET 4.0
Task Scheduler & Thread pool
 .NET 3.5 ThreadPool.QueueUserWorkItem disadvantages:
   Zero information about each work item
   Maintains fairness with a FIFO queue
 Improvements in 4.0:
   More efficient FIFO queue (ConcurrentQueue)
   Enhanced API that gets more information from the user
     Task
     Work stealing
     Thread injection
     Waiting for completion, handling exceptions, getting the computation result
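The last three improvements surface through the Task API. A minimal sketch (Square and the exception message are made up for illustration):

```csharp
using System;
using System.Threading.Tasks;

class TaskDemo
{
    public static int Square(int x) => x * x;

    static void Main()
    {
        // Unlike QueueUserWorkItem, a Task carries its result,
        // supports waiting, and propagates exceptions to the waiter.
        Task<int> t = Task.Factory.StartNew(() => Square(7));

        Task failing = Task.Factory.StartNew(
            () => { throw new InvalidOperationException("boom"); });
        try
        {
            failing.Wait(); // the task's exception is rethrown here
        }
        catch (AggregateException ae)
        {
            // Task exceptions arrive wrapped in an AggregateException.
            Console.WriteLine(ae.InnerException.Message);
        }

        Console.WriteLine(t.Result); // Result blocks until the task completes
    }
}
```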
New Primitives
 Thread-safe, scalable collections
   IProducerConsumerCollection<T>
     ConcurrentQueue<T>
     ConcurrentStack<T>
     ConcurrentBag<T>
   ConcurrentDictionary<TKey,TValue>
 Phases and work exchange
   Barrier
   BlockingCollection<T>
   CountdownEvent
 Partitioning
   {Orderable}Partitioner<T>
     Partitioner.Create
 Exception handling
   AggregateException
 Initialization
   Lazy<T>
     LazyInitializer.EnsureInitialized<T>
   ThreadLocal<T>
 Locks
   ManualResetEventSlim
   SemaphoreSlim
   SpinLock
   SpinWait
 Cancellation
   CancellationToken{Source}
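One way these primitives compose, shown with cooperative cancellation. The spinning worker is a made-up workload; the pattern (poll the token, cancel, observe the Canceled status) is the standard one:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class CancellationDemo
{
    // Starts a worker that loops until the token is cancelled,
    // then cancels it and reports the task's final status.
    public static TaskStatus RunOnce()
    {
        var cts = new CancellationTokenSource();

        Task worker = Task.Factory.StartNew(() =>
        {
            while (true)
            {
                // Cooperative cancellation: the worker polls the token.
                cts.Token.ThrowIfCancellationRequested();
                Thread.Yield();
            }
        }, cts.Token);

        cts.Cancel();
        try { worker.Wait(); }
        catch (AggregateException) { /* OperationCanceledException arrives wrapped */ }

        // Because the thrown exception's token matches the one the task
        // was created with, the task ends in the Canceled state.
        return worker.Status;
    }

    static void Main() => Console.WriteLine(CancellationDemo.RunOnce());
}
```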
References
 The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
 MIT Introduction to Algorithms video lectures
 Chapter 27, Multithreaded Algorithms, from Introduction to Algorithms, 3rd edition
 CLR 4.0 ThreadPool Improvements: Part 1
 Multicore Programming Primer
 ThreadPool on Channel 9
