4. Why Moore's law is not working anymore
Power consumption
Wire delays
DRAM access latency
Diminishing returns of more instruction-level parallelism
5. Power consumption
[Chart: power density (W/cm²), log scale, of Intel processors from the 8080 through the 386, 486, and Pentium, '70 to '10 — rising past a hot plate (~10 W/cm²) toward a nuclear reactor (~100), a rocket nozzle (~1,000), and the Sun's surface (~10,000)]
Intel Developer Forum, Spring 2004 - Pat Gelsinger
9. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
Herb Sutter
10. Survival
To scale performance, put many processing cores on the microprocessor chip.
The new edition of Moore's law is about the doubling of cores.
11. Quotations
"No matter how fast processors get, software consistently finds new ways to eat up the extra speed. If you haven't done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency."
-- Herb Sutter, C++ Architect at Microsoft (March 2005)
"After decades of single core processors, the high volume processor industry has gone from single to dual to quad-core in just the last two years. Moore's Law scaling should easily let us hit the 80-core mark in mainstream processors within the next ten years and quite possibly even less."
-- Justin Rattner, CTO, Intel (February 2007)
12. What keeps us away from multicore
A sequential way of thinking
The belief that parallel programming is difficult and error-prone
Unwillingness to accept that the sequential era is over
Neglecting performance
13. What has been done
Many frameworks have been created that bring parallelism to the application level
Vendors try hard to teach the programming community how to write parallel programs
MIT and other education centers have done a lot of research in this area
16. Multithreaded algorithms
There is no single architecture of parallel computers, and no single, widely accepted model of parallel computing
Here we rely on a parallel shared-memory computer
17. Dynamic multithreaded model (DMM)
Allows the programmer to express "logical parallelism" without worrying about the details of static thread management
Two main features:
Nested parallelism (the parent can proceed while a spawned child is computing its result)
Parallel loops (iterations of the loop can execute concurrently)
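The two features above can be sketched with Java's fork/join framework, which follows the same dynamic multithreaded model: `fork()` plays the role of spawn and `join()` of sync. This is an illustrative analogy, not the notation used in the slides; the `Fib` class is a hypothetical example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Nested parallelism in the DMM sense: the parent forks a child
// ("spawn") and keeps computing while the child runs, then joins
// ("sync") to wait for the child's result.
class Fib extends RecursiveTask<Long> {
    private final int n;
    Fib(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n < 2) return (long) n;            // base case runs serially
        Fib child = new Fib(n - 1);
        child.fork();                          // "spawn": child may run in parallel
        long rest = new Fib(n - 2).compute();  // parent proceeds with its own work
        return child.join() + rest;            // "sync": wait for the spawned child
    }
}

public class DmmDemo {
    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        System.out.println(pool.invoke(new Fib(20))); // prints 6765
    }
}
```

Removing `fork`/`join` leaves an ordinary serial recursion, which is exactly the "simple extension of the serial model" property of the DMM.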
18. DMM - advantages
A simple extension of the serial model: only 3 new keywords: parallel, spawn and sync
Provides a theoretically clean way to quantify parallelism, based on the notions of "work" and "span"
Many MT algorithms based on nested parallelism follow naturally from the divide-and-conquer approach
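The "work" and "span" notions mentioned above can be made concrete with the standard bounds (CLRS notation, a sketch: $T_1$ is the work, $T_\infty$ the span, $T_P$ the running time on $P$ processors):

```latex
T_P \ge \frac{T_1}{P}, \qquad
T_P \ge T_\infty, \qquad
\text{speedup} = \frac{T_1}{T_P} \le P, \qquad
\text{parallelism} = \frac{T_1}{T_\infty}
```

For example, an algorithm with $\Theta(n^3)$ work and $\Theta(n)$ span (such as the parallel matrix multiplication analyzed on slide 27) has parallelism $\Theta(n^2)$: speedup scales almost linearly until the processor count approaches $n^2$.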
27. Analyzing MT algorithms: Matrix multiplication
P-Square-Matrix-Multiply(A, B):
1. n = A.rows
2. let C be a new n×n matrix
3. parallel for i = 1 to n
4.   parallel for j = 1 to n
5.     C[i,j] = 0
6.     for k = 1 to n
7.       C[i,j] = C[i,j] + A[i,k] * B[k,j]
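One way to realize this pseudocode is sketched below: the two nested "parallel for" loops are modeled with a parallel stream over the n×n output cells, while the inner k-loop stays serial, as in the pseudocode. The class and method names are illustrative, not from the slides.

```java
import java.util.stream.IntStream;

// P-Square-Matrix-Multiply sketch: each output cell C[i][j] is an
// independent iteration of the parallel loops, so all cells may be
// computed concurrently; the k-summation inside a cell remains serial.
public class PMatrixMultiply {
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        IntStream.range(0, n * n).parallel().forEach(cell -> {
            int i = cell / n, j = cell % n;   // flatten (i, j) into one index
            double sum = 0;
            for (int k = 0; k < n; k++) {     // serial inner loop
                sum += a[i][k] * b[k][j];
            }
            c[i][j] = sum;                    // each cell written by one task only
        });
        return c;
    }

    public static void main(String[] args) {
        double[][] a  = {{1, 2}, {3, 4}};
        double[][] id = {{1, 0}, {0, 1}};     // identity: multiply returns a
        double[][] c  = multiply(a, id);
        System.out.println(c[1][0] + " " + c[1][1]); // prints 3.0 4.0
    }
}
```

Because distinct iterations write distinct cells of C, the parallel loops need no locking, which is what makes this loop safe to parallelize.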
36. Task Scheduler & Thread pool
.NET 3.5 ThreadPool.QueueUserWorkItem disadvantages:
Zero information about each work item
Maintains a strictly fair FIFO queue
Improvements:
A more efficient FIFO queue (ConcurrentQueue)
An enhanced API that gets more information from the user: Task
Work stealing
Thread injection
Waiting for completion, handling exceptions, getting the computation result
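What a Task adds over a bare work-item queue can be sketched in Java (an analogy, since the slides discuss the .NET API): submitting a callable returns a handle that supports waiting for completion, retrieving the result, and observing exceptions, and `Executors.newWorkStealingPool()` is backed by a work-stealing ForkJoinPool. The `TaskDemo` class is a hypothetical example.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TaskDemo {
    public static void main(String[] args) throws Exception {
        // Pool with per-thread deques and work stealing, unlike the
        // single fair FIFO queue of the old thread-pool model.
        ExecutorService pool = Executors.newWorkStealingPool();

        Future<Integer> ok  = pool.submit(() -> 6 * 7);  // carries a result
        Future<Integer> bad = pool.submit(() -> 1 / 0);  // will fail

        System.out.println(ok.get());                    // waits, prints 42
        try {
            bad.get();                                   // wait for completion
        } catch (ExecutionException e) {
            // The work item's exception is captured and re-surfaced
            // at the point where the result is requested.
            System.out.println("failed: " + e.getCause());
        }
        pool.shutdown();
    }
}
```

With `QueueUserWorkItem`-style fire-and-forget submission, none of these three interactions (result, completion wait, exception observation) is possible, which is the "zero information about each work item" disadvantage listed above.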
38. References
The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
MIT Introduction to Algorithms video lectures
Chapter 27, Multithreaded Algorithms, from Introduction to Algorithms, 3rd edition
CLR 4.0 ThreadPool Improvements: Part 1
Multicore Programming Primer
ThreadPool on Channel 9