2. History of parallelization
Definition: a form of computation in which many
calculations are carried out simultaneously,
operating on the principle that large problems
can be divided into smaller ones, which are then
solved concurrently (“in parallel”)
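Developers have tried to improve performance
by parallelizing problems, even before true
multicore systems existed
How is this different from multithreading?
Multithreading is one way to achieve parallelism:
threads only run truly in parallel when there is
more than one core to run them on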
3. Real-life parallelization
Consider that we have some eggs to boil (data
to process)
Before the early 2000s we only had 1 pot (core)
and more eggs than would fit in it at once. We
could still boil more than 1 egg at a time, since
a pot holds more than 1 egg. After the early
2000s we had 2 pots, so we could boil twice as
many eggs at once.
4. Pots vs Boil time
Given: We have 10 eggs to boil and each egg requires 8
minutes in order to be ready to eat. Each pot holds up to
5 eggs.
Number of pots    Boil time
1                 16 min
2                 8 min
3                 8 min
4                 8 min
Something interesting occurs… Adding more than 2 pots
does nothing to decrease the overall time.
6. What does this mean?
Past a certain point it doesn’t matter how many
cores we use. Once every batch has its own pot,
this problem simply will not speed up by adding
cores.
Our equations are pretty simple (example: 100 eggs,
5 eggs per pot, 1 pot, 8 min boiling time):
Pots Needed = Ceiling(Number of Eggs / Eggs Per Pot)
20 = Ceiling(100 / 5)
Time = Ceiling(Pots Needed / Pots) x Boiling Time
160 = Ceiling(20 / 1) x 8
In computer terms:
ExecutionTime = Ceiling(Number of Threads / Cores)
x ThreadExecutionTime
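The two equations can be checked with a few lines of C#. This is only a sketch: the class and method names (`EggMath`, `PotsNeeded`, `BoilTime`) are made up for illustration, and the inputs are the 10-egg and 100-egg examples from the slides.

```csharp
using System;

static class EggMath
{
    // Pots Needed = Ceiling(Number of Eggs / Eggs Per Pot)
    public static int PotsNeeded(int eggs, int eggsPerPot)
    {
        return (int)Math.Ceiling(eggs / (double)eggsPerPot);
    }

    // Time = Ceiling(Pots Needed / Pots) x Boiling Time
    public static int BoilTime(int eggs, int eggsPerPot, int pots, int boilMinutes)
    {
        int rounds = (int)Math.Ceiling(PotsNeeded(eggs, eggsPerPot) / (double)pots);
        return rounds * boilMinutes;
    }

    static void Main()
    {
        // 10 eggs, 5 per pot, 8 minutes each: prints 16, 8, 8, 8 for 1 to 4 pots.
        for (int pots = 1; pots <= 4; pots++)
            Console.WriteLine("{0} pot(s): {1} min", pots, BoilTime(10, 5, pots, 8));

        // 100 eggs, 5 per pot, 1 pot: prints 160.
        Console.WriteLine("100 eggs, 1 pot: {0} min", BoilTime(100, 5, 1, 8));
    }
}
```

With 10 eggs there are only 2 batches, so the third and fourth pots have nothing to do.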
7. Caveat
In the egg example we assume…
Thread execution time is constant (in practice it
never is)
Each core executes one thread at a time and does
not start the next thread in the queue until the
current one has finished, i.e. a quad-core
processor executes 4 threads at once and gives us
the same result as boiling the eggs with 4 pots
8. Short attention span LINQ
Warning: I only really know basic LINQ (slowly
integrating it into the Real Feeds Project where I
can use it)
LINQ = Language Integrated Query
Something something query – gotcha. Looks like
SQL in reverse (we know SQL, right?)
Layman's terms: LINQ works against collections
of data (really any data that has an enumerator)
to get all or a subset of that data
9. Simple LINQ
var ages = new List<int> { 25, 21, 18, 65 };
var agesInOrder = from age in ages
                  orderby age ascending
                  select age;
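For comparison, the same query can be written in method syntax. This is a sketch, and `SimpleLinq` and `SortAges` are illustrative names rather than anything from the slides. A LINQ query is lazy, so nothing is sorted until the result is enumerated (here by `ToList()`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SimpleLinq
{
    public static List<int> SortAges(IEnumerable<int> ages)
    {
        // Method syntax for:
        //   from age in ages orderby age ascending select age
        // ToList() forces the query to run.
        return ages.OrderBy(age => age).ToList();
    }

    static void Main()
    {
        var ages = new List<int> { 25, 21, 18, 65 };
        Console.WriteLine(string.Join(", ", SortAges(ages))); // 18, 21, 25, 65
    }
}
```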
11. Ex 2
Take a collection of egg boil times
Iterate over the collection and look at 5 items at
a time
Find the longest cooking time among the eggs in
the current batch
Simulate the boiling time with Thread.Sleep
Stop looking for eggs once the current batch has
<5 eggs (it is the last batch)
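The steps above could look roughly like this. It is a sketch, not the Boil() code from the talk. The names (`EggBoiler`, `Boil`), the sample boil times, and the scale of 1 simulated minute = 100 ms are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

static class EggBoiler
{
    const int BatchSize = 5;      // eggs per pot
    const int MsPerMinute = 100;  // assumed scale: 1 "minute" = 100 ms

    // Boils the eggs batch by batch; returns the total simulated minutes.
    public static int Boil(IList<int> boilMinutes)
    {
        int totalMinutes = 0;
        for (int start = 0; start < boilMinutes.Count; start += BatchSize)
        {
            // Look at (up to) 5 eggs at a time.
            var batch = boilMinutes.Skip(start).Take(BatchSize).ToList();

            // The batch is done when its slowest egg is done.
            int longest = batch.Max();
            Thread.Sleep(longest * MsPerMinute);
            totalMinutes += longest;

            // Fewer than 5 eggs means this was the last batch.
            if (batch.Count < BatchSize) break;
        }
        return totalMinutes;
    }

    static void Main()
    {
        var eggs = new List<int> { 8, 8, 8, 4, 8, 4 }; // hypothetical boil times
        var watch = Stopwatch.StartNew();
        int minutes = Boil(eggs);
        watch.Stop();
        Console.WriteLine("{0} simulated min, {1} ms elapsed",
                          minutes, watch.ElapsedMilliseconds);
    }
}
```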
12. Ex 2 (cont)
First run: ~1600ms
Second run: ~1200ms. Why?
Put 5 eggs in the pot
After 4 min, remove the second-to-last egg
After 8 min, remove the remaining eggs
Add the next batch, which contains 1 egg
After 4 min, remove the egg from the pot
13. LINQ to the rescue
Any for / foreach loop can potentially be
converted into LINQ
Compare Boil() code v1 & v2
Note:
Optimized (1600ms vs 1200ms)
Nicer to read
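As a small illustration of the loop-to-LINQ conversion (this is not the Boil() v1/v2 code, which isn't reproduced here), here is the "find the longest boil time" step written both ways. `LoopVsLinq` and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LoopVsLinq
{
    // foreach version: track the largest value seen so far.
    public static int LongestWithLoop(IEnumerable<int> boilMinutes)
    {
        int longest = int.MinValue;
        foreach (int minutes in boilMinutes)
        {
            if (minutes > longest) longest = minutes;
        }
        return longest;
    }

    // LINQ version: same result for non-empty input.
    // Note: Max() throws InvalidOperationException on an empty sequence,
    // while the loop above returns int.MinValue.
    public static int LongestWithLinq(IEnumerable<int> boilMinutes)
    {
        return boilMinutes.Max();
    }

    static void Main()
    {
        var batch = new List<int> { 8, 8, 8, 4, 8 }; // hypothetical batch
        Console.WriteLine(LongestWithLoop(batch)); // 8
        Console.WriteLine(LongestWithLinq(batch)); // 8
    }
}
```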
15. Parallel Extensions
Introduced in .NET 4.0
Has 2 important methods that we’ll focus on
Parallel.For
Parallel.For(0, eggs.Length, i => {});
Parallel.ForEach
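A minimal sketch of both methods, assuming `eggs` is an int array of boil times. `ParallelBoil`, `BoilAll`, `TotalMinutes`, and the 10 ms-per-minute scale are made up for the example. The loop bodies run on multiple threads, so shared state needs care: `BoilAll` writes only to its own array slot, and `TotalMinutes` uses `Interlocked.Add` for the shared counter.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ParallelBoil
{
    // Parallel.For: index-based, like a for loop.
    public static int[] BoilAll(int[] eggs)
    {
        var done = new int[eggs.Length];
        Parallel.For(0, eggs.Length, i =>
        {
            Thread.Sleep(eggs[i] * 10); // simulate boiling (assumed scale)
            done[i] = eggs[i];          // each iteration touches only slot i
        });
        return done;
    }

    // Parallel.ForEach: no index needed, like a foreach loop.
    public static int TotalMinutes(int[] eggs)
    {
        int total = 0;
        Parallel.ForEach(eggs, minutes =>
        {
            // total is shared between threads, so add atomically.
            Interlocked.Add(ref total, minutes);
        });
        return total;
    }

    static void Main()
    {
        var eggs = new[] { 8, 4, 8 }; // hypothetical boil times
        Console.WriteLine(string.Join(", ", BoilAll(eggs))); // 8, 4, 8
        Console.WriteLine(TotalMinutes(eggs));               // 20
    }
}
```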
16. Parallel LINQ
Say we have a list of web requests that we need
to make
Each call takes a certain amount of time & we
want to run the calls in parallel
In previous examples we’ve relied on an index,
but what if we can’t?
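One index-free way to do this is PLINQ's `AsParallel()`. The sketch below makes no real network calls. `FakeRequest` is a hypothetical stand-in that sleeps and returns the URL length, and `ParallelRequests`, `FetchAll`, and the example.com URLs are placeholders. `AsOrdered()` is only needed if the results must come back in input order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class ParallelRequests
{
    // Hypothetical stand-in for a web call: waits, then returns a "result".
    static int FakeRequest(string url)
    {
        Thread.Sleep(200); // simulated latency
        return url.Length;
    }

    public static List<int> FetchAll(IEnumerable<string> urls)
    {
        return urls.AsParallel()                    // let PLINQ split the work
                   .AsOrdered()                     // keep results in input order
                   .Select(url => FakeRequest(url))
                   .ToList();                       // forces the query to run
    }

    static void Main()
    {
        var urls = new[]
        {
            "http://example.com/a",
            "http://example.com/bb",
            "http://example.com/ccc"
        };
        foreach (int result in FetchAll(urls))
            Console.WriteLine(result);
    }
}
```

By default PLINQ picks the degree of parallelism itself. For calls that mostly wait, `WithDegreeOfParallelism(n)` can raise it explicitly.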