If the data cannot come to the algorithm...

If the Data Cannot Come to
the Algorithm...
many cores with java
session four
data locality
copyright 2013 Robert Burrell Donkin robertburrelldonkin.name
this work is licensed under a Creative Commons Attribution 3.0 Unported License

Pre-emptive multi-tasking operating
systems use involuntary context switching
to provide the illusion of parallel processes
even when the hardware supports only a
single thread of execution.
Take Away from Session One

Even on a single core,
there's no escaping parallelism.
Take Away from Session Two

Take Away from Session Three
Code executing on different cores uses copies held
in registers and caches, so memory shared is likely
to be incoherent unless the program plays by the
rules of the software platform.

Gustafson's Law
S(p) = p - a (p-1)
● S(p) is the speedup for pprocessors
● a is the non-parallelizable fraction
"in practice, the problem size scales with the number of
processors" John L. Gustafson

● Think about Gustafson's Law...
● The quantity of data processed...
● ...scales linearly as processors added.
● Throwing processors at the problem
works...
● ...at least sometimes.
Scales and Scaling

Divide and Conquer
● Back to the future
● Partition the data...
○ ...apply the same algorithm to each part and then
○ ...collate the answers.
● Natural to parallelise
● No contended shared memory

Data Locality
● When the algorithm is small
○ it's more efficient
■ to bring the algorithm to the data
■ than the data to the algorithm
● Whether the data is in
○ caches on cores in a many core computer, or in
○ disc storage in a distributed data store

Map and Reduce
● Partition the data
● The map algorithm
○ works in parallel
○ on local data
○ independently
● The reduce algorithm
○ collates output from map algorithms
● More complex systems built from these blocks

Map-Reduce
As a Query Language
● NoSQL
● A popular alternative to SQL
○ for distributed data stores
● Why...?
○ Easy to
■ read and write
■ parallelize
○ Rich and full programming model

Map-Reduce
Crunching Big Data
● Commodity hardware
● Scales up to Terabyte and Petabyte
○ smoothly by adding new nodes
● Map-Reduce platforms typically provide
○ fault tolerance eg. retry
○ orchestration
○ redundant data storage
● Statistical resilience

Take Away
When you want to be able to process big data
tomorrow by adding cores or computers, adopt
an appropriate architecture today.

If the data cannot come to the algorithm...

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Destacado

Destacado (11)

Similar a If the data cannot come to the algorithm...

Similar a If the data cannot come to the algorithm... (20)

Más de Robert Burrell Donkin

Más de Robert Burrell Donkin (11)

Último

Último (20)

If the data cannot come to the algorithm...