© 2016 The MathWorks, Inc.
Jos Martin
jos.martin@mathworks.com
Senior Engineering Manager – Parallel Computing
Parallel Computing with MATLAB and Simulink
2
Introduction
Ø  Why Parallel Computing
•  Need faster insight to bring competitive products to market quickly
•  Computing infrastructure is broadly available (Multicore Desktops, GPUs, Clusters)
Ø  With MathWorks Parallel Computing Tools
•  Leverage computational power of available hardware
•  Accelerate workflows with minimal or no changes to your original code
•  Seamlessly scale from your desktop to clusters or the cloud
•  Save engineering and research time and focus on insight
3
Agenda
Ø  Parallel Computing Paradigm
Ø  Task Parallelism
Ø  Data Parallelism
Ø  Summary
4
Agenda
Ø  Parallel Computing Paradigm
Ø  Task Parallelism
Ø  Data Parallelism
Ø  Summary
5
Parallel Computing Paradigm
Multicore Desktops
[Diagram: MATLAB Desktop (client) using four workers running on the cores of a multicore desktop]
6
Parallel Computing Paradigm
Cluster Hardware
[Diagram: MATLAB Desktop (client) connected to a cluster of multicore computers, each running several workers]
7
Parallel Computing Paradigm
NVIDIA GPUs
[Diagram: MATLAB Desktop (client) offloading computation to GPU cores and device memory]
8
Cluster Computing Paradigm
§  Prototype on the desktop
§  Integrate with existing infrastructure
§  Access directly through MATLAB
[Diagram: MATLAB with Parallel Computing Toolbox on the user desktop, submitting through a headnode to compute nodes running MATLAB Distributed Computing Server]
9
Parallel-enabled Toolboxes (MATLAB® Product Family)
Enable parallel computing support by setting a flag or preference
Optimization
Parallel estimation of
gradients
Statistics and Machine Learning
Resampling Methods, k-Means
clustering, GPU-enabled functions
Neural Networks
Deep Learning, Neural Network
training and simulation
Image Processing
Batch Image Processor, Block
Processing, GPU-enabled functions
Computer Vision
Parallel-enabled functions
in bag-of-words workflow
Signal Processing and
Communications
GPU-enabled FFT filtering,
cross correlation, BER
simulations
Other Parallel-enabled Toolboxes
10
Parallel-enabled Toolboxes (Simulink® Product Family)
Enable parallel computing support by setting a flag or preference
Simulink Control Design
Frequency response estimation
Simulink/Embedded Coder
Generating and building code
Simulink Design Optimization
Response optimization, sensitivity
analysis, parameter estimation
Communication Systems Toolbox
GPU-based System objects for
Simulation Acceleration
Other Parallel-enabled Toolboxes
11
Agenda
Ø  Parallel Computing Paradigm
Ø  Task Parallelism
Ø  Data Parallelism
Ø  Summary
12
parfor
Definition
Code in a parfor loop is guaranteed by the programmer to be execution-order
independent.
Why is that important?
Because we can then execute the iterations of the loop in any order, potentially at the
same time on many different workers.
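As a minimal sketch (assuming Parallel Computing Toolbox and an open parallel pool), each iteration below touches only its own slice of the output, so MATLAB is free to run iterations in any order across the workers:

% Each iteration is independent of the others: it reads only its
% loop index and writes only its own element of the sliced output.
n = 100;
out = zeros(1, n);
parfor i = 1:n
    out(i) = sum(sin(1:i));   % order-independent work
end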
13
parfor – how it works
§  Static analysis to deduce variable classification
–  Works out what data will need to be passed to which iterations
§  A loop from 1:N has N iterations, which we partition into a number of intervals
–  Each interval will likely have a different number of iterations
§  Start allocating the intervals to execute on the workers
§  Stitch the results back together
–  Using functions from the static analysis
14
Variable Classification
reduce = 0; bcast = …; in = …;
parfor i = 1:N
temp = foo1(bcast, i);
out(i) = foo2(in(i), temp);
reduce = reduce + foo3(temp);
end
15
Loop variable
reduce = 0; bcast = …; in = …;
parfor i = 1:N
temp = foo1(bcast, i);
out(i) = foo2(in(i), temp);
reduce = reduce + foo3(temp);
end
16
Making extra parallelism
§  No one loop appears to have enough iterations to go parallel effectively
–  Imagine if Na, Nb, and Nc are all reasonably small
for ii = 1:Na
    for jj = 1:Nb
        for kk = 1:Nc
        end
    end
end
Na * Nb * Nc == quiteBigNumber
mergeLoopsDemo
17
Making extra parallelism
[N, fun] = mergeLoopRanges([Na Nb Nc]);
parfor xx = 1:N
    [ii,jj,kk] = fun(xx);
    doOriginalLoopCode
end
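mergeLoopRanges comes from the mergeLoopsDemo, not from a shipping toolbox; a minimal sketch of what such a helper might look like (the signature is an assumption based on the slide) is:

function [N, fun] = mergeLoopRanges(ranges)
% Collapse several loop ranges into one linear range of length
% prod(ranges), plus a function mapping a linear index back to
% the original loop subscripts.
N = prod(ranges);
fun = @(xx) splitIndex(xx, ranges);
end

function varargout = splitIndex(xx, ranges)
% ind2sub takes a size vector and returns one subscript per output.
[varargout{1:numel(ranges)}] = ind2sub(ranges, xx);
end

With this, the single parfor over 1:N exposes Na*Nb*Nc iterations to the scheduler instead of only the outermost loop's Na.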
18
Sliced Variable
reduce = 0; bcast = …; in = …;
parfor i = 1:N
temp = foo1(bcast, i);
out(i) = foo2(in(i), temp);
reduce = reduce + foo3(temp);
end
19
Broadcast variable
reduce = 0; bcast = …; in = …;
parfor i = 1:N
temp = foo1(bcast, i);
out(i) = foo2(in(i), temp);
reduce = reduce + foo3(temp);
end
20
Reusing data
D = makeSomeBigData;
for ii = 1:N
    parfor jj = 1:M
        a(jj) = func(D, jj);
    end
end
21
Reusing data
D = parallel.pool.Constant( @makeSomeBigData );
for ii = 1:N
    parfor jj = 1:M
        a(jj) = func(D.value, jj);
    end
end
% Alternatively you can send data once and re-use it
oD = parallel.pool.Constant( someLargeData )
22
Common parallel program
set stuff going
while not all finished {
for next available result do something;
}
23
parfeval
§  Allows asynchronous programming
f = parfeval(@func, numOut, in1, in2, …)
§  The return f is a future which allows you to
–  Wait for the completion of calling func(in1, in2, …)
–  Get the result of that call
–  … do other useful parallel programming tasks …
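A minimal sketch (assuming an open parallel pool; magic stands in for a real workload):

% Launch an asynchronous evaluation and keep working on the client.
f = parfeval(@magic, 1, 4);   % request 1 output from magic(4)
% ... the client is free to do other work here ...
M = fetchOutputs(f);          % blocks until the call completes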
24
Fetch Next
§  Fetch next available unread result from an array of futures.
[idx, out1, ...] = fetchNext(arrayOfFutures)
§  idx is the index of the future from which the result is fetched
§  Once a particular future has returned a result via fetchNext it will never do so again
–  That particular result is considered read, and will not be re-read
25
Common parallel program (MATLAB)
% Set stuff going
for ii = N:-1:1
    fs(ii) = parfeval(@stuff, 1);
end
% While not all finished
for ii = 1:N
    % For next available result
    [whichOne, result] = fetchNext(fs);
    doSomething(whichOne, result);
end
parfevalWaitbarDemo
26
parfevalOnAll
§  Frequently you want setup and teardown operations
–  which execute once on each worker in the pool, before and after the actual work
§  Execution order guarantee:
It is guaranteed that relative order of parfeval and parfevalOnAll as executed on the
client will be preserved on all the workers.
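For example, a setup step that must run once on every worker (silencing a warning is just an illustrative choice) might look like:

% Run setup once per worker; 0 means no outputs are requested.
fSetup = parfevalOnAll(@warning, 0, 'off', 'MATLAB:rankDeficientMatrix');
wait(fSetup);   % ensure setup has finished before submitting real work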
27
Agenda
Ø  Parallel Computing Paradigm
Ø  Task Parallelism
Ø  Data Parallelism
Ø  Summary
28
Remote arrays in MATLAB
MATLAB provides array types for data that is not in “normal” memory
distributed array
(since R2006b)
Data lives in the combined memory of a cluster of
computers
gpuArray
(since R2010b)
Data lives in the memory of the GPU card
tall array
(since R2016b)
Data lives on disk, maybe spread across many disks
(distributed file-system)
29
Normal array – calculation happens in main memory:
Remote arrays in MATLAB
Rule: take the calculation to where the data is
x = rand(...)
x_norm = (x - mean(x)) ./ std(x)
30
Remote arrays in MATLAB
gpuArray – all calculation happens on the GPU:
x = gpuArray(...)
x_norm = (x - mean(x)) ./ std(x)
Rule: take the calculation to where the data is
distributed – calculation is spread across the cluster:
x = distributed(...)
x_norm = (x - mean(x)) ./ std(x)
tall – calculation is performed by stepping through files:
x = tall(...)
x_norm = (x - mean(x)) ./ std(x)
31
How big is big?
What does “Big Data” even mean?
“Any collection of data sets so large and complex that it becomes difficult to
process using … traditional data processing applications.”
(Wikipedia)
“Any collection of data sets so large that it becomes difficult to process using
traditional MATLAB functions, which assume all of the data is in memory.”
(MATLAB)
32
How big is big?
The Large Hadron Collider reached peak
performance on 29 June 2016
§  2076 bunches of 120 billion protons currently
circulating in each direction
§  ~1.6×10^14 collisions per week, >30 petabytes of
data per year
§  too big to even store in one place
§  used to explore interesting science, but taking
researchers a long time to get through
In 1085 William I commissioned a survey
of England
§  ~2 million words and figures collected over two
years
§  too big to handle in one piece
§  collected and summarized in regional pieces
§  used to generate revenue (tax), but most of the
data then sat unused
Image courtesy of CERN.
Copyright 2011 CERN.
33
Tall arrays (new R2016b)
§  MATLAB data-type for data that doesn’t fit into memory
§  Ideal for lots of observations, few variables (hence “tall”)
§  Looks like a normal MATLAB array
–  Supports numeric types, tables, datetimes, categoricals, strings, etc…
–  Basic maths, stats, indexing, etc.
–  Statistics and Machine Learning Toolbox support (clustering, classification, etc.)
34
[Diagram: data too large for single-machine memory, or even the combined memory of a cluster of machines]
Tall arrays (new R2016b)
§  Data is in one or more files
§  Typically tabular data
§  Files stacked vertically
§  Data doesn’t fit into memory
(even cluster memory)
35
[Diagram: datastore presenting file-based data alongside single-machine and cluster memory]
Tall arrays (new R2016b)
§  Use datastore to define files
§  Allows access to small pieces of
data that fit in memory.
ds = datastore('*.csv')
while hasdata(ds)
    piece = read(ds);
    % Process piece
end
36
[Diagram: tall array built on a datastore; the data exceeds single-machine and cluster memory]
Tall arrays (new R2016b)
§  Create tall table from datastore
§  Operate on whole tall table just like
ordinary table
ds = datastore('*.csv')
tt = tall(ds)
summary(tt)
max(tt.EndTime - tt.StartTime)
37
Tall arrays (new R2016b)
§  With Parallel Computing Toolbox,
process several pieces at once
[Diagram: several MATLAB workers, each processing a different piece of the datastore at the same time]
38
Tall arrays (new R2016b)
Example
New York taxi fares (150,000,000 rows (~25GB) per year)
>> dataLocation = 'hdfs://hadoop01glnxa64:54310/datasets/nyctaxi/';
>> ds = datastore( fullfile(dataLocation, 'yellow_tripdata_2015-*.csv') );
>> tt = tall(ds)

tt =

  M×19 tall table

    VendorID    tpep_pickup_datetime    tpep_dropoff_datetime    passenger_count    trip_distance    pickup_longitude    pic
    ________    ____________________    _____________________    _______________    _____________    ________________    ___

    2           '2015-01-15 19:05:39'   '2015-01-15 19:23:42'    1                  1.59             -73.994             40
    1           '2015-01-10 20:33:38'   '2015-01-10 20:53:28'    1                  3.3              -74.002             40.
    1           '2015-01-10 20:33:38'   '2015-01-10 20:43:41'    1                  1.8              -73.963             40.
    1           '2015-01-10 20:33:39'   '2015-01-10 20:35:31'    1                  0.5              -74.009             40.
    1           '2015-01-10 20:33:39'   '2015-01-10 20:52:58'    1                  3                -73.971             40.
    1           '2015-01-10 20:33:39'   '2015-01-10 20:53:52'    1                  9                -73.874             40.
    1           '2015-01-10 20:33:39'   '2015-01-10 20:58:31'    1                  2.2              -73.983             40.
    1           '2015-01-10 20:33:39'   '2015-01-10 20:42:20'    3                  0.8              -74.003             40.
    :           :                       :                        :                  :                :                   :
    :           :                       :                        :                  :                :                   :
Input data is tabular –
result is a tall table
39
Tall arrays (new R2016b)
Example
New York taxi fares (150,000,000 rows (~25GB) per year)
Number of rows is
unknown until all the
data has been read
Only the first few
rows are displayed
40
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: 0% complete
Evaluation 0% complete
Tall arrays (new R2016b)
Example
Once created, can process much like an ordinary table
% Remove some bad data
tt.trip_minutes = minutes(tt.tpep_dropoff_datetime - tt.tpep_pickup_datetime);
tt.speed_mph = tt.trip_distance ./ (tt.trip_minutes ./ 60);
ignore = tt.trip_minutes <= 1 | ...        % really short
         tt.trip_minutes >= 60 * 12 | ...  % unfeasibly long
         tt.trip_distance <= 1 | ...       % really short
         tt.trip_distance >= 12 * 55 | ... % unfeasibly far
         tt.speed_mph > 55 | ...           % unfeasibly fast
         tt.total_amount < 0 | ...         % negative fares?!
         tt.total_amount > 10000;          % unfeasibly large fares
tt(ignore, :) = [];

% Credit card payments have the most accurate tip data
keep = tt.payment_type == {'Credit card'};
tt = tt(keep, :);

% Show tip distribution
histogram( tt.tip_amount, 0:25 )
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: 16% complete
Evaluation 14% complete
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: 33% complete
Evaluation 30% complete
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: 50% complete
Evaluation 45% complete
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: 66% complete
Evaluation 63% complete
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: 83% complete
Evaluation 75% complete
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 4.9667 min
Evaluation completed in 5 min
Data only read once,
despite 21 operations
41
Scaling up
If you just have MATLAB:
§  Run through each ‘chunk’ of data one by one
If you also have Parallel Computing Toolbox:
§  Use all local cores to process several ‘chunks’ at once
If you also have a cluster with MATLAB Distributed
Computing Server (MDCS):
§  Use the whole cluster to process many ‘chunks’ at once
42
Scaling up
Working with clusters from MATLAB desktop:
§  General purpose MATLAB cluster
–  Can co-exist with other MATLAB workloads (parfor, parfeval,
spmd, jobs and tasks, distributed arrays, …)
–  Uses local memory and file caches on workers for efficiency
§  Spark-enabled Hadoop clusters
–  Data in HDFS
–  Calculation is scheduled to be near data
–  Uses Spark’s built-in memory and disk caching
43
Distributed Arrays
§  Unlike tall arrays, these need to fit in cluster memory
§  Full range of numerical algorithms
–  Linear Algebra, eigenvalues, etc
§  Sparse support
§  Want to write your own function?
–  Full access to local part of the data
–  Ability to rebuild distributed array from local parts
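A small sketch (assuming an open pool, local or MDCS-backed):

% Create a distributed array partitioned across the pool's workers,
% operate on it with ordinary MATLAB syntax, then gather a result.
A = distributed.rand(1000);     % 1000x1000, spread across workers
s = svd(A);                     % numerical algorithms work directly
firstSingular = gather(s(1));   % bring just one value back to the client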
44
Standard Benchmarks
HPL
spmd
A = rand(10000, codistributor2dbc);
b = rand(10000, 1, codistributor2dbc);
x = A\b;
end
FFT
D = rand(1e8, 1, 'distributed');
F = fft(D);
STREAM Triad
B = rand(1e8, 1, 'distributed');
C = rand(1e8, 1, 'distributed');
q = rand;
A = B + q*C;
Create 2D Block Cyclic
distribution appropriate for
matrix solve
45
Single Program, Multiple Data (spmd)
§  Everyone executes the same program
–  Just with different data
–  Inter-lab communication library enabled (MPI)
–  Call labindex and numlabs to distinguish labs
§  Example
x = 1
spmd
y = x + labindex;
end
46
Variable types can change across spmd
x = 1;
assert( isa(x, 'double') )
spmd
    assert( isa(x, 'double') )
    y = labindex;
end
assert( isa(y, 'Composite') )
y{1} == 1; % TRUE
spmd
    assert( isa(y, 'double') )
end
Ordinary types are broadcast to all labs
Returned ordinary type is referenced
by a Composite
Composite can be dereferenced
on the client
Composite becomes the contained
ordinary type on a lab
47
Want to write an MPI Program?
Send / receive data between labs
labSend
labReceive
labSendReceive
labProbe
Who am I? Where am I?
labindex
numlabs
Global Operations across labs
gplus
gcat
gop
labBarrier
labBroadcast
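A tiny point-to-point sketch (inside an spmd block, assuming at least two labs in the pool):

spmd
    % Lab 1 sends a value to lab 2; lab 2 receives it.
    if labindex == 1 && numlabs > 1
        labSend(pi, 2);       % labSend(data, destination)
    elseif labindex == 2
        v = labReceive(1);    % labReceive(source)
    end
end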
48
RandomAccess – an interesting MPI program
spmd
% Initialize random number stream on each worker
randRA( (labindex-1) * m * 4 / numlabs, 'StreamOffset' );
t1 = tic;
for k = 1:nloops
% Make the local chunk of random data
list = randRA( b );
% Loop over the hyper-cube dimensions
for d = 0:logNumlabs-1
% Choose my partner for this dimension of the hypercube
partner = 1 + bitxor( (labindex-1), 2.^d );
% Choose my mask for this dimension of the hypercube
dim_mask = uint64( 2.^( d + logLocalSize ) );
% Choose which data to send and receive for this dimension
dataToSend = logical( bitand( list, dim_mask ) );
if partner <= labindex
dataToSend = ~dataToSend;
end
% Setup a list of data that will be sent, and list I will keep
send_list = list( dataToSend );
keep_list = list( ~dataToSend );
% Use send/receive to get some data that we should use next round
recv_list = labSendReceive( partner, partner, send_list );
% Our new list is the old list and what we've received
list = [keep_list, recv_list];
end
% Finally, after all rounds of communication, perform the table updates.
idx = 1 + double( bitand( localMask, list ) );
T(idx) = bitxor( T(idx), list );
end
% Calculate max time
t = gop( @max, toc( t1 ) );
end
49
RandomAccess – an interesting MPI program
spmd
t1 = tic;
for k = 1:nloops
% Make the local chunk of random data
list = randRA( b );
% Loop over the hyper-cube dimensions
for d = myDims(0:logNumlabs-1, labindex)
[send_list, keep_list] = partitionData(list, d);
% Use send/receive to transfer data
recv_list = labSendReceive(to(d), from(d), send_list);
% Our new list is the old list and what we've received
list = [keep_list, recv_list];
end
% Finally, perform the table updates.
T = updateTable(list);
end
% Calculate max time
t = gop( @max, toc( t1 ) );
end
50
Summary
§  Simple array types for data analytics and PGAS programming
§  Extensive parallel language suited to all types of problem
§  Everything runs in both interactive and batch (offline) modes

NYAI - Scaling Machine Learning Applications by Braxton McKee
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine Parallelism
 
main
mainmain
main
 
Handout3o
Handout3oHandout3o
Handout3o
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable Python
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Flink internals web
Flink internals web Flink internals web
Flink internals web
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Parallel computation
Parallel computationParallel computation
Parallel computation
 
Eedc.apache.pig last
Eedc.apache.pig lastEedc.apache.pig last
Eedc.apache.pig last
 
SLE2015: Distributed ATL
SLE2015: Distributed ATLSLE2015: Distributed ATL
SLE2015: Distributed ATL
 
Interpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with SawzallInterpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with Sawzall
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Spark
 
Spark Meetup TensorFrames
Spark Meetup TensorFramesSpark Meetup TensorFrames
Spark Meetup TensorFrames
 
Spark Meetup TensorFrames
Spark Meetup TensorFramesSpark Meetup TensorFrames
Spark Meetup TensorFrames
 

Más de Intel® Software

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology Intel® Software
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaIntel® Software
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciIntel® Software
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.Intel® Software
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Intel® Software
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Intel® Software
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Intel® Software
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchIntel® Software
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel® Software
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019Intel® Software
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019Intel® Software
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Intel® Software
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Intel® Software
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Intel® Software
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...Intel® Software
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesIntel® Software
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision SlidesIntel® Software
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Intel® Software
 

Más de Intel® Software (20)

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and Anaconda
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI Research
 
Intel Developer Program
Intel Developer ProgramIntel Developer Program
Intel Developer Program
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview Slides
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
 
AIDC India - AI on IA
AIDC India  - AI on IAAIDC India  - AI on IA
AIDC India - AI on IA
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino Slides
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision Slides
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
 

Último

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 

Último (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Data Analytics and Simulation in Parallel with MATLAB*

  • 9. 9 Parallel-enabled Toolboxes (MATLAB® Product Family) Enable parallel computing support by setting a flag or preference Optimization Parallel estimation of gradients Statistics and Machine Learning Resampling Methods, k-Means clustering, GPU-enabled functions Neural Networks Deep Learning, Neural Network training and simulation Image Processing Batch Image Processor, Block Processing, GPU-enabled functions Computer Vision Parallel-enabled functions in bag-of-words workflow Signal Processing and Communications GPU-enabled FFT filtering, cross correlation, BER simulations Other Parallel-enabled Toolboxes
10
Parallel-enabled Toolboxes (Simulink® Product Family)
Enable parallel computing support by setting a flag or preference
Simulink Control Design
Frequency response estimation
Simulink/Embedded Coder
Generating and building code
Simulink Design Optimization
Response optimization, sensitivity analysis, parameter estimation
Communication Systems Toolbox
GPU-based System objects for Simulation Acceleration
Other Parallel-enabled Toolboxes
11
Agenda
Ø  Parallel Computing Paradigm
Ø  Task Parallelism
Ø  Data Parallelism
Ø  Summary
12
parfor
Definition
Code in a parfor loop is guaranteed by the programmer to be execution order independent
Why is that important?
We can execute the iterates of the loop in any order, potentially at the same time on many different workers.
13
parfor – how it works
§  Static analysis to deduce variable classification
–  Works out what data will need to be passed to which iterates
§  A loop from 1:N has N iterates which we partition into a number of intervals
–  Each interval will likely have a different number of iterates
§  Start allocating the intervals to execute on the workers
§  Stitch the results back together
–  Using functions from the static analysis
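The mechanics above can be exercised with a minimal, self-contained loop. This is a sketch only: it assumes Parallel Computing Toolbox is installed, and the pool size and loop body are illustrative.

```matlab
% Minimal sketch of an order-independent parfor loop.
if isempty(gcp('nocreate'))
    parpool;                  % start a pool using the default profile
end
N = 100;
out = zeros(1, N);
parfor i = 1:N
    out(i) = sin(i)^2;        % each iterate depends only on i
end
```

Because each iterate depends only on its own loop index, MATLAB is free to partition 1:N into intervals and run them on the workers in any order.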
14
Variable Classification
reduce = 0; bcast = …; in = …;
parfor i = 1:N
   temp = foo1(bcast, i);
   out(i) = foo2(in(i), temp);
   reduce = reduce + foo3(temp);
end
15
Loop variable
reduce = 0; bcast = …; in = …;
parfor i = 1:N
   temp = foo1(bcast, i);
   out(i) = foo2(in(i), temp);
   reduce = reduce + foo3(temp);
end
16
Making extra parallelism
§  No one loop appears to have enough iterations to go parallel effectively
–  Imagine if Na, Nb, and Nc are all reasonably small
for ii = 1:Na
   for jj = 1:Nb
      for kk = 1:Nc
      end
   end
end
Na * Nb * Nc == quiteBigNumber
mergeLoopsDemo
17
Making extra parallelism
[N, fun] = mergeLoopRanges([Na Nb Nc]);
parfor xx = 1:N
   [ii,jj,kk] = fun(xx);
   doOriginalLoopCode
end
18
Sliced Variable
reduce = 0; bcast = …; in = …;
parfor i = 1:N
   temp = foo1(bcast, i);
   out(i) = foo2(in(i), temp);
   reduce = reduce + foo3(temp);
end
19
Broadcast variable
reduce = 0; bcast = …; in = …;
parfor i = 1:N
   temp = foo1(bcast, i);
   out(i) = foo2(in(i), temp);
   reduce = reduce + foo3(temp);
end
20
Reusing data
D = makeSomeBigData;
for ii = 1:N
   parfor jj = 1:M
      a(jj) = func(D, jj);
   end
end
21
Reusing data
D = parallel.pool.Constant( @makeSomeBigData );
for ii = 1:N
   parfor jj = 1:M
      a(jj) = func(D.Value, jj);
   end
end
% Alternatively you can send data once and re-use it
oD = parallel.pool.Constant( someLargeData )
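The point of the Constant is that the big data is transferred to each worker only once, rather than re-sent on every outer iteration. A self-contained sketch, where the matrix size and the column-sum arithmetic are purely illustrative:

```matlab
% Sketch: transfer a large matrix to the workers once, then reuse it
% across many parfor loops.
C = parallel.pool.Constant(rand(2000));     % sent to each worker once
colSums = zeros(1, 2000);
for ii = 1:5
    parfor jj = 1:2000
        colSums(jj) = sum(C.Value(:, jj));  % C.Value is the worker-local copy
    end
end
```

Without the Constant, the 2000-by-2000 matrix would be serialized and shipped to the pool on each of the five outer iterations.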
22
Common parallel program
set stuff going
while not all finished {
   for next available result
      do something;
}
23
parfeval
§  Allows asynchronous programming
f = parfeval(@func, numOut, in1, in2, …)
§  The return f is a future which allows you to
–  Wait for the completion of calling func(in1, in2, …)
–  Get the result of that call
–  … do other useful parallel programming tasks …
24
Fetch Next
§  Fetch next available unread result from an array of futures.
[idx, out1, ...] = fetchNext(arrayOfFutures)
§  idx is the index of the future from which the result is fetched
§  Once a particular future has returned a result via fetchNext it will never do so again
–  That particular result is considered read, and will not be re-read
25
Common parallel program (MATLAB)
% Set stuff going
for ii = N:-1:1
   fs(ii) = parfeval(@stuff, 1);
end
% While not all finished
for ii = 1:N
   % For next available result
   [whichOne, result] = fetchNext(fs);
   doSomething(whichOne, result);
end
parfevalWaitbarDemo
26
parfevalOnAll
§  Frequently you want setup and teardown operations
–  which execute once on each worker in the pool, before and after the actual work
§  Execution order guarantee: It is guaranteed that relative order of parfeval and parfevalOnAll as executed on the client will be preserved on all the workers.
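A setup/teardown sketch using parfevalOnAll. The choice of setup action (seeding each worker's random number generator) is illustrative only; any per-worker initialization would fit the same pattern.

```matlab
% Sketch: run setup once on every worker, do the work, then tear down.
wait(parfevalOnAll(@() rng(0), 0));     % setup: runs once on each worker
for ii = 8:-1:1
    fs(ii) = parfeval(@rand, 1, 1);     % the actual work, as futures
end
results = fetchOutputs(fs);             % block until all 8 results arrive
wait(parfevalOnAll(@clear, 0, 'all'));  % teardown: runs once on each worker
```

The ordering guarantee above is what makes this safe: the setup is observed by every worker before any of the parfeval work runs.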
27
Agenda
Ø  Parallel Computing Paradigm
Ø  Task Parallelism
Ø  Data Parallelism
Ø  Summary
28
Remote arrays in MATLAB
MATLAB provides array types for data that is not in “normal” memory
distributed array (since R2006b)
Data lives in the combined memory of a cluster of computers
gpuArray (since R2010b)
Data lives in the memory of the GPU card
tall array (since R2016b)
Data lives on disk, maybe spread across many disks (distributed file-system)
29
Remote arrays in MATLAB
Rule: take the calculation to where the data is
Normal array – calculation happens in main memory:
x = rand(...)
x_norm = (x – mean(x)) ./ std(x)
  • 30. 30 Remote arrays in MATLAB gpuArray – all calculation happens on the GPU: x = gpuArray(...) x_norm = (x - mean(x)) ./ std(x) Rule: take the calculation to where the data is distributed – calculation is spread across the cluster: x = distributed(...) x_norm = (x - mean(x)) ./ std(x) tall – calculation is performed by stepping through files: x = tall(...) x_norm = (x - mean(x)) ./ std(x)
  • 31. 31 How big is big? What does “Big Data” even mean? “Any collection of data sets so large and complex that it becomes difficult to process using … traditional data processing applications.” (Wikipedia) “Any collection of data sets so large that it becomes difficult to process using traditional MATLAB functions, which assume all of the data is in memory.” (MATLAB)
  • 32. 32 How big is big? The Large Hadron Collider reached peak performance on 29 June 2016 §  2076 bunches of 120 billion protons currently circulating in each direction §  ~1.6×10^14 collisions per week, >30 petabytes of data per year §  too big to even store in one place §  used to explore interesting science, but taking researchers a long time to get through In 1085 William I commissioned a survey of England §  ~2 million words and figures collected over two years §  too big to handle in one piece §  collected and summarized in regional pieces §  used to generate revenue (tax), but most of the data then sat unused Image courtesy of CERN. Copyright 2011 CERN.
  • 33. 33 Tall arrays (new R2016b) §  MATLAB data-type for data that doesn’t fit into memory §  Ideal for lots of observations, few variables (hence “tall”) §  Looks like a normal MATLAB array –  Supports numeric types, tables, datetimes, categoricals, strings, etc… –  Basic maths, stats, indexing, etc. –  Statistics and Machine Learning Toolbox support (clustering, classification, etc.)
  • 34. 34 Cluster of Machines Memory Single Machine Memory Tall arrays (new R2016b) §  Data is in one or more files §  Typically tabular data §  Files stacked vertically §  Data doesn’t fit into memory (even cluster memory)
  • 35. 35 Cluster of Machines Memory Single Machine Memory Tall arrays (new R2016b) §  Use datastore to define files §  Allows access to small pieces of data that fit in memory. Datastore   ds = datastore('*.csv') while hasdata(ds) piece = read(ds); % Process piece end
  • 36. 36 tall array Cluster of Machines Memory Single Machine Memory Tall arrays (new R2016b) §  Create tall table from datastore §  Operate on whole tall table just like ordinary table Datastore ds = datastore('*.csv') tt = tall(ds) summary(tt) max(tt.EndTime - tt.StartTime) Single Machine Memory Process
  • 37. 37 tall array Cluster of Machines Memory Single Machine Memory Tall arrays (new R2016b) §  With Parallel Computing Toolbox, process several pieces at once Datastore Single Machine Memory Process Single Machine Memory Process
  • 38. 38 Tall arrays (new R2016b) Example New York taxi fares (150,000,000 rows (~25GB) per year)
>> dataLocation = 'hdfs://hadoop01glnxa64:54310/datasets/nyctaxi/';
>> ds = datastore( fullfile(dataLocation, 'yellow_tripdata_2015-*.csv') );
>> tt = tall(ds)
tt =
    M×19 tall table
    VendorID    tpep_pickup_datetime     tpep_dropoff_datetime    passenger_count    trip_distance    pickup_longitude  ...
    ________    _____________________    _____________________    _______________    _____________    ________________
       2        '2015-01-15 19:05:39'    '2015-01-15 19:23:42'           1               1.59             -73.994
       1        '2015-01-10 20:33:38'    '2015-01-10 20:53:28'           1               3.3              -74.002
       1        '2015-01-10 20:33:38'    '2015-01-10 20:43:41'           1               1.8              -73.963
       1        '2015-01-10 20:33:39'    '2015-01-10 20:35:31'           1               0.5              -74.009
       1        '2015-01-10 20:33:39'    '2015-01-10 20:52:58'           1               3                -73.971
       1        '2015-01-10 20:33:39'    '2015-01-10 20:53:52'           1               9                -73.874
       1        '2015-01-10 20:33:39'    '2015-01-10 20:58:31'           1               2.2              -73.983
       1        '2015-01-10 20:33:39'    '2015-01-10 20:42:20'           3               0.8              -74.003
       :        :                        :                               :               :                :
Input data is tabular – result is a tall table
  • 39. 39 Tall arrays (new R2016b) Example New York taxi fares (150,000,000 rows (~25GB) per year)
>> dataLocation = 'hdfs://hadoop01glnxa64:54310/datasets/nyctaxi/';
>> ds = datastore( fullfile(dataLocation, 'yellow_tripdata_2015-*.csv') );
>> tt = tall(ds)
tt =
    M×19 tall table
    VendorID    tpep_pickup_datetime     tpep_dropoff_datetime    passenger_count    trip_distance    pickup_longitude  ...
    ________    _____________________    _____________________    _______________    _____________    ________________
       2        '2015-01-15 19:05:39'    '2015-01-15 19:23:42'           1               1.59             -73.994
       1        '2015-01-10 20:33:38'    '2015-01-10 20:53:28'           1               3.3              -74.002
       1        '2015-01-10 20:33:38'    '2015-01-10 20:43:41'           1               1.8              -73.963
       1        '2015-01-10 20:33:39'    '2015-01-10 20:35:31'           1               0.5              -74.009
       1        '2015-01-10 20:33:39'    '2015-01-10 20:52:58'           1               3                -73.971
       1        '2015-01-10 20:33:39'    '2015-01-10 20:53:52'           1               9                -73.874
       1        '2015-01-10 20:33:39'    '2015-01-10 20:58:31'           1               2.2              -73.983
       1        '2015-01-10 20:33:39'    '2015-01-10 20:42:20'           3               0.8              -74.003
       :        :                        :                               :               :                :
Number of rows is unknown until all the data has been read Only the first few rows are displayed
  • 40. 40 Tall arrays (new R2016b) Example Once created, can process much like an ordinary table
% Remove some bad data
tt.trip_minutes = minutes(tt.tpep_dropoff_datetime - tt.tpep_pickup_datetime);
tt.speed_mph = tt.trip_distance ./ (tt.trip_minutes ./ 60);
ignore = tt.trip_minutes <= 1 | ...        % really short
    tt.trip_minutes >= 60 * 12 | ...       % unfeasibly long
    tt.trip_distance <= 1 | ...            % really short
    tt.trip_distance >= 12 * 55 | ...      % unfeasibly far
    tt.speed_mph > 55 | ...                % unfeasibly fast
    tt.total_amount < 0 | ...              % negative fares?!
    tt.total_amount > 10000;               % unfeasibly large fares
tt(ignore, :) = [];

% Credit card payments have the most accurate tip data
keep = tt.payment_type == {'Credit card'};
tt = tt(keep,:);

% Show tip distribution
histogram( tt.tip_amount, 0:25 )

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: 0% ... 16% ... 33% ... 50% ... 66% ... 83% ... Completed in 4.9667 min
Evaluation completed in 5 min
Data only read once, despite 21 operations
  • 41. 41 Scaling up If you just have MATLAB: §  Run through each ‘chunk’ of data one by one If you also have Parallel Computing Toolbox: §  Use all local cores to process several ‘chunks’ at once If you also have a cluster with MATLAB Distributed Computing Server (MDCS): §  Use the whole cluster to process many ‘chunks’ at once
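The scaling story above means the analysis code itself does not change: only the pool you open does. A minimal sketch (the CSV file pattern and cluster profile name are hypothetical):

```matlab
% Same tall-array code runs on local cores or a cluster, depending only
% on which pool is open when the tall expression is evaluated.
parpool('local');                    % or parpool('myMDCSCluster') on a cluster

ds = datastore('tripdata*.csv');     % hypothetical set of CSV files
tt = tall(ds);

avgDist = mean(tt.trip_distance);    % deferred - nothing is computed yet
avgDist = gather(avgDist);           % triggers parallel evaluation on the pool
```

With only MATLAB, `gather` steps through the chunks serially; with Parallel Computing Toolbox or MDCS, the same call spreads the chunks across the available workers.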
  • 42. 42 Scaling up Working with clusters from MATLAB desktop: §  General purpose MATLAB cluster –  Can co-exist with other MATLAB workloads (parfor, parfeval, spmd, jobs and tasks, distributed arrays, …) –  Uses local memory and file caches on workers for efficiency §  Spark-enabled Hadoop clusters –  Data in HDFS –  Calculation is scheduled to be near data –  Uses Spark’s built-in memory and disk caching
  • 43. 43 Distributed Arrays §  Unlike tall these need to fit in cluster memory §  Full range of numerical algorithms –  Linear Algebra, eigenvalues, etc §  Sparse support §  Want to write your own function? –  Full access to local part of the data –  Ability to rebuild distributed array from local parts
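The "write your own function" bullet above can be sketched like this: operate on the local part of a distributed array inside spmd, then rebuild a distributed result from the local parts. A minimal illustration; the doubling operation is a placeholder for any local computation.

```matlab
% Create a distributed array spread across the pool's workers
D = distributed.rand(1e6, 1);

spmd
    localD = getLocalPart(D);          % this worker's piece of the data
    localResult = 2 * localD + 1;      % arbitrary local computation
    codist = getCodistributor(D);      % how D is partitioned across workers
    % Reassemble a distributed array from the per-worker local parts
    R = codistributed.build(localResult, codist);
end
```

Because each worker touches only its own part, no data moves between workers during the computation.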
  • 44. 44 Standard Benchmarks HPL spmd A = rand(10000, codistributor2dbc()); b = rand(10000, 1, codistributor2dbc()); x = A\b; end FFT D = rand(1e8, 1, 'distributed'); F = fft(D); STREAM Triad B = rand(1e8, 1, 'distributed'); C = rand(1e8, 1, 'distributed'); q = rand; A = B + q*C; Create 2D Block Cyclic distribution appropriate for matrix solve
  • 45. 45 Single Program, Multiple Data (spmd) §  Everyone executes the same program –  Just with different data –  Inter-lab communication library enabled (MPI) –  Call labindex and numlabs to distinguish labs §  Example x = 1 spmd y = x + labindex; end
  • 46. 46 Variable types can change across spmd x = 1; assert( isa(x, 'double') ) spmd assert( isa(x, 'double') ) y = labindex; end assert( isa(y, 'Composite') ) y{1} == 1; % TRUE spmd assert( isa(y, 'double') ) end Ordinary types are broadcast to all labs Returned ordinary type is referenced by a Composite Composite can be dereferenced on the client Composite becomes the contained ordinary type on a lab
  • 47. 47 Want to write an MPI Program? Send / receive data between labs labSend labReceive labSendReceive labProbe Who am I? Where am I? labindex numlabs Global Operations across labs gplus gcat gop labBarrier labBroadcast
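A minimal sketch of the primitives listed above: each lab exchanges a value with its neighbours in a ring using labSendReceive, then all labs compute a global sum with gplus.

```matlab
spmd
    % Ring topology: send to the right-hand neighbour, receive from the left
    to   = mod(labindex, numlabs) + 1;
    from = mod(labindex - 2, numlabs) + 1;

    % Simultaneous send and receive avoids deadlock in the ring
    received = labSendReceive(to, from, labindex);

    % Global reduction: every lab gets the sum of labindex over all labs
    total = gplus(labindex);
end
```

gplus is a specialisation of gop: `gop(@plus, x)` gives the same result, and any associative function can be used in its place.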
  • 48. 48 RandomAccess – an interesting MPI program spmd % Initialize random number stream on each worker randRA( (labindex-1) * m * 4 / numlabs, 'StreamOffset' ); t1 = tic; for k = 1:nloops % Make the local chunk of random data list = randRA( b ); % Loop over the hyper-cube dimensions for d = 0:logNumlabs-1 % Choose my partner for this dimension of the hypercube partner = 1 + bitxor( (labindex-1), 2.^d ); % Choose my mask for this dimension of the hypercube dim_mask = uint64( 2.^( d + logLocalSize ) ); % Choose which data to send and receive for this dimension dataToSend = logical( bitand( list, dim_mask ) ); if partner <= labindex dataToSend = ~dataToSend; end % Setup a list of data that will be sent, and list I will keep send_list = list( dataToSend ); keep_list = list( ~dataToSend ); % Use send/receive to get some data that we should use next round recv_list = labSendReceive( partner, partner, send_list ); % Our new list is the old list and what we've received list = [keep_list, recv_list]; end % Finally, after all rounds of communication, perform the table updates. idx = 1 + double( bitand( localMask, list ) ); T(idx) = bitxor( T(idx), list ); end % Calculate max time t = gop( @max, toc( t1 ) ); end
  • 49. 49 RandomAccess – an interesting MPI program spmd t1 = tic; for k = 1:nloops % Make the local chunk of random data list = randRA( b ); % Loop over the hyper-cube dimensions for d = myDims(0:logNumlabs-1, labindex) [send_list, keep_list] = partitionData(list, d); % Use send/receive to transfer data recv_list = labSendReceive(to(d), from(d), send_list); % Our new list is the old list and what we've received list = [keep_list, recv_list]; end % Finally, perform the table updates. T = updateTable(list); end % Calculate max time t = gop( @max, toc( t1 ) ); end
  • 50. 50 Summary §  Simple array types for data analytics and PGAS programming §  Extensive parallel language suited to all types of problem §  Everything runs in both interactive and batch (offline) modes