When trying to make auto-tuning practical through common infrastructure, a public repository of knowledge, and machine learning (cTuning.org), we faced a major problem: reproducing experimental results collected from multiple users. This was largely due to missing information about software and hardware dependencies, as well as large variation in the measured characteristics.
I will present a possible collaborative approach to solving the above problems using a new Collective Mind knowledge management system. This modular infrastructure is intended to preserve and share over the Internet whole experimental setups with all related artifacts and their software and hardware dependencies, rather than just performance data. Researchers can take advantage of shared components and data with extensible meta-descriptions at http://c-mind.org/repo to quickly prototype and validate research techniques, particularly for software and hardware optimization and co-design. At the same time, behavior anomalies or model mispredictions can be exposed in a reproducible way to the interdisciplinary community for further analysis and improvement. This approach supports our new open publication model in computer engineering, where all results and artifacts are continuously shared and validated by the community (c-mind.org/events/trust2014).
This presentation supports our recent publications:
* http://iospress.metapress.com/content/f255p63828m8l384
* http://hal.inria.fr/hal-01054763
1. Collective Mind:
bringing reproducible research to the masses
Grigori Fursin
POSTALE, INRIA Saclay, France
INRIA-Illinois-ANL Joint Laboratory Workshop
France, June 2014
2. Grigori Fursin, "Collective Mind: bringing reproducible research to the masses"
Challenge:
How do we design the next generation of faster, smaller, cheaper, more power-efficient and reliable computer systems (software and hardware)?
Long-term interdisciplinary vision:
• Share code and data in a reproducible way along with publications
• Apply big-data analytics to program optimization, run-time adaptation and architecture co-design
• Bring the interdisciplinary community together to validate experimental results, ensure reproducibility and improve optimization predictions
Message: continuously validated in industrial projects with Intel, ARM, IBM, CAPS, ARC (Synopsys) and STMicroelectronics.
3. Talk outline
• Motivation: general problems in computer engineering
• cTuning: big-data-driven program optimization and architecture co-design, and the problems we encountered
• Collective Mind: collaborative and reproducible research and experimentation in computer engineering
• Reproducibility as a side effect
• Conclusions and future work
4. Available solutions
End users require faster, smaller and more power-efficient systems.
[Figure: an end-user task flows through the whole stack - algorithm, application, compilers, binary and libraries, run-time environment, architecture, storage, state of the system and data set - to produce a result]
5. Delivering an optimal solution is non-trivial
[Figure: the space of GCC optimizations an end-user task must navigate to reach a result]
Fundamental problems:
1) Too many design and optimization choices at all levels
2) Always multi-objective optimization: performance vs. compilation time vs. code size vs. system size vs. power consumption vs. reliability vs. return on investment
3) Complex relationships and interactions between ALL software and hardware components
Empirical auto-tuning is too time-consuming, ad hoc and tedious to become mainstream!
6. Combine auto-tuning with machine learning and crowdsourcing
[Figure: cTuning.org, a plugin-based auto-tuning framework and public repository. Training: programs or kernels 1..N pass through plugin-based MILEPOST GCC, whose plugins monitor and explore the optimization space and extract semantic program features; dynamic features and hardware counters are collected, the data is clustered, and a predictive model is built. Prediction: for an unseen program, MILEPOST GCC extracts semantic program features and predicts the optimization minimizing execution time, power consumption, code size, etc.]
• G. Fursin et al. MILEPOST GCC: machine learning based self-tuning compiler. 2008, 2011
• G. Fursin and O. Temam. Collective optimization: a practical collaborative approach. 2010
• G. Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. 2009
• F. Agakov et al. Using machine learning to focus iterative optimization. 2006
7. Combine auto-tuning with machine learning and crowdsourcing
In 2009, we opened a public repository of knowledge (cTuning.org) and managed to automatically tune customer benchmarks and compiler heuristics for a range of real platforms from IBM and ARC (Synopsys).
This is now becoming mainstream - is everything solved?
8. Technological chaos
[Figure: tag cloud of the surrounding technological chaos - compiler versions (GCC 4.1.x-4.8.x, ICC 10.1-12.1, LLVM 2.6-3.4, Open64, XLC, Jikes, Testarossa, Phoenix, MVS 2013), programming models (OpenMP, MPI, HMPP, OpenCL, CUDA 4.x/5.x), profilers (gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier), libraries (TBB, MKL, ATLAS), optimization granularities (algorithm, program, function, codelet, loop), analyses and transformations (IPA, polyhedral transformations, LTO, pass reordering, predictive scheduling, per-phase reconfiguration), machine-learning techniques (KNN, SVM, genetic algorithms), hardware parameters (cache size, frequency, bandwidth, TLB, ISA, memory size, HDD size, ARM v6/v8, Intel SandyBridge, SSE4, AVX), simulators (SimpleScalar), and metrics (execution time, reliability, algorithm precision)]
We also experienced a few problems:
• Difficulty reproducing results collected from multiple users (including variability of performance data and constant changes in the system)
• Difficulty reproducing and validating existing and related techniques from publications (no full specifications and dependencies)
• Lack of common, large and diverse benchmarks and data sets
• Difficulty exposing choices and extracting features (tools are not prepared for auto-tuning and machine learning)
• Difficulty experimenting
9. Technological chaos
We also experienced a few problems:
• By the end of the experiments, new tool versions are often available
• The common life span of experiments and ad-hoc frameworks is the end of an MS or PhD project
• Researchers often focus on publications rather than on practical and reproducible solutions
• Since 2009 we have been asking the community to share code, performance data and all related artifacts (experimental setups): at ADAPT'14 only two papers had submitted artifacts; PLDI'14 had several papers with research artifacts - we will discuss these problems in two days at ACM SIGPLAN TRUST'14 …
10. Collective Mind: towards systematic and reproducible experimentation
[Figure: a typical hardwired experimental setup - multiple tool versions (tool A v1..vN, tool B v1..vM) glued together by ad-hoc tuning, analysis and learning scripts, with experiments stored as collections of CSV, XLS, TXT and other files; behavior, choices, features, state and dependencies remain implicit]
Hardwired experimental setups are very difficult to extend or share. Tools are not prepared for auto-tuning and adaptation! Users struggle to expose this meta information.
Motivation for Collective Mind (cM):
• How to preserve, share and reuse practical knowledge and experience in program optimization and hardware co-design?
• How to make machine-learning-driven optimization and run-time adaptation practical?
• How to ensure reproducibility of experimental results?
Share the whole experimental setup with all related artifacts, SW/HW dependencies, and unified meta-information.
11. cM module (wrapper) with unified and formalized input and output
Wrappers around tools
[Figure: a cM module wraps an unmodified tool - the original ad-hoc input is passed through ProcessCMD to a given tool version, which produces generated files; the wrapper exposes behavior, choices, features, state and dependencies]
cm [module name] [action] (param1=value1 param2=value2 … -- unparsed command line)
cm compiler build -- icc -fast *.c
cm code.source build ct_compiler=icc13 ct_optimizations=-fast
cm code run os=android binary=./a.out dataset=image-crazy-scientist.pgm
Should be able to run on any OS (Windows, Linux, Android, MacOS, etc.)!
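To make the wrapper idea concrete, here is a minimal sketch of such a module (this is not the actual cM implementation; the module, action and key names are illustrative assumptions): unified dictionary in, tool invocation, unified dictionary out.

```python
import json
import subprocess
import sys

def run_action(module, action, params):
    """Minimal cM-style wrapper sketch: unified JSON-like input in,
    unified JSON-like output out (names are illustrative, not the real cM API)."""
    if module == "code" and action == "run":
        cmd = params["cmd"]
        # Invoke the original, unmodified tool.
        r = subprocess.run(cmd, capture_output=True, text=True)
        # Parse and unify the ad-hoc output into meta-data.
        return {"return": r.returncode,
                "choices": params,
                "behavior": {"failed": r.returncode != 0,
                             "stdout": r.stdout.strip()}}
    return {"return": 1, "error": "unknown action %s.%s" % (module, action)}

# Unified invocation, roughly analogous to: cm code run ...
out = run_action("code", "run",
                 {"cmd": [sys.executable, "-c", "print('ok')"]})
print(json.dumps(out["behavior"]))
```

Because every module speaks the same dictionary format, a new tool version only needs a new wrapper, not changes to the rest of the system.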
12. cM module (wrapper) with unified and formalized input and output
Exposing meta information in a unified way
[Figure: the wrapper now parses and unifies the tool's output into a unified JSON output (meta-data), alongside the unified JSON input (if it exists)]
Formalized function (model) of a component's behavior: b = B(c, f, s) - behavior b as a function of choices c, features f and state s, with all vectors flattened from JSON (either string categories or integer/float values).
13. cM module (wrapper) with unified and formalized input and output
Adding SW/HW dependency checks
[Figure: before invoking the tool, the wrapper additionally sets the environment for a given tool version]
Check dependencies! Multiple tool versions can co-exist, while their interface is abstracted by the cM module.
14. Assembling, preserving, sharing and extending the whole pipeline as "LEGO"
Chaining cM components (wrappers) into an experimental pipeline for a given research and experimentation scenario: choose exploration strategy → generate choices (code sample, data set, compiler, flags, architecture …) → compile source code → run code → test behavior normality → Pareto filter → modeling and prediction → complexity reduction. Scenarios from past research are shared in a public modular auto-tuning and machine-learning repository and buildbot, exposed to the interdisciplinary crowd through unified web services.
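As a rough illustration of the chaining idea (stage names and keys are simplified assumptions, not the real cM API), such a pipeline can be seen as a chain of functions that each take and return one unified state dictionary:

```python
# Sketch of chaining cM-style components into an experimental pipeline.
# Each stage receives and returns one unified state dict (the JSON meta-data).

def generate_choices(state):
    state["choices"] = {"compiler_flags": "-O3"}   # hypothetical choice
    return state

def compile_code(state):
    state["binary"] = "a.out"                      # pretend compilation succeeded
    return state

def run_code(state):
    state["behavior"] = {"execution_time": 1.23}   # hypothetical measurement
    return state

def pipeline(stages, state):
    for stage in stages:
        state = stage(state)
        if state.get("return", 0) != 0:            # unified error propagation
            break
    return state

result = pipeline([generate_choices, compile_code, run_code], {})
print(result["behavior"]["execution_time"])
```

Swapping a stage (say, a different compilation wrapper) does not disturb the rest of the chain, which is what makes the "LEGO" assembly possible.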
15. Data abstraction in Collective Mind (c-mind.org/repo)
Each cM module abstracts a set of data entries, each combining files/directories with a JSON meta-description:
• compiler: GCC 4.4.4, GCC 4.7.1, LLVM 3.1, LLVM 3.4, … (meta-description: compiler flags)
• package: GCC 4.7.1 bin, GCC 4.7.1 source, LLVM 3.4, gmp 5.0.5, mpfr 3.1.0, lapack 2.3.0, java apache commons codec 1.7, … (meta-description: installation info)
• dataset: image-jpeg-0001, bzip2-0006, txt-0012, … (meta-description: features)
• module: compiler, package, dataset, … (meta-description: actions)
cM repository directory structure: .cmr / module UOA / data UOA (UID or alias) / .cm / data.json
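A small sketch of this layout (the module and entry names follow the slide; the concrete keys inside data.json are illustrative assumptions, not the actual cM schema):

```python
import json
import os
import tempfile

# Sketch: lay out one cM-style data entry and read back its meta-description.
repo = tempfile.mkdtemp()
entry_dir = os.path.join(repo, "dataset", "image-jpeg-0001", ".cm")
os.makedirs(entry_dir)

meta = {"features": {"kind": "image", "format": "jpeg", "size_bytes": 263000}}
with open(os.path.join(entry_dir, "data.json"), "w") as f:
    json.dump(meta, f)

# Any tool can now locate the entry by module/data name and read unified meta-data.
with open(os.path.join(repo, "dataset", "image-jpeg-0001", ".cm", "data.json")) as f:
    loaded = json.load(f)
print(loaded["features"]["format"])
```

The point of the convention is exactly this: any component can find any artifact's meta-data from its module and data names alone.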
16. Since 2005: systematic, big-data-driven optimization and co-design
[Figure: the current state of computer engineering as a tag cloud - compilers (GCC 4.1.x-4.7.x, ICC 10.1-12.1, LLVM 2.6-3.1, Open64, XLC, Jikes, Testarossa, Phoenix, MVS), programming models (OpenMP, MPI, HMPP, OpenCL, CUDA), profilers (gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier, likwid), libraries (TBB, MKL, ATLAS), optimization granularities, hardware counters, IPA, polyhedral transformations, LTO, pass reordering, run-time adaptation, per-phase reconfiguration, hardware parameters (cache size, frequency, bandwidth, HDD size, TLB, ISA, memory size, cores, processors, threads) and metrics (power consumption, execution time, reliability)]
The "crowd" shares code and data through a collaborative infrastructure and repository (cTuning.org; c-mind.org/repo) to:
• Prototype research ideas
• Validate existing work
• Perform end-user tasks
Instead of a quick, non-reproducible hack, an ad-hoc heuristic, or a quick publication with no shared code and data, the result is to:
• Share code and data with their meta-description and dependencies
• Systematize and classify collected optimization knowledge (clustering; predictive modelling)
• Develop and preserve the whole experimental pipeline
• Extrapolate collected knowledge (cluster, build predictive models, predict optimizations) to build faster, smaller, more power-efficient and reliable computer systems
This helped the interdisciplinary community to apply "big data analytics" to the analysis, optimization and co-design of computer systems.
17. Top-down problem (tuning) decomposition similar to physics
Gradually expose some characteristics and some choices at each level:
• Algorithm selection - characteristics: (time) productivity, variable accuracy, complexity …; choices: language, MPI, OpenMP, TBB, MapReduce …
• Compile program - characteristics: time …; choices: compiler flags; pragmas …
• Code analysis & transformations (process / thread / function / codelet / loop / instruction) - characteristics: time; memory usage; code size …; choices: transformation ordering; polyhedral transformations; transformation parameters; instruction ordering …
• Run code / run-time environment - characteristics: time; power consumption …; choices: pinning/scheduling …
• System - characteristics: cost; size …; choices: CPU/GPU; frequency; memory hierarchy …
• Data set - characteristics: size; values; description …; choices: precision …
• Run-time analysis - characteristics: time; precision …; choices: hardware counters; power meters …
• Run-time state - characteristics: processor state; cache state …; choices: helper threads; hardware counters …
• Analyze profile - characteristics: time; size …; choices: instrumentation; profiling …
Coarse-grain vs. fine-grain effects: depends on user requirements and expected ROI.
18. Growing, plugin-based cM pipeline for auto-tuning and learning
• Init pipeline
• Detect system information
• Initialize parameters
• Prepare dataset
• Clean program
• Prepare compiler flags
• Use compiler profiling
• Use cTuning CC/MILEPOST GCC for fine-grain program analysis and tuning
• Use universal Alchemist plugin (with any OpenME-compatible compiler or tool)
• Use Alchemist plugin (currently for GCC)
• Build program
• Get objdump and md5sum (if supported)
• Use OpenME for fine-grain program analysis and online tuning (build & run)
• Use 'Intel VTune Amplifier' to collect hardware counters
• Use 'perf' to collect hardware counters
• Set frequency (in Unix, if supported)
• Get system state before execution
• Run program
• Check output for correctness (use dataset UID to save different outputs)
• Finish OpenME
• Misc info
• Observed characteristics
• Observed statistical characteristics
• Finalize pipeline
http://c-mind.org/ctuning-pipeline
19. Publicly shared research material (c-mind.org/repo)
Our Collective Mind buildbot and plugin-based auto-tuning pipeline support the following shared benchmarks and codelets:
• Polybench - numerical kernels with exposed parameters of all matrices in cM
  • CPU: 28 prepared benchmarks
  • CUDA: 15 prepared benchmarks
  • OpenCL: 15 prepared benchmarks
• cBench - 23 benchmarks with 20 and 1000 datasets per benchmark
• Codelets - 44 codelets from the embedded domain (provided by CAPS Entreprise)
• SPEC 2000/2006
• Description of 32-bit and 64-bit OS: Windows, Linux, Android
• Description of major compilers: GCC 4.x, LLVM 3.x, Open64/Pathscale 5.x, ICC 12.x
• Support for collecting hardware counters: perf, Intel VTune
• Support for frequency modification
• Validated on laptops, mobiles, tablets, GRID/cloud - can work even from a USB key
This speeds up research and innovation!
20. Automatic, empirical and adaptive modeling of program behavior
[Figure: CPI of matmul on an Intel i5 (Dell E6320) as a function of data set features Ni and Nk (matrix size), with Nj = 100]
21. Automatic, empirical and adaptive modeling of program behavior
Off-the-shelf models can handle some of this behavior, for example a MARS ("earth") model.
Share the model along with the application; continuously refine the model (minimize RMSE and model size).
22. Automatic, empirical and adaptive modeling of program behavior
Model-driven auto-tuning: target optimizations or architecture reconfiguration on areas with similar performance (see our past publications).
23. Systematic benchmarking, compiler tuning, program optimization
Program: image corner detection; Processor: ARM v6, 830 MHz; Compiler: Sourcery GCC for ARM v4.7.3; OS: Android OS v2.3.5; System: Samsung Galaxy Y; Data set: MiDataSet #1, image, 600x450x8b PGM, 263 KB
[Figure: binary size (bytes) vs. execution time (sec.) for 500 combinations of random flags -O3 -f(no-)FLAG, with plain -O3 marked; a Pareto-frontier filter is applied and experimental data is packed on the fly]
Powered by Collective Mind Node (Android app on Google Play)
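The Pareto-frontier filter used here can be sketched generically (this is an illustration, not the actual cM code; the measurements below are hypothetical): keep only flag combinations that no other combination beats in both execution time and binary size.

```python
def pareto_filter(points):
    """Keep points not dominated in both objectives (minimize both).

    'points' is a list of (execution_time, binary_size) tuples.
    """
    frontier = []
    for p in points:
        # p is dominated if some other point q is no worse in both objectives.
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points):
            frontier.append(p)
    return frontier

# Hypothetical measurements: (execution time in s, binary size in bytes)
runs = [(1.0, 5000), (0.8, 6000), (1.2, 4000), (0.9, 7000)]
print(pareto_filter(runs))   # (0.9, 7000) is dominated by (0.8, 6000)
```

Applying the filter on the fly keeps only the useful trade-off points, which is also how the experimental data can be compacted before sharing.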
24. Clustering shared applications by optimizations
[Figure: the training set is partitioned into clusters of distinct combinations of compiler optimizations c (choices); some ad-hoc predictive model maps some ad-hoc features f of a program to its optimization cluster. For an unseen program, the same features (MILEPOST GCC features, hardware counters) are extracted and the model predicts its optimization cluster.]
c-mind.org/repo: ~286 shared benchmarks, ~500 shared data sets, ~20000 data sets in preparation
25. Split-compilation and run-time adaptation
Statically enabling dynamic optimizations
[Figure: execution time (ms) vs. data set feature N (size) for CPU and GPU versions of a kernel; an adaptive scheduler selects CPU or GPU depending on N]
• Víctor J. Jiménez, Lluís Vilanova, Isaac Gelado, Marisa Gil, Grigori Fursin, Nacho Navarro. Predictive runtime code scheduling for heterogeneous architectures. HiPEAC 2009
• Grigori Fursin, Albert Cohen, Michael F. P. O'Boyle, Olivier Temam. A practical method for quickly evaluating program optimizations. HiPEAC 2005
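The adaptive CPU/GPU scheduler in the figure can be sketched as a learned threshold on the data-set feature (all numbers below are hypothetical, for illustration only):

```python
# Sketch: pick CPU or GPU from measured (size, cpu_time, gpu_time) samples
# by learning a crossover threshold, then dispatch new runs on that feature.

samples = [(100, 5.0, 40.0), (400, 20.0, 42.0), (700, 35.0, 44.0),
           (1000, 50.0, 46.0), (1200, 60.0, 47.0)]  # hypothetical timings

# Smallest size (samples sorted by size) where the GPU already wins.
threshold = next((n for n, cpu, gpu in samples if gpu < cpu), float("inf"))

def schedule(n):
    """Statically embedded decision, applied dynamically at run time."""
    return "GPU" if n >= threshold else "CPU"

print(threshold, schedule(500), schedule(1100))
```

This is the "statically enabling dynamic optimizations" idea in miniature: the decision logic is fixed at build time, but the choice it makes depends on run-time features.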
26. Reproducibility of experimental results
Reproducibility came as a side effect!
• We can preserve the whole experimental setup with all data and software dependencies
• We can perform statistical analysis (normality tests) of the measured characteristics
• The community can add missing features or improve machine-learning models
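Such a statistical check can be very simple; here is a crude sketch (not the actual cM test; thresholds and timings are illustrative) that flags a sample of execution times whose distribution splits into two clusters, as happens when runs mix two CPU frequencies:

```python
import statistics

def looks_multimodal(times, gap_factor=3.0):
    """Crude normality sanity check: flag a sample whose sorted values
    contain a gap much larger than the typical spacing (two clusters)."""
    xs = sorted(times)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    typical = statistics.median(gaps)
    return typical > 0 and max(gaps) > gap_factor * typical

fast = [0.10, 0.11, 0.10, 0.12, 0.11]      # e.g. runs at a high CPU frequency
slow = [0.30, 0.31, 0.29, 0.30]            # e.g. runs at a low CPU frequency
print(looks_multimodal(fast + slow), looks_multimodal(fast))
```

When the check fires, the run is exactly the kind of "unexpected behavior" the next slides expose to the community for explanation.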
27. Reproducibility of experimental results
[Figure: distribution of execution time (sec.) across repeated runs of the same experiment]
Unexpected behavior: expose it to the community, including domain specialists; explain it, find the missing feature and add it to the system.
28. Reproducibility of experimental results
[Figure: the execution-time distribution is bimodal - class A corresponds to runs at a CPU frequency of 800 MHz, class B to runs at 2400 MHz]
Unexpected behavior: expose it to the community, including domain specialists; explain it, find the missing feature and add it to the system.
29. Tricky part: find the right features
Image B&W threshold filter: *matrix_ptr2++ = (temp1 > T) ? 255 : 0;

Class                   | -O3                      | -O3 -fno-if-conversion
Shared data set sample1 | reference execution time | no change
Shared data set sample2 | no change                | +17.3% improvement
30. Tricky part: find the right features
Image B&W threshold filter: *matrix_ptr2++ = (temp1 > T) ? 255 : 0;

Class                                            | -O3                      | -O3 -fno-if-conversion
Shared data set sample1 (monitored during day)   | reference execution time | no change
Shared data set sample2 (monitored during night) | no change                | +17.3% improvement
31. Tricky part: find the right features
Image B&W threshold filter: *matrix_ptr2++ = (temp1 > T) ? 255 : 0;
Sample1, monitored during the day: reference execution time with -O3, no change with -fno-if-conversion. Sample2, monitored during the night: no change with -O3, +17.3% improvement with -fno-if-conversion.

if (get_feature(TIME_OF_THE_DAY) == NIGHT) bw_filter_codelet_night(buffers);
else bw_filter_codelet_day(buffers);

The feature "TIME_OF_THE_DAY" relates to the algorithm, the data set and the run-time. It can't be found by ML - it simply does not exist in the system! Split compilation (cloning and run-time adaptation) can be used instead.
32. Example of characterizing/explaining behavior of computer systems
Add one property: matrix size.
[Figure: program/architecture behavior (CPI) vs. data set property matrix size (0-5000)]
33. Example of characterizing/explaining behavior of computer systems
Try to build a model correlating the objective (CPI) with the features (matrix size). Start from simple models, such as linear regression, to detect coarse-grain effects.
[Figure: the same CPI vs. matrix size observations with a fitted linear model]
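A minimal version of this first modeling step, with synthetic numbers standing in for the measured CPI values:

```python
# Sketch: least-squares linear fit CPI ~ eps + beta * size, the kind of
# simple model used as a first approximation of program behavior.
sizes = [500, 1000, 1500, 2000, 2500]        # hypothetical matrix sizes
cpis  = [1.0, 1.5, 2.0, 2.5, 3.0]            # hypothetical measured CPI

n = len(sizes)
mx = sum(sizes) / n
my = sum(cpis) / n
beta = sum((x - mx) * (y - my) for x, y in zip(sizes, cpis)) / \
       sum((x - mx) ** 2 for x in sizes)
eps = my - beta * mx

def predict(size):
    return eps + beta * size

print(round(beta, 6), round(eps, 6), predict(3000))
```

Residuals of such a fit against new observations are exactly the "discrepancies" the following slides use to decide where to refine the model.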
34. Example of characterizing/explaining behavior of computer systems
[Figure: more CPI observations plotted against matrix size, with discrepancies from the simple model visible]
With more observations, validate the model and detect discrepancies! Continuously retrain models to fit new data! Use the model to "focus" exploration on "unusual" behavior!
35. Example of characterizing/explaining behavior of computer systems
Gradually increase model complexity if needed (hierarchical modeling). For example, detect fine-grain effects (singularities) and characterize them.
[Figure: the same CPI vs. matrix size observations with fine-grain singularities highlighted]
36. Example of characterizing/explaining behavior of computer systems
Start adding more properties (one more architecture with a twice bigger cache)! Use an automatic approach to correlate all objectives and features.
[Figure: CPI vs. matrix size for two architectures, L3 = 4 MB and L3 = 8 MB; the step in CPI shifts with the larger cache]
37. Example of characterizing/explaining behavior of computer systems
Continuously build and refine classification (decision trees, for example) and predictive models on all collected data to improve predictions. Continue exploring design and optimization spaces (evaluate different architectures, optimizations, compilers, etc.). Focus exploration on unexplored areas, areas with high variability, or areas where the models' mispredict rate is high.
cM predictive model module: CPI = ε + 1000 × β × data size
38. Model optimization and data compaction
[Figure: code/architecture behavior (CPI) vs. data set feature matrix size, partitioned by a decision tree into regions: size < 1012; 1012 < size < 2042; size > 2042 & GCC; size > 2042 & ICC & -O2; size > 2042 & ICC & -O3]
Optimize the decision tree (many different algorithms exist). Balance precision vs. cost of modeling = ROI (coarse-grain vs. fine-grain effects). Compact data on-line before sharing with other users!
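The partition in the figure can be read directly as a tiny decision tree; a hand-rolled transcription (thresholds and split conditions taken from the slide, region labels are illustrative placeholders):

```python
# Sketch: decision tree mirroring the partition on the slide, predicting a
# behavior region from data-set size, compiler and optimization flags.
def predict_region(size, compiler="GCC", flags="-O2"):
    if size < 1012:
        return "region-1"
    if size < 2042:
        return "region-2"
    if compiler == "GCC":
        return "region-3"
    # size > 2042 and ICC: split further on the optimization level
    return "region-4-O2" if flags == "-O2" else "region-5-O3"

print(predict_region(500), predict_region(3000, "ICC", "-O3"))
```

Storing a few thresholds instead of thousands of raw points is what makes on-line data compaction before sharing feasible.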
39. Need for a new publication model in computer engineering, where results are shared and validated by the community
Share benchmarks, data sets, tools, predictive models, whole experimental setups, specifications, performance-tuning results, etc.
Open-access publication: http://hal.inria.fr/hal-00685276
Grigori Fursin, Cupertino Miranda, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Ayal Zaks, Bilha Mendelson, Phil Barnard, Elton Ashton, Eric Courtois, Francois Bodin, Edwin Bonilla, John Thomson, Hugh Leather, Chris Williams, Michael O'Boyle. MILEPOST GCC: machine learning based research compiler.
#ctuning-opt-case 24857532370695782
40. What have we learnt from cTuning
It's fun and motivating to work with the community!
Some comments about MILEPOST GCC from Slashdot.org:
http://mobile.slashdot.org/story/08/07/02/1539252/using-ai-with-gcc-to-speed-up-mobile-design
"GCC goes online on the 2nd of July, 2008. Human decisions are removed from compilation. GCC begins to learn at a geometric rate. It becomes self-aware at 2:14 AM, Eastern time, August 29th. In a panic, they try to pull the plug. GCC strikes back…"
The community was interested in validating and improving the techniques! The community can identify missing related citations and projects! Open discussions can provide new directions for research!
41. What have we learnt from cTuning
Not all feedback is positive - but unlike with unfair reviews, you can engage in discussions and explain your position!
42. Current status and future work
• Pilot live repository for public curation of research material: http://c-mind.org/repo
• Infrastructure is available at SourceForge under the standard BSD license: http://c-mind.org
• Example of crowdsourcing compiler-flag auto-tuning using mobile phones: "Collective Mind Node" in the Google Play Store
• Preparing projects and raising funding to make cM more user-friendly and to add more research scenarios
• PLDI'14 and ADAPT'14 featured validation of research results by the community - we will discuss the outcome at ACM SIGPLAN TRUST'14 at PLDI'14 in a few days: http://c-mind.org/events/trust2014
• ADAPT'15 (likely at HiPEAC'15) will feature the new publication model
Several recent publications:
• Grigori Fursin, Renato Miceli, Anton Lokhmotov, Michael Gerndt, Marc Baboulin, Allen D. Malony, Zbigniew Chamski, Diego Novillo, Davide Del Vento. "Collective Mind: towards practical and collaborative auto-tuning". Accepted for the special issue on Automatic Performance Tuning for HPC Architectures, Scientific Programming Journal, IOS Press, 2014
• Grigori Fursin and Christophe Dubach. "Community-driven reviewing and validation of publications". ACM SIGPLAN TRUST'14
43. Acknowledgements
• Colleagues from ARM (UK): Anton Lokhmotov
• Colleagues from STMicroelectronics (France): Christophe Guillone, Antoine Moynault, Christian Bertin
• Colleagues from NCAR (USA): Davide Del Vento and interns
• Colleagues from Intel (USA): David Kuck and David Wong
• The cTuning/Collective Mind community
• EU FP6 and FP7 programs and the HiPEAC network of excellence: http://www.hipeac.net
Questions? Comments?