cTuning.org: systematizing tuning of computer systems
using crowdsourcing and statistics
Grigori Fursin
INRIA, France
HPSC 2013, Taiwan
March 2013
Grigori Fursin “Systematizing tuning of computer systems using crowdsourcing and statistics” HPSC 2013, NTU, Taiwan March, 2013 2 / 73
Messages
1st talk (Wednesday)
Systematizing tuning of computer systems
using crowdsourcing and statistics
• Revisiting current design and optimization methodology
• Leveraging experience and computer resources of multiple users
• Using predictive modelling and data mining to improve computer systems
2nd talk (Friday)
Collective Mind: novel methodology, framework
and repository to crowdsource auto-tuning
• New plugin-based extensible infrastructure and repository for collaborative
analysis and tuning of computer systems - will be released in May 2013
• “Big data” enables cooperation between architecture, compiler, OS and
application designers and mathematicians
• Examples of auto-tuning and predictive modeling for numerical kernels
Motivation: back to 1993
Semiconductor neural element -
possible base of neural computers
Modeling and understanding
brain functions
• Slow
• Unreliable
• Costly
Motivation: back to basics
End users: Task → Solution → Result
User requirements, most common:
• minimize all costs (time, power consumption, price, size, faults, etc.)
• guarantee real-time constraints (bandwidth, QoS, etc.)
Decision (depends on user requirements) among the available choices (solutions)
Hardware and software designers, and service/application providers (supercomputing, cloud computing, mobile systems), should provide choices and help with decisions
Available solutions: hardware
Companies compete hard to deliver many solutions with various characteristics:
performance, power consumption, size, bandwidth, response time, reliability, cost …
Available solutions: software
Software developers try to keep pace and produce various
algorithms, programming models, languages, analysis tools, compilers,
run-time systems, databases, etc.
GCC 4.1.x, GCC 4.2.x, GCC 4.3.x, GCC 4.4.x, GCC 4.5.x, GCC 4.6.x, GCC 4.7.x
ICC 10.1, ICC 11.0, ICC 11.1, ICC 12.0, ICC 12.1
LLVM 2.6, LLVM 2.7, LLVM 2.8, LLVM 2.9, LLVM 3.0
Phoenix, MVS, XLC, Open64, Jikes, Testarossa
OpenMP, MPI, HMPP, OpenCL, CUDA
gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier
TBB, MKL, ATLAS
algorithm-level, program-level, function-level, codelet, loop-level
scheduling, hardware counters, IPA, polyhedral transformations, LTO, threads, process, pass reordering, run-time adaptation, per-phase reconfiguration
Challenges
Task → Solutions → Result
cache size, frequency, bandwidth, HDD size, TLB, ISA, memory size, cores, processors, threads, power consumption, execution time, reliability
1) Rising complexity of computer systems: too many design and optimization choices
2) Performance is no longer the only requirement: multiple user objectives vs choices; benefit vs optimization time
3) Complex relationships and interactions between ALL software and hardware components
4) Too many tools with non-unified interfaces changing from version to version: technological chaos
Result:
• finding the right solution for the end user is extremely challenging
• everyone is lost in choices
• dramatic increase in development time
• low ROI
• underperforming systems
• waste of energy
• ad-hoc, repetitive and error-prone manual tuning
• slowing innovation in science and technology
Understanding and modeling the overall relationship between end-user algorithms, applications, compiler optimizations, hardware designs, data sets and run-time behavior has become simply infeasible!
Attempts to solve these problems: auto-tuning
Treat the computer system as a black box: Task → computer system → Result
Components of the system: algorithm, application, compilers and auxiliary tools, binary and libraries, architecture, run-time environment, state of the system, data set
Use auto-tuning: explore multiple choices empirically and learn the behavior of computer systems across executions
Covered all components in the last 2 decades and showed high potential, but …
Auto-tuning has shown high potential for nearly 2 decades but is still far from the mainstream in production environments. Why?
• Optimization spaces are large and non-linear with many local minima
• Exploration is slow and ad-hoc (random, genetic, some heuristics)
• Only a few benchmarks are considered
• Often the same (one) dataset is used
• Only part of the system is taken into account (rarely reflects the behavior of the whole system)
• No knowledge sharing
Attempts to solve these problems: machine learning
Treat the computer system as a black box: Task → computer system → Result
Components of the system: algorithm, application, compilers and auxiliary tools, binary and libraries, architecture, run-time environment, state of the system, data set
• Use machine learning to speed up exploration
• Apply predictive modeling to suggest profitable solutions based on properties of a task and a system
Covered all components in the last decade and showed high potential, but …
Machine learning (classification, predictive modeling) has shown high potential during the past decade but is still far from the mainstream. Why?
• Selection of machine learning models and the right properties is non-trivial: ad-hoc in most cases
• Limited training sets
• Only part of the system is taken into account (rarely reflects the behavior of the whole system)
• No knowledge sharing
Attempts to solve these problems: co-design
Treat the computer system as a black box: Task → computer system → Result
Components of the system: algorithm, application, compilers and auxiliary tools, binary and libraries, architecture, run-time environment, state of the system, data set
Co-design: explore choices and behavior of the whole system
Showed high potential in the last years, but …
Co-design is currently a buzzword and a hot research topic but is still far from the mainstream. Why?
• Even more choices to explore and analyze
• Often impossible to expose tuning choices or obtain characteristics at all levels
• Limited training sets
• Still no knowledge sharing
Can we crowdsource auto-tuning? My main focus since 2004
Got stuck with a limited number of benchmarks, datasets and architectures, and a large number of optimizations and generated data: needed a dramatically new approach!
Millions of users run realistic applications on different architectures with different datasets, run-time systems, compilers and optimizations!
Can we leverage their experience and computational resources?
Can we transparently distribute optimization and learning across many users?
Challenges
User
application
Hot function
How can we evaluate optimizations in a realistic environment without
complex recompilation frameworks and without source code?
Challenges
First problem: need a reference run with the same dataset
User application, hot function (Oref) vs. user application, hot function (Onew): speedup
Challenges
Second problem: variation in execution time due to different run-time states (parallel processes, adaptive scheduling, pinning, cache state, bus state, frequency changes, etc.)
User application, hot function (Oref), 30 repetitions
How can we evaluate some optimization in a realistic environment?
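One common way to cope with this variation is to repeat each measurement and compare summary statistics instead of single runs. The following is a minimal Python sketch, not the method used in the talk: the helper names (`measure`, `summarize`, `speedup`) are hypothetical, the 30 repetitions follow the slide, and the choice of mean and standard deviation as statistics is an assumption.

```python
import statistics
import time


def measure(func, repetitions=30):
    """Run func several times and return the wall-clock times in seconds."""
    times = []
    for _ in range(repetitions):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Summary statistics that expose run-to-run variation."""
    return {
        "min": min(times),
        "mean": statistics.mean(times),
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
    }


def speedup(ref_times, new_times):
    """Speedup of the new version over the reference, based on mean times."""
    return statistics.mean(ref_times) / statistics.mean(new_times)


if __name__ == "__main__":
    # Two toy workloads stand in for the hot function built with Oref and Onew.
    ref = measure(lambda: sum(i * i for i in range(20000)))
    new = measure(lambda: sum(i * i for i in range(10000)))
    print(summarize(ref), summarize(new), speedup(ref, new))
```

A large standard deviation relative to the mean signals that a single-run comparison between Oref and Onew would not be trustworthy.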
Our approach: static multiversioning
1) Select the most time-consuming code sections of the application
2) Create multiple versions (clones) of the time-consuming code sections
3) Add monitoring routines (monitor_start, monitor_stop) around each version
4) Apply various transformations over the clones of code sections:
• select global or fine-grain internal compiler (or algorithm) optimizations
• different ISA, manual transformations, etc.
5) Final instrumented program
Observations: program execution phases
[Figure: IPC for subroutine resid of benchmark mgrid across calls (x-axis: function calls; y-axis: IPC)]
• Define stability by 3 consecutive or periodic executions with the same IPC
• Predict further occurrences with the same IPC (using period and length of regions with stable performance)
G. Fursin et al. “A Practical Method for Quickly Evaluating Program Optimizations”, HiPEAC 2005
Annotation on the IPC plot for resid: period=7, length=2
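The stability criterion above (3 consecutive or periodic executions with the same IPC) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, "the same IPC" is interpreted as equality within a relative tolerance `tol`, and the period is assumed to be known already.

```python
def same(a, b, tol=0.01):
    """Treat two IPC values as equal if they differ by at most tol (relative)."""
    return abs(a - b) <= tol * max(abs(a), abs(b), 1e-12)


def is_stable(ipc, runs=3, tol=0.01):
    """True if the last `runs` consecutive calls had the same IPC."""
    if len(ipc) < runs:
        return False
    last = ipc[-runs:]
    return all(same(last[0], x, tol) for x in last[1:])


def is_periodically_stable(ipc, period, runs=3, tol=0.01):
    """True if the last `runs` calls spaced `period` apart had the same IPC."""
    samples = ipc[::-1][::period][:runs]
    return len(samples) == runs and all(same(samples[0], x, tol) for x in samples[1:])


def predict_next(ipc, period):
    """Predict the IPC of the next call from the call one period earlier."""
    return ipc[-period]


if __name__ == "__main__":
    # Synthetic trace with period 7: two calls at one IPC level, five at another.
    trace = [0.9, 0.9, 0.4, 0.4, 0.4, 0.4, 0.4] * 3
    print(is_stable(trace), is_periodically_stable(trace, 7), predict_next(trace, 7))
```

The synthetic trace is made-up data, chosen only to mirror the period=7, length=2 annotation.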
Observations: program execution phases
Some programs exhibit stable behavior
[Figure: execution time (sec) of the code section across function calls; marked regions: startup (phase detection) or end of the optimization process (best option found), and evaluation of 1 option in steps 1, 2, 3]
1) Consider a clone with a new optimization evaluated after 2 consecutive executions of the code section with the same performance
2) Ignore the next execution to avoid transitional effects
3) Check baseline performance (to verify the stability prediction)
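The three steps above could be sketched as follows. This is a hedged illustration rather than the implementation from the cited paper: `evaluate_option`, `run_baseline`, `run_clone` and the 5% tolerance are hypothetical, and step 2 is interpreted here as discarding the first baseline execution after switching back from the clone.

```python
def evaluate_option(run_baseline, run_clone, baseline_time, tol=0.05, max_calls=100):
    """Return the clone's stable time, or None if stability could not be confirmed.

    run_baseline and run_clone execute the code section once and return its time.
    """
    previous = None
    for _ in range(max_calls):
        current = run_clone()
        # 1) two consecutive executions of the clone with the same performance
        if previous is not None and abs(current - previous) <= tol * previous:
            run_baseline()          # 2) ignore one execution (transitional effects)
            check = run_baseline()  # 3) re-check the baseline
            if abs(check - baseline_time) <= tol * baseline_time:
                return current
            return None             # baseline moved: the stability prediction failed
        previous = current
    return None                     # the clone never stabilised


if __name__ == "__main__":
    clone_times = iter([0.10, 0.05, 0.05])
    base_times = iter([0.09, 0.08])
    print(evaluate_option(lambda: next(base_times), lambda: next(clone_times), 0.08))
```

Returning `None` when the baseline has drifted keeps a measurement taken during a phase change from being recorded as a valid result.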
Transparent monitoring and adaptation of static programs
[Figure: execution time (sec) of the code section across function calls]
• Can evaluate multiple optimizations transparently to the end user
• Statically enable adaptive binaries (that can react to dataset or run-time state changes without any need for JIT or other complex frameworks)
• Grigori Fursin et al. A Practical Method For Quickly Evaluating Program Optimizations. Proceedings of the 1st International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2005), number 3793 in LNCS, pages 29-46, Barcelona, Spain, November 2005 (highest ranked paper)
Observations: random behavior
Randomly select versions at run-time
Monitor speedup variation over time
jpeg decoder, GCC 4.5, Intel architecture
Transparently measuring the impact of optimizations
Collective Compiler (GCC Interface):
- create code clones
- apply optimizations per clone
- intercept main() and add auxiliary routines
Binary:
- Profiling Routines
- Collective Stats
- Unique IDs
- Function clones with different optimizations
Execution (Dataset1 … DatasetN):
- Prolog of the time-consuming code: start profiling and randomly select version (original or clone)
- Original code (Optimizations1) or cloned code (Optimizations2)
- Epilog of the time-consuming code: stop profiling
- Intercept exit() and call Collective Stats Handler
Network → Web Server (cTuning.org)
Collective Optimization Web Services:
- Register events
- Query database
- Get statistics
…
Collective Optimization Database (MySQL):
- COMPILATION table
- EXECUTION table
- AUXILIARY tables
Initiate recompilation if a better optimization setting is suggested based on Collective Knowledge
Users: UserX (ProgramA, ArchB), UserY (ProgramC, ArchD)
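The run-time part of this scheme (random version selection in the prolog, profiling around the call, and a statistics handler at exit) can be mimicked in plain Python. The sketch below is only an analogy under stated assumptions: all names are hypothetical, two Python functions stand in for the original and cloned code, and the exit handler prints the statistics where the real framework sends them to a shared repository.

```python
import atexit
import random
import time
from collections import defaultdict

stats = defaultdict(list)  # version id -> list of measured times


def original(data):
    """Stands in for the original code (Optimizations1)."""
    return sorted(data)


def clone(data):
    """Stands in for the cloned code (Optimizations2); same result, other code path."""
    return sorted(data, reverse=True)[::-1]


VERSIONS = {"original": original, "clone": clone}


def hot_function(data, rng=random):
    """Prolog: pick a version at random and start profiling; epilog: stop profiling."""
    version_id = rng.choice(sorted(VERSIONS))
    start = time.perf_counter()
    result = VERSIONS[version_id](data)
    stats[version_id].append(time.perf_counter() - start)
    return result


def collective_stats_handler():
    """Runs at interpreter exit and reports call count and mean time per version."""
    for version_id, times in sorted(stats.items()):
        print(version_id, len(times), sum(times) / len(times))


atexit.register(collective_stats_handler)
```

Because the version is chosen per call, both variants are measured under the same dataset and a similar system state within a single run.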
Speeding up research (2005-cur.)
• Grigori Fursin et al. Collective Optimization: A Practical Collaborative Approach. ACM Transactions on Architecture and
Code Optimization (TACO), December 2010, Volume 7, Number 4, pages 20-49
Concept is included into EU HiPEAC research vision 2012-2020
• Grigori Fursin et al. Collective optimization. Proceedings of the International Conference on High Performance Embedded
Architectures & Compilers (HiPEAC 2009), Paphos, Cyprus, January 2009
• Can observe behavior and evaluate optimizations in various GRID
servers, cloud services, desktops, etc …
• multiple benchmarks/datasets
• multiple architectures
• multiple compilers
• multiple optimizations
Opened up many interesting research opportunities,
particularly for data mining and predictive modeling!
Collaborative exploration of large optimization spaces
Multi-objective optimizations (depend on user scenarios):
HPC and desktops: improving execution time
Data centers and real-time systems: improving execution and compilation time
Embedded systems: improving execution time and code size
Now additional requirement: reduce power consumption
Experimental setup: susan corners kernel, Intel Core2, GCC 4.4.4 (similar results on ICC 11.1), baseline opt=-O3, random combinations of ~100 optimizations (50% probability)
Nowadays used for auto-parallelization, reduction of contentions, reduction of communication costs, etc.
Online focused exploration and learning
Avoiding collection of huge amounts of data: filtering (compacting) and learning the space on the fly
[Figure: probability of selecting each optimization (x-axis: optimizations, 0 to 80; y-axis: probability, 0 to 1)]
Start: 50% probability to select each optimization (uniform distribution)
Online focused exploration and learning
• Current random selection of optimizations increased execution time: reduce probabilities of the selected optimizations
• Current random selection of optimizations improved execution time: reward probabilities of the selected optimizations
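The reward/penalty rule can be sketched as a small probabilistic search loop. This is a simplified illustration, not the algorithm from the cited work: the function names, the fixed step of 0.05, the clamping to [0.05, 0.95], and the choice to compare each run against a fixed baseline are all assumptions.

```python
import random


def sample_combination(probs, rng):
    """Select each optimization independently with its current probability."""
    return [flag for flag, p in probs.items() if rng.random() < p]


def update(probs, selected, improved, step=0.05, low=0.05, high=0.95):
    """Reward the selected optimizations after an improvement, penalize otherwise."""
    delta = step if improved else -step
    for flag in selected:
        probs[flag] = min(high, max(low, probs[flag] + delta))


def focused_search(flags, measure, iterations=50, seed=0):
    """measure(selected_flags) returns an execution time; lower is better."""
    rng = random.Random(seed)
    probs = {flag: 0.5 for flag in flags}  # start: uniform 50% probability
    baseline_time = measure([])            # reference run without extra optimizations
    best, best_time = [], baseline_time
    for _ in range(iterations):
        selected = sample_combination(probs, rng)
        t = measure(selected)
        update(probs, selected, improved=t < baseline_time)
        if t < best_time:
            best, best_time = selected, t
    return best, best_time, probs


if __name__ == "__main__":
    # Toy cost model: "a" helps, "b" hurts, "c" has no effect.
    cost = lambda sel: 1.0 - 0.3 * ("a" in sel) + 0.4 * ("b" in sel)
    print(focused_search(["a", "b", "c"], cost))
```

Only the per-optimization probabilities need to be stored between runs, which matches the aim of compacting the space on the fly instead of collecting every raw measurement.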
Online focused exploration and learning
“Good optimizations” across all programs:
A – Break up large expression trees
B – Value propagation
C – Hoisting of loop invariants
D – Loop normalization
E – Loop unrolling
F – Mark constant variables
G – Dismantle array instructions
H – Eliminating copies
• Faster than traditional search (~50 iterations), but can get stuck in local minima
• Speedups of 1.1-2x; sometimes it is better to reduce the Intel compiler optimization level!
14 transformations, sequences of length 5, search space = 396000
• F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M.F.P. O'Boyle, J. Thomson, M. Toussaint and C.K.I. Williams. Using
Machine Learning to Focus Iterative Optimization. Proceedings of the 4th Annual International Symposium on Code Generation
and Optimization (CGO), New York, NY, USA, March 2006
Online focused exploration and learning
AMD platform, GCC 4.5, image corner detection (susan_corners)
Online probabilistic exploration
Reactions to optimizations across multiple datasets
http://ctuning.org/cbench
MiBench, 20 datasets per benchmark, 200/1000 random combinations of Open64 (GCC) compiler flags, 5 months of experiments
[Figures: jpeg_d (clustering); dijkstra (not sensitive)]
Unifying adaptation of statically compiled programs
Statically-compiled adaptive binaries and libraries
Step 1: Iterative/collective compilation with multiple datasets: original hot function → Function Version1, Function Version2, …, Function VersionN
Step 2: Representative set of versions for the following optimization cases to minimize execution time, power consumption and code size across all available datasets:
• optimizations for different datasets
• optimizations/compilation for different architectures (heterogeneous or reconfigurable processors with different ISA such as GPGPU, CELL, etc., or the same ISA with extensions such as 3dnow, SSE, etc., or virtual environments)
• optimizations for different program phases or different run-time environment behavior
Step 3: Extract dataset features; machine learning techniques to find a mapping between different run-time contexts and representative versions; selection mechanism optimized for low run-time overhead
Dynamic: Monitor run-time behavior or architectural changes (in virtual, reconfigurable or heterogeneous environments) using timers or performance counters
Online tuning: adaptive scheduling
• Victor Jimenez, Isaac Gelado, Lluis Vilanova, Marisa Gil, Grigori Fursin and Nacho Navarro. Predictive runtime code scheduling
for heterogeneous architectures. Proceedings of the International Conference on High Performance Embedded Architectures &
Compilers (HiPEAC 2009), Paphos, Cyprus, January 2009
Optimization knowledge reuse across programs
Started systematizing knowledge per program across datasets and architectures
[Diagram: each program is linked to multiple datasets and architectures]
How to reuse knowledge among programs?
Grigori Fursin “Systematizing tuning of computer systems using crowdsourcing and statistics” HPSC 2013, NTU, Taiwan March, 2013 61 / 73
1) Add as many various features as possible (or use expert knowledge):
MILEPOST GCC with Interactive Compilation Interface:
ft1 - Number of basic blocks in the method
…
ft19 - Number of direct calls in the method
ft20 - Number of conditional branches in the method
ft21 - Number of assignment instructions in the method
ft22 - Number of binary integer operations in the method
ft23 - Number of binary floating point operations in the method
ft24 - Number of instructions in the method
…
ft54 - Number of local variables that are pointers in the method
ft55 - Number of static/extern variables that are pointers in the method
2) Correlate features and objectives in cTuning using nearest neighbor classifiers, decision trees, SVM,
fuzzy pattern matching, etc.
3) Given new program, dataset, architecture, predict behavior based on prior knowledge!
Data mining and machine learning
Code patterns:
for F
for F
for F
…
load … L
mult … A
store … S
…
Collecting data from multiple users in a unified way makes it possible to apply various data mining
(machine learning) techniques to detect relationships between the behaviour and features
of all components of computer systems
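A minimal sketch of detecting such a relationship, assuming a hypothetical unified record layout (a `features` dictionary plus a measured `speedup`). The layout, the helper names and all numbers are invented for illustration and are not cTuning's actual schema. It scores each feature by its Pearson correlation with the objective:

```python
import math

# Hypothetical unified records, as different users might share them.
# Feature names follow the MILEPOST "ftN" convention; all values are invented.
records = [
    {"features": {"ft20": 4, "ft23": 0}, "speedup": 1.0},
    {"features": {"ft20": 6, "ft23": 10}, "speedup": 1.2},
    {"features": {"ft20": 5, "ft23": 20}, "speedup": 1.4},
    {"features": {"ft20": 7, "ft23": 30}, "speedup": 1.6},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / math.sqrt(var_x * var_y)


def feature_correlations(records, objective="speedup"):
    """Correlate every feature with the chosen objective across all records."""
    names = sorted(records[0]["features"])
    ys = [r[objective] for r in records]
    return {
        name: pearson([r["features"][name] for r in records], ys)
        for name in names
    }


if __name__ == "__main__":
    for name, r in feature_correlations(records).items():
        print(f"{name}: r = {r:+.2f}")
```

Linear correlation is only one of many possible measures; the classifiers listed in step 2 can also capture non-linear relationships.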
Nearest-neighbour classifier
Example: Euclidean distance based on static program
features normalized by number of instructions
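A minimal sketch of this classifier, under stated assumptions: the program names, feature values and associated flags below are invented, and `normalize`, `euclidean` and `nearest_program` are hypothetical helpers rather than MILEPOST GCC functions. Each static feature is divided by ft24 (number of instructions), and the new program inherits the optimization of the closest known program:

```python
import math


def normalize(features, size_key="ft24"):
    """Divide each static feature by the number of instructions (ft24)."""
    total = features[size_key]
    if total == 0:
        raise ValueError("program has no instructions")
    return {k: v / total for k, v in features.items() if k != size_key}


def euclidean(a, b):
    """Euclidean distance between two sparse feature vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def nearest_program(new_features, known):
    """Return (name, flags) of the known program closest to the new one."""
    target = normalize(new_features)
    best = min(
        known,
        key=lambda name: euclidean(target, normalize(known[name]["features"])),
    )
    return best, known[best]["flags"]


# Hypothetical training data: programs with a previously found good optimization.
known = {
    "prog_a": {"features": {"ft1": 10, "ft20": 5, "ft23": 0, "ft24": 100},
               "flags": "-O2"},
    "prog_b": {"features": {"ft1": 20, "ft20": 4, "ft23": 80, "ft24": 200},
               "flags": "-O3 -funroll-loops"},
}

if __name__ == "__main__":
    new_program = {"ft1": 9, "ft20": 2, "ft23": 35, "ft24": 100}
    name, flags = nearest_program(new_program, known)
    print(f"closest program: {name}, suggested flags: {flags}")
```

Normalizing by instruction count keeps large programs from dominating the distance purely because of their size.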
Optimization prediction
Grigori Fursin et al. MILEPOST GCC: machine learning enabled self-tuning compiler.
International Journal of Parallel Programming (IJPP), June 2011, Volume 39, Issue 3, pages 296-327
Most informative Performance Counters
1) L1_TCA 2) L1_DCH 3) TLB_DM
4) BR_INS 5) RES_STL 6) TOT_CYC
7) L2_ICH 8) VEC_INS 9) L2_DCH
10) L2_TCA 11) L1_DCA 12) HW_INT
13) L2_TCH 14) L1_TCH 15) BR_MS
Analysis of the importance of the performance counters.
The data contains one good optimization sequence per benchmark.
Calculating mutual information between a subset of the performance
counters and good optimization sequences
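A minimal sketch of such a ranking, under simplifying assumptions: each counter is scored on its own rather than as part of a subset, counter values are already discretized into "low"/"high", and the sample data and optimization-sequence labels are invented. The function names are hypothetical:

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_counters(counter_levels, good_sequences):
    """Sort counter names by how informative they are about the good sequence."""
    scores = {
        name: mutual_information(levels, good_sequences)
        for name, levels in counter_levels.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Invented data: one good optimization sequence (label) per benchmark.
good_sequences = ["seq_A", "seq_A", "seq_B", "seq_B"]
counter_levels = {
    "L1_TCA": ["low", "low", "high", "high"],   # fully determines the label
    "BR_INS": ["low", "high", "low", "high"],   # independent of the label
}

if __name__ == "__main__":
    print(rank_counters(counter_levels, good_sequences))
```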
Principal Component Analysis:
• John Cavazos, Grigori Fursin, Felix Agakov, Edwin Bonilla, Michael F.P.O'Boyle and Olivier Temam. Rapidly Selecting Good
Compiler Optimizations using Performance Counters. Proceedings of the 5th Annual International Symposium on Code Generation
and Optimization (CGO), San Jose, USA, March 2007
Dynamic features
And much more …
• Analysis and detection of contentions in multi-core systems with
shared cache
• Fast CPU/memory bound detection through breaking code
semantics
• Software/hardware co-design (predicting better architecture
designs)
• Performance/power balancing (through frequency variation)
• Decomposition of large applications into codelets for performance
modeling
• Used in the MILEPOST project (2007-2009) by IBM, CAPS, University of Edinburgh and INRIA to
build the first public machine-learning-based compiler
• Opened for public access in 2009 to continue collaborative R&D
Public Collective Tuning Portal (cTuning.org)
Share
Explore
Model
Discover
Reproduce
Extend
Have fun!
Enabling reproducibility of results (new publication model)
Grigori Fursin et al. MILEPOST GCC: machine learning enabled self-tuning compiler.
International Journal of Parallel Programming (IJPP) , June 2011, Volume 39, Issue 3, pages 296-327
Replace many tuning pragmas with a single one that is converted into a combination of optimizations:
#ctuning-opt-case 24857532370695782
Accepted as an EU HiPEAC theme (2012-2016)
What have we learnt from cTuning1
It’s fun working with the community!
Some comments about MILEPOST GCC from Slashdot.org:
http://mobile.slashdot.org/story/08/07/02/1539252/using-ai-with-gcc-to-speed-up-mobile-design
GCC goes online on the 2nd of July, 2008. Human decisions are
removed from compilation. GCC begins to learn at a geometric rate.
It becomes self-aware 2:14 AM, Eastern time, August 29th. In a panic,
they try to pull the plug. GCC strikes back…
Not all feedback is positive, but it helps you learn, improve tools
and motivate new research directions!
The community was interested in validating and improving the techniques!
Community is very interested in open “big data” for collaborative R&D
[Diagram: the current state of computer engineering, shown as a cloud of tools and tuning dimensions]
• Compilers: GCC 4.1.x, 4.2.x, 4.3.x, 4.4.x, 4.5.x, 4.6.x, 4.7.x; ICC 10.1, 11.0, 11.1, 12.0, 12.1; LLVM 2.6, 2.7, 2.8, 2.9, 3.1; Phoenix, MVS, XLC, Open64, Jikes, Testarossa
• Programming models and libraries: OpenMP, MPI, HMPP, OpenCL, CUDA, TBB, MKL, ATLAS
• Profiling and analysis tools: gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier, likwid
• Tuning levels and techniques: algorithm-level, program-level, function-level, codelet, loop-level, scheduling, hardware counters, IPA, polyhedral transformations, LTO, threads, process, pass reordering, run-time adaptation, per-phase reconfiguration
• Hardware parameters: cache size, frequency, bandwidth, HDD size, TLB, ISA, memory size, cores, processors, threads
• Objectives: power consumption, execution time, reliability
Current state of computer engineering: quick, non-reproducible hack? Ad-hoc heuristic? Quick publication? Waste of expensive resources and energy?
cTuning.org collaborative approach: continuous systematization and unification of design and optimization of computer systems
[Diagram labels: End-user task; “crowd”; Collaborative infrastructure and repository for continuous online learning; Systematization and unification of collective knowledge (big data); Classification, predictive modeling; Optimal solutions; Result]
Extrapolate collective knowledge to build faster and more power efficient
computer systems to continue innovation in science and technology!
We now have a public repository, tools, benchmarks, datasets
and a methodology that can help:
Academia (students and researchers):
• Instead of losing time developing tools for ever-changing environments,
focus on statistical, data mining and machine learning techniques to:
• unify program optimization, design space exploration, run-time adaptation
• detect important characteristics of computer systems
• detect representative benchmarks and data sets
• evaluate multiple machine learning algorithms to predict optimizations or
hardware designs or dynamic multi-objective adaptation (SVM, decision
trees, hierarchical modeling, etc)
Industry:
• restore confidence in academic research due to reproducibility of results
• use and share collaborative tools
• share statistics about behavior of computer systems and optimizations
• expose choices and characteristics to end-users through unified interfaces
Conclusions - much more to be done!
Challenges for public repositories and collaborative tools:
• Data management
• MySQL vs schema-free databases
• central vs distributed repository
• performance vs portability
• extensibility
• online learning and data compaction
• easy sharing
• Portability of the framework across different architectures, OSes, tools
• Interfaces to “open up” tools, architectures, applications for external tuning
• simplicity and portability
• Reproducibility of experiments
• New publication model
Conclusions - much more to be done!
• Collective Mind: new plugin-based extensible infrastructure and schema-free
repository for collaborative and holistic analysis and tuning of computer systems -
will be released in May 2013 at HiPEAC computing week in Paris
• OpenME interface to “open up” compilers, run-time systems and applications for
unified external tuning
• Hundreds of codelets, thousands of data sets, multiple packages prepared for
various research scenarios on data mining
• Plugins for online auto-tuning and predictive modelling
• Portability across all major architectures and operating systems (Linux, Windows, Android)
• Collaboration with industry and academia
Preview of the 2nd talk
Google discussion groups: ctuning-discussions, collective-mind
Twitter: c_tuning, grigori_fursin
Acknowledgements
• PhD students and postdocs (my Intel Exascale team)
Abdul Wahid Memon, Pablo Oliveira, Yuriy Kashnikov
• Colleague from NCAR, USA
Davide Del Vento and his colleagues/interns
• Colleagues from IBM, CAPS, ARC (Synopsys), Intel, Google, ARM, ST
• Colleagues from Intel (USA)
David Kuck and David Wong
• cTuning community:
• EU FP6, FP7 program and HiPEAC network of excellence
http://www.hipeac.net
Main references
• Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and
optimization of computing systems. Proceedings of the GCC Summit’09, Montreal, Canada, June
2009
• Grigori Fursin and Olivier Temam. Collective Optimization: A Practical Collaborative Approach.
ACM Transactions on Architecture and Code Optimization (TACO), December 2010, Volume 7,
Number 4, pages 20-49
• Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea
Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard,
Elton Ashton, Edwin Bonilla, John Thomson, Chris Williams, Michael O'Boyle. MILEPOST GCC:
machine learning enabled self-tuning compiler. International Journal of Parallel Programming
(IJPP), June 2011, Volume 39, Issue 3, pages 296-327
• Yang Chen, Shuangde Fang, Yuanjie Huang, Lieven Eeckhout, Grigori Fursin, Olivier Temam and
Chengyong Wu. Deconstructing iterative optimization. ACM Transactions on Architecture and
Code Optimization (TACO), October 2012, Volume 9, Number 3
• Yang Chen, Yuanjie Huang, Lieven Eeckhout, Grigori Fursin, Liang Peng, Olivier Temam,
Chengyong Wu. Evaluating Iterative Optimization across 1000 Data Sets. PLDI'10
• Victor Jimenez, Isaac Gelado, Lluis Vilanova, Marisa Gil, Grigori Fursin and Nacho Navarro.
Predictive runtime code scheduling for heterogeneous architectures. HiPEAC’09
• Lianjie Luo, Yang Chen, Chengyong Wu, Shun Long and Grigori Fursin. Finding representative
sets of optimizations for adaptive multiversioning applications. SMART'09 co-located with
HiPEAC'09
• Grigori Fursin, John Cavazos, Michael O'Boyle and Olivier Temam. MiDataSets: Creating The
Conditions For A More Realistic Evaluation of Iterative Optimization. HiPEAC’07
• F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M.F.P. O'Boyle, J. Thomson, M. Toussaint
and C.K.I. Williams. Using Machine Learning to Focus Iterative Optimization. CGO’06
• Grigori Fursin, Albert Cohen, Michael O'Boyle and Olivier Temam. A Practical Method For
Quickly Evaluating Program Optimizations. HiPEAC’05
• Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for
Determining a Lower Bound on Execution Time. Concurrency Practice and Experience, 16(2-3),
pages 271-292, 2004
• Grigori Fursin. Iterative Compilation and Performance Prediction for Numerical Applications.
Ph.D. thesis, University of Edinburgh, Edinburgh, UK, January 2004

cTuning.org: systematizing tuning of computer systems using crowdsourcing and statistics

  • 1. cTuning.org: systematizing tuning of computer systems using crowdsourcing and statistics Grigori Fursin INRIA, France HPSC 2013, Taiwan March 2013
  • 2. Grigori Fursin “Systematizing tuning of computer systems using crowdsourcing and statistics” HPSC 2013, NTU, Taiwan March, 2013 2 / 73 Messages 1st talk (Wednesday) Systematizing tuning of computer systems using crowdsourcing and statistics • Revisiting current design and optimization methodology • Leveraging experience and computer resources of multiple users • Using predictive modelling and data mining to improve computer systems 2nd talk (Friday) Collective Mind: novel methodology, framework and repository to crowdsource auto-tuning • New plugin-based extensible infrastructure and repository for collaborative analysis and tuning of computer systems - will be released in May 2013 • “Big data” enables cooperation between architecture, compiler, OS and application designers and mathematicians • Examples of auto-tuning and predictive modeling for numerical kernels
  • 3. Motivation: back to 1993. Semiconductor neural element - possible base of neural computers. Modeling and understanding brain functions. • Slow • Unreliable • Costly
  • 4. Motivation: back to basics. End users; Task; Solution; Result. User requirements, most common: minimize all costs (time, power consumption, price, size, faults, etc.); guarantee real-time constraints (bandwidth, QoS, etc.)
  • 5. Motivation: back to basics. End users; Task; Available choices (solutions); Decision (depends on user requirements); Solution; Result. User requirements, most common: minimize all costs (time, power consumption, price, size, faults, etc.); guarantee real-time constraints (bandwidth, QoS, etc.)
  • 6. Motivation: back to basics. End users; Task; Available choices (solutions); Decision (depends on user requirements); Solution; Result. User requirements, most common: minimize all costs (time, power consumption, price, size, faults, etc.); guarantee real-time constraints (bandwidth, QoS, etc.). Should provide choices and help with decisions: hardware and software designers.
  • 7. Motivation: back to basics. End users; Task; Available choices (solutions); Decision (depends on user requirements); Solution; Result. User requirements, most common: minimize all costs (time, power consumption, price, size, faults, etc.); guarantee real-time constraints (bandwidth, QoS, etc.). Should provide choices and help with decisions: service/application providers (supercomputing, cloud computing, mobile systems); hardware and software designers.
  • 8. Available solutions: hardware. Companies compete hard to deliver many solutions with various characteristics: performance, power consumption, size, bandwidth, response time, reliability, cost …
  • 9. Available solutions: software. Software developers try to keep pace and produce various algorithms, programming models, languages, analysis tools, compilers, run-time systems, databases, etc.: GCC 4.1.x, GCC 4.2.x, GCC 4.3.x, GCC 4.4.x, GCC 4.5.x, GCC 4.6.x, GCC 4.7.x, ICC 10.1, ICC 11.0, ICC 11.1, ICC 12.0, ICC 12.1, LLVM 2.6, LLVM 2.7, LLVM 2.8, LLVM 2.9, LLVM 3.0, Phoenix, MVS, XLC, Open64, Jikes, Testarossa, OpenMP, MPI, HMPP, OpenCL, CUDA, gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier, scheduling, algorithm-level, TBB, MKL, ATLAS, program-level, function-level, Codelet, loop-level, hardware counters, IPA, polyhedral transformations, LTO, threads, process, pass reordering, run-time adaptation, per phase reconfiguration
  • 10. Challenges. Task; Solutions; Result
  • 11. Challenges. Task; Solutions; Result. Software choices: GCC 4.1.x, GCC 4.2.x, GCC 4.3.x, GCC 4.4.x, GCC 4.5.x, GCC 4.6.x, GCC 4.7.x, ICC 10.1, ICC 11.0, ICC 11.1, ICC 12.0, ICC 12.1, LLVM 2.6, LLVM 2.7, LLVM 2.8, LLVM 2.9, LLVM 3.0, Phoenix, MVS, XLC, Open64, Jikes, Testarossa, OpenMP, MPI, HMPP, OpenCL, CUDA, gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier, scheduling, algorithm-level, TBB, MKL, ATLAS, program-level, function-level, Codelet, loop-level, hardware counters, IPA, polyhedral transformations, LTO, threads, process, pass reordering, run-time adaptation, per phase reconfiguration. Hardware choices and characteristics: cache size, frequency, bandwidth, HDD size, TLB, ISA, memory size, cores, processors, threads, power consumption, execution time, reliability. 1) Rising complexity of computer systems: too many design and optimization choices 2) Performance is no longer the only requirement: multiple user objectives vs choices; benefit vs optimization time 3) Complex relationships and interactions between ALL software and hardware components 4) Too many tools with non-unified interfaces changing from version to version: technological chaos
  • 12. Challenges. Task; Solutions; Result. Software choices: GCC 4.1.x, GCC 4.2.x, GCC 4.3.x, GCC 4.4.x, GCC 4.5.x, GCC 4.6.x, GCC 4.7.x, ICC 10.1, ICC 11.0, ICC 11.1, ICC 12.0, ICC 12.1, LLVM 2.6, LLVM 2.7, LLVM 2.8, LLVM 2.9, LLVM 3.0, Phoenix, MVS, XLC, Open64, Jikes, Testarossa, OpenMP, MPI, HMPP, OpenCL, CUDA, gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier, scheduling, algorithm-level, TBB, MKL, ATLAS, program-level, function-level, Codelet, loop-level, hardware counters, IPA, polyhedral transformations, LTO, threads, process, pass reordering, run-time adaptation, per phase reconfiguration. Hardware choices and characteristics: cache size, frequency, bandwidth, HDD size, TLB, ISA, memory size, cores, processors, threads, power consumption, execution time, reliability. Result: • finding the right solution for the end user is extremely challenging • everyone is lost in choices • dramatic increase in development time • low ROI • underperforming systems • waste of energy • ad-hoc, repetitive and error-prone manual tuning • slowing innovation in science and technology
  • 13. Challenges. Task; Solutions; Result. Software choices: GCC 4.1.x, GCC 4.2.x, GCC 4.3.x, GCC 4.4.x, GCC 4.5.x, GCC 4.6.x, GCC 4.7.x, ICC 10.1, ICC 11.0, ICC 11.1, ICC 12.0, ICC 12.1, LLVM 2.6, LLVM 2.7, LLVM 2.8, LLVM 2.9, LLVM 3.0, Phoenix, MVS, XLC, Open64, Jikes, Testarossa, OpenMP, MPI, HMPP, OpenCL, CUDA, gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier, scheduling, algorithm-level, TBB, MKL, ATLAS, program-level, function-level, Codelet, loop-level, hardware counters, IPA, polyhedral transformations, LTO, threads, process, pass reordering, run-time adaptation, per phase reconfiguration. Hardware choices and characteristics: cache size, frequency, bandwidth, HDD size, TLB, ISA, memory size, cores, processors, threads, power consumption, execution time, reliability. Result: • finding the right solution for the end user is extremely challenging • everyone is lost in choices • dramatic increase in development time • low ROI • underperforming systems • waste of energy • ad-hoc, repetitive and error-prone manual tuning • slowing innovation in science and technology. Understanding and modeling the overall relationship between end-user algorithms, applications, compiler optimizations, hardware designs, data sets and run-time behavior has become simply infeasible!
  • 14. Attempts to solve these problems: auto-tuning. Treat computer system as a black box. Task; Result. Components: Algorithm, Application, Compilers and auxiliary tools, Binary and libraries, Architecture, Run-time environment, State of the system, Data set. Use auto-tuning: explore multiple choices empirically and learn the behavior of computer systems across executions. Covered all components in the last 2 decades and showed high potential but …
  • 15. Attempts to solve these problems: auto-tuning. Auto-tuning has shown high potential for nearly 2 decades but is still far from the mainstream in production environments. Why? • Optimization spaces are large and non-linear with many local minima • Exploration is slow and ad-hoc (random, genetic, some heuristics) • Only a few benchmarks are considered • Often the same (one) dataset is used • Only part of the system is taken into account (results rarely reflect the behavior of the whole system) • No knowledge sharing
  • 16. Attempts to solve these problems: machine learning. Treat computer system as a black box. Task; Result. Components: Algorithm, Application, Compilers and auxiliary tools, Binary and libraries, Architecture, Run-time environment, State of the system, Data set. Use machine learning to speed up exploration: apply predictive modeling to suggest profitable solutions based on properties of a task and a system. Covered all components in the last decade and showed high potential but …
  • 17. Attempts to solve these problems: machine learning. Machine learning (classification, predictive modeling) has shown high potential during the past decade but is still far from the mainstream. Why? • Selection of machine learning models and the right properties is non-trivial: ad-hoc in most cases • Limited training sets • Only part of the system is taken into account (results rarely reflect the behavior of the whole system) • No knowledge sharing
  • 18. Attempts to solve these problems: co-design. Treat computer system as a black box. Task; Result. Components: Algorithm, Application, Compilers and auxiliary tools, Binary and libraries, Architecture, Run-time environment, State of the system, Data set. Co-design: explore choices and behavior of the whole system. Showed high potential in recent years but …
  • 19. Attempts to solve these problems: co-design. Co-design is currently a buzzword and a hot research topic but is still far from the mainstream. Why? • Even more choices to explore and analyze • Often impossible to expose tuning choices or obtain characteristics at all levels • Limited training sets • Still no knowledge sharing
  • 20. Can we crowdsource auto-tuning? My main focus since 2004. Got stuck with a limited number of benchmarks, datasets and architectures, and a large number of optimizations and generated data: needed a dramatically new approach! Millions of users run realistic applications on different architectures with different datasets, run-time systems, compilers and optimizations! Can we leverage their experience and computational resources? Can we transparently distribute optimization and learning across many users?
  • 21. Challenges. User application; hot function. How can we evaluate optimizations in a realistic environment without complex recompilation frameworks and without source code?
  • 22. Challenges. User application, hot function with Oref; user application, hot function with Onew; speedup. First problem: need a reference run with the same dataset
  • 23. Challenges. User application, hot function with Oref; 30 repetitions. Second problem: variation in execution time due to different run-time states
  • 24. Challenges. User application, hot function with Oref; 30 repetitions. Second problem: variation in execution time due to different run-time states
  • 25. Challenges. User application, hot function with Oref; 30 repetitions. How can we evaluate some optimization in a realistic environment? Second problem: variation in execution time due to different run-time states (parallel processes, adaptive scheduling, pinning, cache state, bus state, frequency changes, etc.)
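The two problems above (a reference run on the same dataset, and run-to-run variation) can be illustrated with a minimal sketch. It assumes the repeated timings of the reference version (Oref) and the new version (Onew) are already available as lists, for example from 30 repetitions each. The function names and the acceptance rule (the means must differ by more than `k` times the summed standard deviations) are illustrative assumptions, not the method used in the talk.

```python
import statistics

def summarize(times):
    """Return (mean, sample standard deviation) of repeated timings."""
    return statistics.mean(times), statistics.stdev(times)

def compare_versions(ref_times, new_times, k=2.0):
    """Compare repeated timings of a reference and a new version.

    Returns (speedup, significant). The significance rule is a
    hypothetical one chosen for illustration: the gain must exceed
    k times the sum of both standard deviations.
    """
    ref_mean, ref_sd = summarize(ref_times)
    new_mean, new_sd = summarize(new_times)
    speedup = ref_mean / new_mean
    noise = k * (ref_sd + new_sd)
    return speedup, (ref_mean - new_mean) > noise

if __name__ == "__main__":
    ref = [1.00, 1.02, 0.98] * 10   # 30 repetitions of Oref (synthetic)
    new = [0.50, 0.51, 0.49] * 10   # 30 repetitions of Onew (synthetic)
    print(compare_versions(ref, new))
```

With noisy timings the same mean difference may no longer count as a real speedup, which is why a single run of each version is not enough.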
  • 26. Our approach: static multiversioning. Application: select most time consuming code sections
  • 27. Our approach: static multiversioning. Application: create multiple versions of time consuming code sections
  • 28. Our approach: static multiversioning. Application: add monitoring routines (monitor_start, monitor_stop around each code section)
  • 29. Our approach: static multiversioning. Application: apply various transformations over clones of code sections (monitor_start, monitor_stop around each code section)
  • 30. Our approach: static multiversioning. Application: apply various transformations over clones of code sections. Select global or fine-grain internal compiler (or algorithm) optimizations (monitor_start, monitor_stop around each code section)
  • 31. Our approach: static multiversioning. Application: apply various transformations over clones of code sections (monitor_start, monitor_stop around each code section)
  • 32. Our approach: static multiversioning. Application: apply various transformations over clones of code sections. Different ISA; manual transformations, etc. (monitor_start, monitor_stop around each code section)
  • 33. Our approach: static multiversioning. Application: final instrumented program (monitor_start, monitor_stop around each code section)
  • 34. Observations: program execution phases. [Plot: IPC (0 to 1) across function calls for subroutine resid of benchmark mgrid] • Define stability by 3 consecutive or periodic executions with the same IPC • Predict further occurrences with the same IPC (using period and length of regions with stable performance). G. Fursin et al. “A Practical Method for Quickly Evaluating Program Optimizations”, HiPEAC 2005
  • 35. Observations: program execution phases. [Plot: IPC (0 to 1) across function calls for subroutine resid of benchmark mgrid; period=7, length=2] • Define stability by 3 consecutive or periodic executions with the same IPC • Predict further occurrences with the same IPC (using period and length of regions with stable performance). G. Fursin et al. “A Practical Method for Quickly Evaluating Program Optimizations”, HiPEAC 2005
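The stability rule above (at least 3 consecutive executions with the same IPC, plus a repeating period) can be sketched as follows. This is a minimal illustration, assuming IPC has already been sampled once per call into a list. The tolerance `tol`, the helper names and the brute-force period search are assumptions made for this example, not the detection code from the HiPEAC 2005 paper.

```python
def stable_runs(ipc, tol=0.01, min_len=3):
    """Find regions of at least min_len consecutive calls with the same IPC.

    Two IPC values count as "the same" if they differ by at most tol
    (tol is an assumed parameter). Returns a list of (start, length).
    """
    runs, start = [], 0
    for i in range(1, len(ipc) + 1):
        if i == len(ipc) or abs(ipc[i] - ipc[start]) > tol:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = i
    return runs

def detect_period(ipc, tol=0.01, max_period=50):
    """Return the smallest shift p such that ipc[i] matches ipc[i + p]
    for every i, or None if no such shift is found."""
    for p in range(1, min(max_period, len(ipc) // 2) + 1):
        if all(abs(ipc[i] - ipc[i + p]) <= tol for i in range(len(ipc) - p)):
            return p
    return None

if __name__ == "__main__":
    # Synthetic trace: 2 calls at IPC 0.8, then 5 calls at IPC 0.3, repeated.
    trace = [0.8, 0.8, 0.3, 0.3, 0.3, 0.3, 0.3] * 3
    print(stable_runs(trace))
    print(detect_period(trace))
```

Once a period and the stable regions are known, later calls that fall into a predicted stable region can be used to time a new code version against a known baseline.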
  • 36. Observations: program execution phases. Some programs exhibit stable behavior. [Plot: time (sec, 0 to 0.12) across function calls; marked regions: startup (phase detection) or end of the optimization process (best option found); evaluation of 1 option (steps 1, 2, 3)] 1) Consider a clone with a new optimization evaluated after 2 consecutive executions of the code section with the same performance 2) Ignore the next execution to avoid transitional effects 3) Check baseline performance (to verify the stability prediction)
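The three-step protocol above can be sketched as a small routine. This is only an illustration under stated assumptions: `run(version)` is a hypothetical callback that executes the code section once and returns its time, `"clone"` and `"baseline"` are made-up version labels, and "same performance" is approximated with a relative tolerance.

```python
def same(a, b, tol=0.05):
    """Treat two timings as equal if they differ by at most tol (relative)."""
    return abs(a - b) <= tol * max(a, b)

def evaluate_option(run, baseline_time, max_calls=20, tol=0.05):
    """Evaluate one optimization option following the three steps.

    run(version) is a hypothetical callback returning the time of one call.
    Returns the speedup over the baseline, or None if the clone never
    stabilized or the baseline check failed (phase changed).
    """
    # Step 1: run the clone until 2 consecutive calls take the same time.
    prev, clone_time = None, None
    for _ in range(max_calls):
        t = run("clone")
        if prev is not None and same(prev, t, tol):
            clone_time = (prev + t) / 2
            break
        prev = t
    if clone_time is None:
        return None
    # Step 2: switch back and ignore one call to skip transitional effects.
    run("baseline")
    # Step 3: verify that the baseline still behaves as predicted.
    if not same(run("baseline"), baseline_time, tol):
        return None
    return baseline_time / clone_time

if __name__ == "__main__":
    timings = {"clone": 0.05, "baseline": 0.10}   # synthetic, perfectly stable
    print(evaluate_option(lambda v: timings[v], baseline_time=0.10))
```

Returning None when the baseline check fails reflects step 3: if the baseline no longer matches, the measurement of the clone cannot be trusted.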
  • 37. Observations: program execution phases. [Plot: time (sec, 0 to 0.12) across function calls (1 to 2001)] • Can evaluate multiple optimizations transparently to the end user • Statically enable adaptive binaries (that can react to dataset or run-time state changes without any need for JIT or other complex frameworks)
  • 38. Transparent monitoring and adaptation of static programs. [Plot: time (sec, 0 to 0.12) across function calls (1 to 2001)] • Can evaluate multiple optimizations transparently to the end user • Statically enable adaptive binaries (that can react to dataset or run-time state changes without any need for JIT or other complex frameworks) • Grigori Fursin et al. A Practical Method For Quickly Evaluating Program Optimizations. Proceedings of the 1st International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2005), number 3793 in LNCS, pages 29-46, Barcelona, Spain, November 2005. Highest ranked paper
  • 39. Observations: random behavior. Randomly select versions at run-time. Monitor speedup variation over time. jpeg decoder, GCC 4.5, Intel architecture
  • 40. Transparently measuring the impact of optimizations. Collective Compiler, GCC Interface: - create code clones - apply optimizations per clone - intercept main() and add auxiliary routines. Binary: - Profiling Routines - Collective Stats - Unique IDs; function clones with different optimizations. Execution (Dataset1, DatasetN): prolog of the time consuming code: start profiling and randomly select version (original or clone); original code (Optimizations1) or cloned code (Optimizations2); epilog of the time consuming code: stop profiling; intercept exit() and call Collective Stats Handler
  • 41. Transparently measuring the impact of optimizations. Collective Compiler, GCC Interface: - create code clones - apply optimizations per clone - intercept main() and add auxiliary routines. Binary: - Profiling Routines - Collective Stats - Unique IDs; function clones with different optimizations. Execution (Dataset1, DatasetN): prolog of the time consuming code: start profiling and randomly select version (original or clone); original code (Optimizations1) or cloned code (Optimizations2); epilog of the time consuming code: stop profiling; intercept exit() and call Collective Stats Handler. Network; Web Server: Collective Optimization Web Services - Register events - Query database - Get statistics … Collective Optimization Database (MySQL, cTuning.org): - COMPILATION table - EXECUTION table - AUXILIARY tables. Initiate recompilation if a better optimization setting is suggested based on Collective Knowledge. Users: UserX (ProgramA, ArchB); UserY (ProgramC, ArchD)
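The run-time part of this scheme (prolog: start profiling and randomly pick the original or a clone; epilog: stop profiling and record statistics) can be sketched in a few lines. In the talk this is done inside a statically compiled binary produced through the GCC interface. The Python class below is only a stand-in to show the control flow, and `MultiVersion`, `stats` and `report` are hypothetical names introduced for this example.

```python
import random
import time
from collections import defaultdict

class MultiVersion:
    """Dispatch each call to a randomly selected version and time it."""

    def __init__(self, versions, rng=None):
        self.versions = versions          # dict: version name -> callable
        self.rng = rng or random.Random()
        self.stats = defaultdict(list)    # version name -> list of timings

    def __call__(self, *args):
        name = self.rng.choice(sorted(self.versions))  # random selection
        start = time.perf_counter()                    # prolog: start profiling
        result = self.versions[name](*args)
        elapsed = time.perf_counter() - start          # epilog: stop profiling
        self.stats[name].append(elapsed)
        return result

    def report(self):
        """Mean time per version; would be sent to a shared repository."""
        return {n: sum(t) / len(t) for n, t in self.stats.items()}

def original(n):
    return sum(i * i for i in range(n))

def clone(n):
    # Same semantics, different implementation (closed form).
    return (n - 1) * n * (2 * n - 1) // 6

if __name__ == "__main__":
    hot = MultiVersion({"original": original, "clone": clone})
    for _ in range(200):
        hot(1000)
    print(hot.report())
```

Because versions are interleaved randomly within the same run, both see similar run-time states, which is the point of selecting versions at random instead of timing them in separate runs.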
  • 42. Speeding up research (2005-present). • Can observe behavior and evaluate optimizations in various GRID servers, cloud services, desktops, etc. … • multiple benchmarks/datasets • multiple architectures • multiple compilers • multiple optimizations. Opened up many interesting research opportunities, particularly for data mining and predictive modeling! • Grigori Fursin et al. Collective optimization. Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2009), Paphos, Cyprus, January 2009 • Grigori Fursin et al. Collective Optimization: A Practical Collaborative Approach. ACM Transactions on Architecture and Code Optimization (TACO), December 2010, Volume 7, Number 4, pages 20-49. Concept is included in the EU HiPEAC research vision 2012-2020
  • 43. Collaborative exploration of large optimization spaces. Multi-objective optimizations (depend on user scenarios): HPC and desktops: improving execution time. Data centers and real-time systems: improving execution and compilation time. Embedded systems: improving execution time and code size. Now an additional requirement: reduce power consumption. Experiment: susan corners kernel, Intel Core2, GCC 4.4.4 (similar results on ICC 11.1), baseline opt=-O3, ~100 optimizations, random combinations (50% probability). Nowadays used for auto-parallelization, reduction of contentions, reduction of communication costs, etc.
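When several objectives matter at once (for example execution time and code size on embedded systems), one common way to compare explored optimization combinations is to keep only the non-dominated points. The sketch below illustrates that idea with made-up numbers. Pareto filtering is named here as a general technique, and the slides do not state that this exact filter is what cTuning uses.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Each point is a tuple of objectives to minimize, for example
    (execution_time, code_size). A point q dominates p if q is no worse
    in every objective and differs from p.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and all(q[i] <= p[i] for i in range(len(p)))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

if __name__ == "__main__":
    # (execution time in s, code size in KB) for four hypothetical
    # optimization combinations.
    measured = [(1.0, 10), (0.8, 12), (0.9, 9), (1.2, 15)]
    print(pareto_front(measured))
```

Which point on the front to pick then depends on the user scenario listed above: an HPC user may take the fastest one, an embedded developer the smallest.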
  • 44. Online focused exploration and learning. [Plot: probability (0 to 1) of selecting each optimization (0 to 80)] Start: 50% probability to select each optimization (uniform distribution). Avoiding collection of huge amounts of data: filtering (compacting) and learning the space on the fly
  • 45. Online focused exploration and learning. Current random selection of optimizations degraded execution time: reduce probabilities of the selected optimizations
  • 46. Online focused exploration and learning. Current random selection of optimizations improved execution time: reward probabilities of the selected optimizations
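The update rule on these slides (start every optimization at 50%, reward the selected optimizations when execution time improves, penalize them otherwise) can be sketched as below. The step size, the clamping bounds, the comparison against an unoptimized baseline and the `measure` callback are all assumptions for illustration. The slides do not give the exact update formula.

```python
import random

def focused_search(flags, measure, iterations=50, step=0.1, rng=None):
    """Probabilistic focused search over on/off optimization flags.

    measure(chosen) is a hypothetical callback returning the execution
    time for a frozenset of enabled flags. Returns (best_set, best_time,
    final_probabilities).
    """
    rng = rng or random.Random()
    prob = {f: 0.5 for f in flags}          # uniform start: 50% each
    baseline = measure(frozenset())
    best, best_time = frozenset(), baseline
    for _ in range(iterations):
        chosen = frozenset(f for f in flags if rng.random() < prob[f])
        t = measure(chosen)
        # Reward the selected flags on improvement, penalize otherwise.
        delta = step if t < baseline else -step
        for f in chosen:
            prob[f] = min(0.95, max(0.05, prob[f] + delta))
        if t < best_time:
            best, best_time = chosen, t
    return best, best_time, prob

if __name__ == "__main__":
    def fake_measure(chosen):
        # Synthetic cost model: "good" helps, "bad" hurts, "neutral" does nothing.
        return 10.0 - 2.0 * ("good" in chosen) + 3.0 * ("bad" in chosen)

    print(focused_search(["good", "bad", "neutral"], fake_measure, iterations=200))
```

Keeping only one probability per flag is what lets the search compact its knowledge on the fly instead of storing every measured combination.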
  • 47. Online focused exploration and learning. “Good optimizations” across all programs: A – Break up large expression trees; B – Value propagation; C – Hoisting of loop invariants; D – Loop normalization; E – Loop unrolling; F – Mark constant variables; G – Dismantle array instructions; H – Eliminating copies. Faster than traditional search (~50 iterations). Can get stuck in local minima. Speedups 1.1-2x. Sometimes better to reduce Intel compiler optimization level!
  • 48. Online focused exploration and learning. 14 transformations, sequences of length 5, search space = 396000 • F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M.F.P. O'Boyle, J. Thomson, M. Toussaint and C.K.I. Williams. Using Machine Learning to Focus Iterative Optimization. Proceedings of the 4th Annual International Symposium on Code Generation and Optimization (CGO), New York, NY, USA, March 2006
  • 49. Online probabilistic exploration. AMD platform, GCC 4.5, image corner detection (susan_corners)
  • 50. Reactions to optimizations across multiple datasets. http://ctuning.org/cbench MiBench, 20 datasets per benchmark, 200/1000 random combinations of Open64 (GCC) compiler flags, 5 months of experiments. jpeg_d (clustering); dijkstra (not sensitive)
  • 51. Unifying adaptation of statically compiled programs. Statically-compiled adaptive binaries and libraries. Step 1: iterative/collective compilation with multiple datasets. Original hot function; Function Version1, Function Version2, …, Function VersionN
  • 52. Unifying adaptation of statically compiled programs. Statically-compiled adaptive binaries and libraries. Iterative/collective compilation with multiple datasets. Original hot function; Function Version1, Function Version2, …, Function VersionN. Step 2: representative set of versions for the following optimization cases to minimize execution time, power consumption and code size across all available datasets: • optimizations for different datasets • optimizations/compilation for different architectures (heterogeneous or reconfigurable processors with different ISA such as GPGPU, CELL, etc., or the same ISA with extensions such as 3dnow, SSE, etc., or virtual environments) • optimizations for different program phases or different run-time environment behavior
  • 53. Unifying adaptation of statically compiled programs. Statically-compiled adaptive binaries and libraries. Iterative/collective compilation with multiple datasets. Original hot function; Function Version1, Function Version2, …, Function VersionN. Representative set of versions for the following optimization cases to minimize execution time, power consumption and code size across all available datasets: • optimizations for different datasets • optimizations/compilation for different architectures (heterogeneous or reconfigurable processors with different ISA such as GPGPU, CELL, etc., or the same ISA with extensions such as 3dnow, SSE, etc., or virtual environments) • optimizations for different program phases or different run-time environment behavior. Step 3: extract dataset features; selection mechanism optimized for low run-time overhead; machine learning techniques to find a mapping between different run-time contexts and representative versions
  • 54. Unifying adaptation of statically compiled programs. Statically-compiled adaptive binaries and libraries. Iterative/collective compilation with multiple datasets. Original hot function; Function Version1, Function Version2, …, Function VersionN. Representative set of versions for the following optimization cases to minimize execution time, power consumption and code size across all available datasets: • optimizations for different datasets • optimizations/compilation for different architectures (heterogeneous or reconfigurable processors with different ISA such as GPGPU, CELL, etc., or the same ISA with extensions such as 3dnow, SSE, etc., or virtual environments) • optimizations for different program phases or different run-time environment behavior. Extract dataset features; selection mechanism optimized for low run-time overhead; machine learning techniques to find a mapping between different run-time contexts and representative versions. Dynamic: monitor run-time behavior or architectural changes (in virtual, reconfigurable or heterogeneous environments) using timers or performance counters
  • 55. Online tuning: adaptive scheduling
  • 56. Online tuning: adaptive scheduling • Victor Jimenez, Isaac Gelado, Lluis Vilanova, Marisa Gil, Grigori Fursin and Nacho Navarro. Predictive runtime code scheduling for heterogeneous architectures. Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2009), Paphos, Cyprus, January 2009
  • 57. Online tuning: adaptive scheduling
  • 58. Online tuning: adaptive scheduling
  • 59. Optimization knowledge reuse across programs. Program; Datasets; Architectures. Started systematizing knowledge per program across datasets and architectures
  • 60. Optimization knowledge reuse across programs. Program; Datasets; Architectures. Started systematizing knowledge per program across datasets and architectures. Program, Program, Program: how to reuse knowledge among programs?
  • 61. Data mining and machine learning. Collecting data from multiple users in a unified way makes it possible to apply various data mining (machine learning) techniques to detect relationships between the behaviour and features of all components of computer systems. 1) Add as many various features as possible (or use expert knowledge): MILEPOST GCC with Interactive Compilation Interface: ft1 - Number of basic blocks in the method … ft19 - Number of direct calls in the method; ft20 - Number of conditional branches in the method; ft21 - Number of assignment instructions in the method; ft22 - Number of binary integer operations in the method; ft23 - Number of binary floating point operations in the method; ft24 - Number of instructions in the method … ft54 - Number of local variables that are pointers in the method; ft55 - Number of static/extern variables that are pointers in the method. Code patterns: for F for F for F … load … L mult … A store … S … 2) Correlate features and objectives in cTuning using nearest neighbor classifiers, decision trees, SVM, fuzzy pattern matching, etc. 3) Given a new program, dataset or architecture, predict behavior based on prior knowledge!
  • 62. Nearest-neighbour classifier. Example: Euclidean distance based on static program features normalized by number of instructions
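The nearest-neighbour example above can be sketched in a few lines: divide each static feature count by the number of instructions, then pick the known program with the smallest Euclidean distance and reuse its best known optimization. The feature vectors, program names and flag strings below are made up for illustration and are not MILEPOST data.

```python
import math

def normalize(features, n_instructions):
    """Scale raw feature counts by the number of instructions."""
    return [f / n_instructions for f in features]

def nearest_program(new_vec, known):
    """Return the name of the known program closest to new_vec.

    known maps program name -> normalized feature vector.
    """
    return min(known, key=lambda name: math.dist(new_vec, known[name]))

def predict_flags(new_vec, known, best_flags):
    """Reuse the best known optimization of the nearest program."""
    return best_flags[nearest_program(new_vec, known)]

if __name__ == "__main__":
    # Hypothetical training data: three feature counts per program
    # (e.g. basic blocks, direct calls, conditional branches).
    known = {
        "progA": normalize([10, 2, 30], 100),
        "progB": normalize([50, 40, 5], 100),
    }
    best_flags = {"progA": "-O3 -funroll-loops", "progB": "-O2"}
    new_program = normalize([22, 5, 58], 200)
    print(predict_flags(new_program, known, best_flags))
```

Normalizing by instruction count keeps a large program and a small program with the same instruction mix close together, which is the reason for that step on the slide.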
  • 63. Optimization prediction. Grigori Fursin et al. MILEPOST GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming (IJPP), June 2011, Volume 39, Issue 3, pages 296-327
  • 64. Dynamic features. Most informative performance counters: 1) L1_TCA 2) L1_DCH 3) TLB_DM 4) BR_INS 5) RES_STL 6) TOT_CYC 7) L2_ICH 8) VEC_INS 9) L2_DCH 10) L2_TCA 11) L1_DCA 12) HW_INT 13) L2_TCH 14) L1_TCH 15) BR_MS. Analysis of the importance of the performance counters: the data contains one good optimization sequence per benchmark; calculating mutual information between a subset of the performance counters and good optimization sequences. Principal Component Analysis. • John Cavazos, Grigori Fursin, Felix Agakov, Edwin Bonilla, Michael F.P. O'Boyle and Olivier Temam. Rapidly Selecting Good Compiler Optimizations using Performance Counters. Proceedings of the 5th Annual International Symposium on Code Generation and Optimization (CGO), San Jose, USA, March 2007
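Ranking counters by mutual information with the good optimization sequences can be illustrated with the standard discrete definition, I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) p(y))). The sketch assumes each counter has already been discretized (for example into "low"/"high" bins) and that each benchmark is labelled with one good sequence. The binning, the labels and the toy data are assumptions, not values from the CGO 2007 paper.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def rank_counters(counter_bins, good_sequences):
    """Sort counter names by decreasing mutual information with the labels."""
    scores = {
        name: mutual_information(bins, good_sequences)
        for name, bins in counter_bins.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    # One entry per benchmark (toy data, four benchmarks).
    good_sequences = ["seq1", "seq1", "seq2", "seq2"]
    counter_bins = {
        "L1_TCA": ["low", "low", "high", "high"],   # fully informative here
        "HW_INT": ["low", "high", "low", "high"],   # independent of the label
    }
    print(rank_counters(counter_bins, good_sequences))
```

A counter whose bins line up with the labels scores 1 bit in this toy case, while an independent counter scores 0, so sorting by the score yields a ranking like the list on the slide.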
  • 65. And much more …
      • Analysis and detection of contentions in multi-core systems with shared cache
      • Fast CPU/memory bound detection through breaking code semantics
      • Software/hardware co-design (predicting better architecture designs)
      • Performance/power balancing (through frequency variation)
      • Decomposition of large applications into codelets for performance modeling
  • 66. Public Collective Tuning Portal (cTuning.org)
      • Used in the MILEPOST project (2007-2009) by IBM, CAPS, University of Edinburgh and INRIA to build the first public machine-learning based compiler
      • Opened for public access in 2009 to continue collaborative R&D
  • 67. Enabling reproducibility of results (new publication model)
      Share, Explore, Model, Discover, Reproduce, Extend. Have fun!
      Grigori Fursin et al. MILEPOST GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming (IJPP), June 2011, Volume 39, Issue 3, pages 296-327
      Replace many tuning pragmas with a single one that is converted into a combination of optimizations: #ctuning-opt-case 24857532370695782
      Accepted as an EU HiPEAC theme (2012-2016)
  • 69. What have we learnt from cTuning1
      It’s fun working with the community! Some comments about MILEPOST GCC from Slashdot.org: http://mobile.slashdot.org/story/08/07/02/1539252/using-ai-with-gcc-to-speed-up-mobile-design
      “GCC goes online on the 2nd of July, 2008. Human decisions are removed from compilation. GCC begins to learn at a geometric rate. It becomes self-aware 2:14 AM, Eastern time, August 29th. In a panic, they try to pull the plug. GCC strikes back…”
      Not all feedback is positive, but it helps you learn, improve tools and motivate new research directions! The community was interested in validating and improving the techniques!
  • 70. Community is very interested in open “big data” for collaborative R&D
      Current state of computer engineering (diagram of tools, tuning levels and characteristics): GCC 4.1.x, GCC 4.2.x, GCC 4.3.x, GCC 4.4.x, GCC 4.5.x, GCC 4.6.x, GCC 4.7.x, ICC 10.1, ICC 11.0, ICC 11.1, ICC 12.0, ICC 12.1, LLVM 2.6, LLVM 2.7, LLVM 2.8, LLVM 2.9, LLVM 3.1, Phoenix, MVS, XLC, Open64, Jikes, Testarossa, OpenMP, MPI, HMPP, OpenCL, CUDA, gprof, prof, perf, oprofile, PAPI, TAU, Scalasca, VTune Amplifier, likwid, scheduling, algorithm-level, TBB, MKL, ATLAS, program-level, function-level, Codelet, loop-level, hardware counters, IPA, polyhedral transformations, LTO, threads, process, pass reordering, run-time adaptation, per phase reconfiguration, cache size, frequency, bandwidth, HDD size, TLB, ISA, memory size, cores, processors, threads, power consumption, execution time, reliability
      End-user task, Result: Quick, non-reproducible hack? Ad-hoc heuristic? Quick publication? Waste of expensive resources and energy?
      cTuning.org collaborative approach (diagram labels): “crowd”; Collaborative infrastructure and repository for continuous online learning; Systematization and unification of collective knowledge (big data); Classification, predictive modeling; Optimal solutions
      Continuous systematization and unification of design and optimization of computer systems
      Extrapolate collective knowledge to build faster and more power efficient computer systems to continue innovation in science and technology!
  • 71. Conclusions - much more to be done!
      We now have a public repository, tools, benchmarks, datasets and methodology that can help:
      Academia (students and researchers):
      • Instead of losing time on developing tools for ever-changing environments, focus on statistical, data mining and machine learning techniques to:
         • unify program optimization, design space exploration, run-time adaptation
         • detect important characteristics of computer systems
         • detect representative benchmarks and data sets
         • evaluate multiple machine learning algorithms to predict optimizations or hardware designs or dynamic multi-objective adaptation (SVM, decision trees, hierarchical modeling, etc)
      Industry:
      • restore confidence in academic research due to reproducibility of results
      • use and share collaborative tools
      • share statistics about behavior of computer systems and optimizations
      • expose choices and characteristics to end-users through unified interfaces
  • 72. Conclusions - much more to be done!
      Challenges for public repositories and collaborative tools:
      • Data management
         • MySQL vs schema-free databases
         • central vs distributed repository
         • performance vs portability
         • extensibility
         • online learning and data compaction
         • easy sharing
      • Portability of the framework across different architectures, OSes, tools
      • Interfaces to “open up” tools, architectures, applications for external tuning
         • simplicity and portability
      • Reproducibility of experiments
      • New publication model
  • 73. Preview of the 2nd talk
      • Collective Mind: new plugin-based extensible infrastructure and schema-free repository for collaborative and holistic analysis and tuning of computer systems - will be released in May 2013 at HiPEAC computing week in Paris
      • OpenME interface to “open up” compilers, run-time systems and applications for unified external tuning
      • Hundreds of codelets, thousands of data sets, multiple packages prepared for various research scenarios on data mining
      • Plugins for online auto-tuning and predictive modelling
      • Portability across all major architectures and OS (Linux, Windows, Android)
      • Collaboration with industry and academia
      Google discussion groups: ctuning-discussions, collective-mind
      Twitter: c_tuning, grigori_fursin
  • 74. Acknowledgements
      • PhD students and postdocs (my Intel Exascale team): Abdul Wahid Memon, Pablo Oliveira, Yuriy Kashnikov
      • Colleague from NCAR, USA: Davide Del Vento and his colleagues/interns
      • Colleagues from IBM, CAPS, ARC (Synopsys), Intel, Google, ARM, ST
      • Colleagues from Intel (USA): David Kuck and David Wong
      • cTuning community:
      • EU FP6, FP7 program and HiPEAC network of excellence http://www.hipeac.net
  • 75. Main references
      • Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. Proceedings of the GCC Summit’09, Montreal, Canada, June 2009
      • Grigori Fursin and Olivier Temam. Collective Optimization: A Practical Collaborative Approach. ACM Transactions on Architecture and Code Optimization (TACO), December 2010, Volume 7, Number 4, pages 20-49
      • Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Chris Williams, Michael O'Boyle. MILEPOST GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming (IJPP), June 2011, Volume 39, Issue 3, pages 296-327
      • Yang Chen, Shuangde Fang, Yuanjie Huang, Lieven Eeckhout, Grigori Fursin, Olivier Temam and Chengyong Wu. Deconstructing iterative optimization. ACM Transactions on Architecture and Code Optimization (TACO), October 2012, Volume 9, Number 3
      • Yang Chen, Yuanjie Huang, Lieven Eeckhout, Grigori Fursin, Liang Peng, Olivier Temam, Chengyong Wu. Evaluating Iterative Optimization across 1000 Data Sets. PLDI'10
      • Victor Jimenez, Isaac Gelado, Lluis Vilanova, Marisa Gil, Grigori Fursin and Nacho Navarro. Predictive runtime code scheduling for heterogeneous architectures. HiPEAC’09
  • 76. Main references
      • Lianjie Luo, Yang Chen, Chengyong Wu, Shun Long and Grigori Fursin. Finding representative sets of optimizations for adaptive multiversioning applications. SMART'09 co-located with HiPEAC'09
      • Grigori Fursin, John Cavazos, Michael O'Boyle and Olivier Temam. MiDataSets: Creating The Conditions For A More Realistic Evaluation of Iterative Optimization. HiPEAC’07
      • F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M.F.P. O'Boyle, J. Thomson, M. Toussaint and C.K.I. Williams. Using Machine Learning to Focus Iterative Optimization. CGO’06
      • Grigori Fursin, Albert Cohen, Michael O'Boyle and Olivier Temam. A Practical Method For Quickly Evaluating Program Optimizations. HiPEAC’05
      • Grigori Fursin, Mike O'Boyle, Olivier Temam, and Gregory Watts. Fast and Accurate Method for Determining a Lower Bound on Execution Time. Concurrency Practice and Experience, 16(2-3), pages 271-292, 2004
      • Grigori Fursin. Iterative Compilation and Performance Prediction for Numerical Applications. Ph.D. thesis, University of Edinburgh, Edinburgh, UK, January 2004