GECCO 2013 GPUs for GEC
GECCO 2013 GPUs for Genetic and
Evolutionary Computation Competition
Daniele Loiacono and Antonino Tumeo
Why GPUs?
•  The GPU has evolved into a very flexible and powerful processor:
–  It's programmable using high-level languages
–  Now supports 32-bit and 64-bit floating point IEEE-754 precision
–  It offers lots of GFLOPS
•  There is a GPU in every PC and workstation
This competition…
•  Goal
–  Attract applications of genetic and evolutionary computation that can maximally exploit the parallelism provided by low-cost consumer graphics cards.
•  Evaluation
–  50%: Quality and Performance
–  30%: Relevance for EC community
–  20%: Novelty
•  Panel
–  Simon Harding, El-Ghazali Talbi, Antonino Tumeo, Jaume Bacardit … and myself
Entries
“Fast QAP Solver with ACO and Taboo Search on Multiple GPUs with the Move-Cost Adjusted Thread Assignment”
Shigeyoshi Tsutsui and Noriyuki Fujimoto
“GPOCL: A Massively Parallel GP Implementation in OpenCL”
Douglas A. Augusto and Helio J.C. Barbosa
GECCO Competitions: GPUs for Genetic and Evolutionary Computation, Evolutionary…
Fast QAP Solver with ACO and Taboo
Search on Multiple GPUs with the Move-
Cost Adjusted Thread Assignment
Quadratic Assignment Problem (QAP)
•  One of the hardest combinatorial optimization problems
–  There are many real-world applications:
•  Optimum location allocation of factories in a multinational company
•  Optimum section allocation in a big building
•  …
•  Definition:
–  Given n locations and n facilities, the task is to assign the
facilities to the locations to minimize the cost
•  aij is the distance matrix for each pair of locations i and j
•  bij is the flow matrix for each pair of facilities i and j
$$f(\phi) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, b_{\phi(i)\phi(j)}$$
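The cost function can be sketched directly from the definition. This is a minimal illustration, not the authors' code: the function name `qap_cost`, the 0-based indexing, and the tiny 3×3 matrices are assumptions made for the example.

```python
def qap_cost(a, b, phi):
    """Return f(phi) = sum over i, j of a[i][j] * b[phi[i]][phi[j]].

    a   : distance matrix between locations (n x n nested lists)
    b   : flow matrix between facilities (n x n nested lists)
    phi : permutation; phi[i] is the facility placed at location i
    """
    n = len(phi)
    return sum(a[i][j] * b[phi[i]][phi[j]] for i in range(n) for j in range(n))


# Illustrative data only (not a benchmark instance).
A = [[0, 2, 3], [2, 0, 1], [3, 1, 0]]
B = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]

if __name__ == "__main__":
    print(qap_cost(A, B, [0, 1, 2]))
    print(qap_cost(A, B, [2, 0, 1]))
```

Evaluating f this way costs O(n²) per solution, which is why the later slides compute only the change in cost for each move.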
ACO+TS on a Single GPU
•  We combined ACO and Taboo Search (TS)
•  Flow: start → Initialize pheromone density τ → Construct solutions based on τ → Apply local search (Tabu search) → Update pheromone density τ → Terminate? → end
[Figure: flowchart of the steps above, with the pheromone density matrix τ]

Time distribution in sequential run on CPU
Instance   Construction of solutions   TS        Updating trail
tai40a     0.007%                      99.992%   0.001%
tai50a     0.005%                      99.994%   0.000%
tai60a     0.004%                      99.996%   0.000%
tai80a     0.002%                      99.997%   0.000%
tai100a    0.002%                      99.998%   0.000%
tai50b     0.022%                      99.976%   0.002%
tai60b     0.017%                      99.982%   0.001%
tai80b     0.011%                      99.988%   0.001%
tai100b    0.008%                      99.991%   0.000%
tai150b    0.005%                      99.995%   0.000%
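Neighborhood in the QAP
•  A neighbor φ' of φ in QAP is obtained by swapping two elements of φ
–  Example: swapping the first and third elements of φ = (2, 1, 0, 3) gives φ' = (0, 1, 2, 3)
•  Neighborhood size of N(φ) is Nsize = n(n-1)/2
•  To choose the best φ', we need to calculate costs for all of Nsize neighbors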
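The swap neighborhood can be enumerated in a few lines. This is a sketch under the assumption that a neighbor is any solution reachable by exchanging one pair of positions; the name `swap_neighbors` is made up for the example.

```python
from itertools import combinations


def swap_neighbors(phi):
    """Yield (r, s, neighbor) for every pair of positions r < s.

    Each neighbor is a copy of phi with the elements at r and s exchanged,
    so a permutation of length n has n * (n - 1) / 2 neighbors.
    """
    for r, s in combinations(range(len(phi)), 2):
        neighbor = list(phi)
        neighbor[r], neighbor[s] = neighbor[s], neighbor[r]
        yield r, s, neighbor


# The example from the slide: phi = (2, 1, 0, 3), n = 4.
NEIGHBORS = list(swap_neighbors([2, 1, 0, 3]))

if __name__ == "__main__":
    for r, s, neighbor in NEIGHBORS:
        print(r, s, neighbor)
```

For n = 4 this yields 4·3/2 = 6 neighbors, one of which is (0, 1, 2, 3), reached by swapping positions 0 and 2.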
Computation Cost of a Neighboring Solution
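•  Let φ' be a neighbor of φ obtained by exchanging the r-th and s-th elements of φ; then the move cost Δ(φ, r, s) = f(φ') - f(φ) can be obtained in O(n) as

$$\begin{aligned}
\Delta(\phi, r, s) ={}& a_{rr}\,(b_{\phi(s)\phi(s)} - b_{\phi(r)\phi(r)}) + a_{rs}\,(b_{\phi(s)\phi(r)} - b_{\phi(r)\phi(s)}) \\
&+ a_{sr}\,(b_{\phi(r)\phi(s)} - b_{\phi(s)\phi(r)}) + a_{ss}\,(b_{\phi(r)\phi(r)} - b_{\phi(s)\phi(s)}) \\
&+ \sum_{k=0,\; k \neq r,s}^{n-1} \Big[\, a_{kr}\,(b_{\phi(k)\phi(s)} - b_{\phi(k)\phi(r)}) + a_{ks}\,(b_{\phi(k)\phi(r)} - b_{\phi(k)\phi(s)}) \\
&\qquad\qquad\quad + a_{rk}\,(b_{\phi(s)\phi(k)} - b_{\phi(r)\phi(k)}) + a_{sk}\,(b_{\phi(r)\phi(k)} - b_{\phi(s)\phi(k)}) \,\Big]
\end{aligned}$$

•  Fast update [Taillard 04]:
–  if we have memory of Δ(φ, r, s) for all pairs r, s,
–  and {u, v} ∩ {r, s} = ∅ holds, Δ(φ', u, v) can be obtained in O(1) as:

$$\begin{aligned}
\Delta(\phi', u, v) ={}& \Delta(\phi, u, v) \\
&+ (a_{ru} - a_{rv} + a_{sv} - a_{su})\,(b_{\phi'(s)\phi'(u)} - b_{\phi'(s)\phi'(v)} + b_{\phi'(r)\phi'(v)} - b_{\phi'(r)\phi'(u)}) \\
&+ (a_{ur} - a_{vr} + a_{vs} - a_{us})\,(b_{\phi'(u)\phi'(s)} - b_{\phi'(v)\phi'(s)} + b_{\phi'(v)\phi'(r)} - b_{\phi'(u)\phi'(r)})
\end{aligned}$$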
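Both move-cost computations on this slide, the O(n) evaluation of Δ(φ, r, s) and the O(1) update valid when {u, v} and {r, s} are disjoint, can be checked against a brute-force recomputation of f. The sketch below is an illustration, not the authors' GPU kernel; all function names and the random test instance are assumptions.

```python
import random


def qap_cost(a, b, phi):
    """Full O(n^2) cost, used here only as a reference."""
    n = len(phi)
    return sum(a[i][j] * b[phi[i]][phi[j]] for i in range(n) for j in range(n))


def swapped(phi, r, s):
    """Return a copy of phi with positions r and s exchanged."""
    out = list(phi)
    out[r], out[s] = out[s], out[r]
    return out


def move_cost(a, b, phi, r, s):
    """O(n) cost change f(phi') - f(phi) for exchanging positions r and s."""
    p = phi
    d = (a[r][r] * (b[p[s]][p[s]] - b[p[r]][p[r]])
         + a[r][s] * (b[p[s]][p[r]] - b[p[r]][p[s]])
         + a[s][r] * (b[p[r]][p[s]] - b[p[s]][p[r]])
         + a[s][s] * (b[p[r]][p[r]] - b[p[s]][p[s]]))
    for k in range(len(p)):
        if k == r or k == s:
            continue
        d += (a[k][r] * (b[p[k]][p[s]] - b[p[k]][p[r]])
              + a[k][s] * (b[p[k]][p[r]] - b[p[k]][p[s]])
              + a[r][k] * (b[p[s]][p[k]] - b[p[r]][p[k]])
              + a[s][k] * (b[p[r]][p[k]] - b[p[s]][p[k]]))
    return d


def move_cost_update(a, b, phi_new, r, s, u, v, old_delta):
    """O(1) update of the (u, v) move cost after the move (r, s) was applied.

    phi_new is the solution after swapping r and s; old_delta is the stored
    move cost of (u, v) before that swap. Requires {u, v} and {r, s} disjoint.
    """
    q = phi_new
    return (old_delta
            + (a[r][u] - a[r][v] + a[s][v] - a[s][u])
            * (b[q[s]][q[u]] - b[q[s]][q[v]] + b[q[r]][q[v]] - b[q[r]][q[u]])
            + (a[u][r] - a[v][r] + a[v][s] - a[u][s])
            * (b[q[u]][q[s]] - b[q[v]][q[s]] + b[q[v]][q[r]] - b[q[u]][q[r]]))


def random_instance(n, seed):
    """Hypothetical helper: random integer matrices and a random permutation."""
    rng = random.Random(seed)
    a = [[rng.randint(0, 9) for _ in range(n)] for _ in range(n)]
    b = [[rng.randint(0, 9) for _ in range(n)] for _ in range(n)]
    phi = list(range(n))
    rng.shuffle(phi)
    return a, b, phi
```

Integer matrices are used so the comparison with the brute-force difference is exact; neither formula assumes symmetric matrices.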
Parallel computation of move cost
-The simplest threads assignment-
•  Assign m agents to blocks: blockIdx.x = 0, 1, …, m-1
•  Assign move calculations to threads: threadIdx.x = 0, 1, …, Nsize-1 in each block, where Nsize = n(n-1)/2
•  Thread index assigned to each pair (u, v) of swap positions, shown for n = 14:

       0   1   2   3   4   5   6   7   8   9  10  11  12  13
  0
  1    0
  2    1   2
  3    3   4   5
  4    6   7   8   9
  5   10  11  12  13  14
  6   15  16  17  18  19  20
  7   21  22  23  24  25  26  27
  8   28  29  30  31  32  33  34  35
  9   36  37  38  39  40  41  42  43  44
 10   45  46  47  48  49  50  51  52  53  54
 11   55  56  57  58  59  60  61  62  63  64  65
 12   66  67  68  69  70  71  72  73  74  75  76  77
 13   78  79  80  81  82  83  84  85  86  87  88  89  90
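The triangular table of thread indices follows a closed form, so each thread can recover its pair from its index alone. The sketch below assumes the larger position labels the row and the smaller one the column, as the table layout suggests; the function names are made up and the slides do not show how the actual kernel performs this mapping.

```python
from math import isqrt


def pair_to_index(row, col):
    """Linear index of the pair (row, col) with 0 <= col < row."""
    return row * (row - 1) // 2 + col


def index_to_pair(idx):
    """Inverse mapping: recover (row, col) from a linear thread index.

    Row r starts at index r * (r - 1) / 2, so r is the largest integer
    with r * (r - 1) / 2 <= idx; integer square root avoids float error.
    """
    row = (1 + isqrt(8 * idx + 1)) // 2
    col = idx - row * (row - 1) // 2
    return row, col


if __name__ == "__main__":
    n = 14
    n_size = n * (n - 1) // 2
    for idx in (0, 1, 2, 77, 78, n_size - 1):
        print(idx, index_to_pair(idx))
```

With n = 14 the last index is Nsize - 1 = 90, which maps back to the pair (13, 12) in the bottom-right corner of the table.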
Move-Cost Adjusted Thread Assignment (MATA)
[Figure: computational time per thread index for warp 0 (threads 0 to 31) and warp 1 (threads 32 to 63), with O(1) and O(n) move-cost calculations marked. Panel labels: "Delay Caused by Branch Divergence" and "No branch divergence in each warp!"]
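One way to read the figure: after a move (r, s), pairs disjoint from {r, s} can use the O(1) update, while pairs sharing a position with the move need the O(n) recomputation, so mixing both kinds in one warp makes the cheap threads wait. The sketch below only illustrates that idea of grouping work by cost; it is not the authors' thread assignment. The warp size of 32, the padding scheme, and every name in it are assumptions.

```python
from itertools import combinations

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def naive_layout(n, r, s):
    """Pairs in enumeration order, tagged with the cost class of their update."""
    return [("O(n)" if {u, v} & {r, s} else "O(1)", u, v)
            for u, v in combinations(range(n), 2)]


def grouped_layout(n, r, s):
    """Place all O(1) pairs first, pad to a warp boundary, then the O(n) pairs.

    None marks an idle padding slot, so no warp contains both cost classes.
    """
    tagged = naive_layout(n, r, s)
    cheap = [t for t in tagged if t[0] == "O(1)"]
    costly = [t for t in tagged if t[0] == "O(n)"]
    padding = (-len(cheap)) % WARP_SIZE
    return cheap + [None] * padding + costly


def warps_are_uniform(slots):
    """True if every warp-sized chunk holds at most one cost class."""
    for start in range(0, len(slots), WARP_SIZE):
        chunk = slots[start:start + WARP_SIZE]
        kinds = {slot[0] for slot in chunk if slot is not None}
        if len(kinds) > 1:
            return False
    return True


if __name__ == "__main__":
    print(warps_are_uniform(naive_layout(14, 3, 7)))
    print(warps_are_uniform(grouped_layout(14, 3, 7)))
```

For n = 14 there are 2n - 3 = 25 pairs that touch the last move and 66 that do not, so the grouped layout trades a few idle slots for warps whose threads all follow the same branch.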
Speedup on a Single GTX480
CPU: i7 965, 3.2GHz
[Bar chart: speedup (axis 0 to 40) for each QAP instance, two bars per instance; the legend is not preserved. Bar values:]
QAP instance   Bar 1   Bar 2
tai50a         3.7     26.1
tai60a         3.4     27.7
tai80a         4.3     20.3
tai100a        3.4     18.3
tai50b         3.9     24.9
tai60b         4.6     35.5
tai80b         4.2     21.4
tai100b        5.4     29.5
Average        4.1     25.5
Implementation on Multiple GPUs
[Figure: a CPU with work memory holding solutions, connected to GPU0, GPU1, GPU2 and GPU3, which run ACO0, ACO1, ACO2 and ACO3]
4 Types of Island Models
•  We implemented the following 4 types of island models
1.  IM-INDP: Island model with independent runs
2.  IM-ELIT: Island model with elitist
3.  IM-RING: Island model with ring connected
4.  IM-ELMR: Island model with elitist and massive ring connected
IM-INDP:
Island model with independent runs
[Figure: CPU and islands ACO0, ACO1, ACO2, ACO3]
IM-ELIT:
Island model with elitist
[Figure: islands ACO0, ACO1, ACO2, ACO3; labels: worst guy, best guy, global best guy]
IM-RING:
Island model with ring connected
[Figure: islands ACO0, ACO1, ACO2, ACO3 connected in a ring; labels: worst guy, best guy]
IM-ELMR: Island model with elitist
and massive ring connected
[Figure: CPU and islands ACO0, ACO1, ACO2, ACO3; label: IM-ELIT +]
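Results of Island Models with 4 GPUs
[Bar chart: speedup (axis 0 to 7) for each QAP instance, four bars per instance; the legend is not preserved. Bar values:]
QAP instance   Bar 1   Bar 2   Bar 3   Bar 4
tai50a         2.1     2.6     2.9     3.3
tai60a         1.9     2.2     2.3     2.5
tai80a         2.4     2.5     2.7     2.9
tai100a        1.7     2.1     2.2     2.5
tai50b         1.5     2.3     2.5     3.0
tai60b         1.2     1.4     1.9     2.6
tai80b         1.5     4.7     4.3     6.5
tai100b        1.4     2.3     2.3     3.2
Average        1.7     2.5     2.6     3.3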
Conclusion
•  On a single GPU with “MATA”
–  25.5 times speedup over the CPU (i7 965, 3.2GHz)
•  On 4 GPUs (GTX480)
–  The IM-ELMR model has 3.3 times speedup over a single GPU
•  As a result, 25.5 × 3.3 ≈ 84.2 times speedup compared with the CPU computation
GPOCL:
A Massively Parallel GP
Implementation in OpenCL
Douglas A. Augusto Helio J.C. Barbosa
douglas@lncc.br hcbm@lncc.br
Laboratório Nacional de Computação Científica (LNCC)
Rio de Janeiro, Brazil
GPOCL's Features
•  Fast and efficient C/C++ implementation based on a compact linear tree representation.
•  Massively parallel tree interpretation using OpenCL.
•  It can be executed on virtually any parallel device, comprising different architectures and vendors.
•  It implements three different parallel strategies (fitness-based, population-based, and a mixture of both).
•  To improve diversity it can evolve loosely-coupled subpopulations (neighborhoods).
•  It has a rich set of command-line options, including primitives' set definition, probabilities of the genetic operators, stopping criteria, minimum and maximum tree sizes, and the configuration of neighborhoods.
•  It is Free Software (http://gpocl.sf.net).
Open Computing Language (OpenCL)
•  Open Computing Language, or simply OpenCL, is an open-standard programming language for heterogeneous parallel computing. [1]
•  It aims at efficiently exploiting the computing power of all processing devices, such as traditional processors (CPU) and accelerators (GPU, FPGA, DSP, Intel's MIC, and so forth).
•  It provides a uniform programming interface, which saves the programmer from writing different code in different languages when targeting multiple compute architectures, thus providing portability.
•  It is very flexible (low-level language).
[1] http://www.khronos.org
GPOCL
GPOCL implements a GP system using a prefix linear tree representation. Its main routine performs the following high-level procedures:
1. OpenCL initialization: This is the step where the general OpenCL-related tasks are initialized.
2. Calculating n-dimensional ranges: Defines how much parallel work there will be and how it is distributed among the compute units.
3. Memory buffers creation: In this phase all global memory regions accessed by the OpenCL kernels are allocated on the device and possibly initialized. The fitness cases are transferred and enough space is reserved for the population and error vectors.
4. Kernel building: An OpenCL kernel, relative to a given strategy of parallelization, is compiled just-in-time, targeting the compute device.
5. Evolving: This iterative routine implements the actual genetic programming dynamics.
Main Evolutionary Algorithm
Create (randomly) the initial population P;
Evaluate(P);
for generation ← 1 to NG do
    Copy the best (elitism) programs of P to the temporary population Ptmp;
    while |Ptmp| < |P| do
        Select and copy from P two fit programs, p1 and p2;
        if [probabilistically] crossover then
            Recombine p1 and p2, generating p′1 and p′2;
            p1 ← p′1; p2 ← p′2;
        end
        if [probabilistically] mutation then
            Apply mutation in p1 and p2, creating p′1 and p′2;
            p1 ← p′1; p2 ← p′2;
        end
        Insert p1 and p2 into Ptmp;
    end
    P ← Ptmp; then reset Ptmp;
    Evaluate(P);
end
return the best program found;
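The loop above can be sketched as a small generic function. This is an illustration under stated assumptions rather than GPOCL's implementation: individuals are bit lists instead of GP programs, selection is a binary tournament (the slide only says "fit programs"), lower fitness is better, and all names and default parameters are made up.

```python
import random


def evolve(init, fitness, crossover, mutate, pop_size=20, generations=30,
           elite=2, p_cross=0.9, p_mut=0.1, seed=0):
    """Generational loop with elitism; returns (best individual, best-fitness history)."""
    rng = random.Random(seed)

    def evaluate(population):
        # Evaluate(P): score every individual and sort, best (lowest) first.
        return sorted(((fitness(p), p) for p in population), key=lambda t: t[0])

    def select(scored):
        # Binary tournament between two distinct random entries.
        x, y = rng.sample(scored, 2)
        return x[1] if x[0] <= y[0] else y[1]

    scored = evaluate([init(rng) for _ in range(pop_size)])
    history = [scored[0][0]]
    for _ in range(generations):
        tmp = [p for _, p in scored[:elite]]      # copy the best programs (elitism)
        while len(tmp) < pop_size:
            p1, p2 = select(scored), select(scored)
            if rng.random() < p_cross:
                p1, p2 = crossover(p1, p2, rng)
            if rng.random() < p_mut:
                p1, p2 = mutate(p1, rng), mutate(p2, rng)
            tmp.extend([p1, p2])
        scored = evaluate(tmp[:pop_size])         # P <- Ptmp, then Evaluate(P)
        history.append(scored[0][0])
    return scored[0][1], history


# Toy stand-ins for the GP operators: minimise the number of zero bits.
def init_bits(rng):
    return [rng.randint(0, 1) for _ in range(16)]


def count_zeros(bits):
    return bits.count(0)


def one_point_crossover(x, y, rng):
    cut = rng.randrange(1, len(x))
    return x[:cut] + y[cut:], y[:cut] + x[cut:]


def flip_one_bit(x, rng):
    i = rng.randrange(len(x))
    return x[:i] + [1 - x[i]] + x[i + 1:]   # returns a new list, never edits in place


if __name__ == "__main__":
    best, history = evolve(init_bits, count_zeros, one_point_crossover, flip_one_bit)
    print(best, history[0], history[-1])
```

Because the elite individuals are copied unchanged and the operators return new lists, the best fitness can never get worse from one generation to the next.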
Evaluate(P)
The evaluation step itself does not do much; the hard work is done mostly by the OpenCL kernels. Basically, three things happen within Evaluate(P):
1. Population transfer: All programs of P are transferred to the target compute device.
2. Kernel execution: For any non-trivial problem, this is the most demanding phase. Here, the entire recently transferred population is evaluated (by interpreting each program over each fitness case) on the compute device. Fortunately, this step can be done in parallel and accelerated by GPUs.
3. Error retrieval: After being computed and accumulated in the previous step, the population's prediction errors need to be transferred to the host so that this information is available to the evolutionary process.
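What "interpreting each program over each fitness case" means can be shown with a sequential sketch of a prefix linear representation. This is not GPOCL's kernel: the primitive set, the token encoding (strings for primitives and variables, numbers for constants), the absolute-error measure, and all names are assumptions for illustration.

```python
import operator

# Hypothetical primitive set: token -> (arity, function).
PRIMITIVES = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}


def interpret(program, inputs):
    """Evaluate a prefix-ordered token list on one fitness case.

    Scanning the tokens right to left with a stack means every operator
    finds its already-evaluated operands on top of the stack, in order.
    """
    stack = []
    for token in reversed(program):
        if token in PRIMITIVES:
            arity, fn = PRIMITIVES[token]
            args = [stack.pop() for _ in range(arity)]
            stack.append(fn(*args))
        elif isinstance(token, str):
            stack.append(inputs[token])   # variable lookup
        else:
            stack.append(token)           # numeric constant
    return stack[0]


def evaluate(population, cases):
    """Return one accumulated absolute error per program (the error vector)."""
    return [sum(abs(interpret(program, inputs) - target) for inputs, target in cases)
            for program in population]


# Fitness cases for the target x*x + 1, and three candidate programs.
CASES = [({"x": x}, x * x + 1) for x in range(5)]
POPULATION = [
    ["+", "*", "x", "x", 1],   # x*x + 1
    ["*", "x", "x"],           # x*x
    ["-", "x", 1],             # x - 1
]

if __name__ == "__main__":
    print(evaluate(POPULATION, CASES))
```

The two nested loops in `evaluate` (over programs and over fitness cases) are independent of each other, which is what the parallelization strategies on the following slides exploit.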
Overall Best Parallelization Strategy
•  The population of programs and fitness cases are parallelized.
•  A mixture of the fitness- and population-based strategies.
•  While different programs are evaluated simultaneously on different compute units (CU), the processing elements (PE) within each CU take care, in parallel, of the whole training data set.
•  Since internally to each CU the PEs will be interpreting the same program, the event of instruction divergence is unlikely.
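The index mapping behind this mixed strategy can be emulated sequentially. The sketch below is only an illustration of the idea and not GPOCL's kernel: programs are plain Python callables, the strided split of fitness cases among processing elements and the final in-group reduction are assumptions, and all names are invented.

```python
def evaluate_combined(programs, cases, local_size=4):
    """Emulate one work-group (compute unit) per program.

    Inside a group, `local_size` work-items (processing elements) split the
    fitness cases in a strided fashion and keep partial error sums, which
    are then reduced to a single error for that program.
    """
    errors = []
    for program in programs:                      # group id selects the program
        partial = [0.0] * local_size
        for local_id in range(local_size):        # work-items of this group
            for i in range(local_id, len(cases), local_size):
                x, target = cases[i]
                partial[local_id] += abs(program(x) - target)
        errors.append(sum(partial))               # reduction within the group
    return errors


# Illustrative data: target x*x + 1 on ten fitness cases.
CASES = [(float(x), float(x * x + 1)) for x in range(10)]
PROGRAMS = [lambda x: x * x + 1, lambda x: x * x]

if __name__ == "__main__":
    print(evaluate_combined(PROGRAMS, CASES))
```

On a real device both loops over groups and work-items run concurrently; every work-item of a group interprets the same program on different data, which is why divergence inside a compute unit is unlikely.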
Some benchmarks on an NVIDIA GTX-285 GPU
An old generation GPU (released in early 2009)
Fitness-based Parallelization Strategy
[Figure: surface plot of Billion GPop/s over population size and data set size]
9.540 Billion GPop/s
(good performance, but requires a lot of fitness cases)
Population-based Parallelization Strategy
[Figure: surface plot of Billion GPop/s over population size and data set size]
0.690 Billion GPop/s
(bad performance, causes a lot of instruction divergence)
Combined Fitness- and Population-based Parallelization Strategy
[Figure: surface plot of Billion GPop/s over population size and data set size]
11.85 Billion GPop/s
Thank you!
And the winner is...
Fast QAP Solver with ACO and Taboo Search on Multiple GPUs with the Move-Cost Adjusted Thread Assignment
Shigeyoshi Tsutsui, Hannan University
and
Noriyuki Fujimoto, Osaka Prefecture University

Random Artificial Incorporation of Noise in a Learning Classifier System Envi...
 
One Step Fits All
One Step Fits AllOne Step Fits All
One Step Fits All
 
Introducing LCS to Digital Design Verification
Introducing LCS to Digital Design VerificationIntroducing LCS to Digital Design Verification
Introducing LCS to Digital Design Verification
 
A temporal classifier system using spiking neural networks
A temporal classifier system using spiking neural networksA temporal classifier system using spiking neural networks
A temporal classifier system using spiking neural networks
 
Confusion Matrices for Improving Performance of Feature Pattern Classifier Sy...
Confusion Matrices for Improving Performance of Feature Pattern Classifier Sy...Confusion Matrices for Improving Performance of Feature Pattern Classifier Sy...
Confusion Matrices for Improving Performance of Feature Pattern Classifier Sy...
 
Automatically Defined Functions for Learning Classifier Systems
Automatically Defined Functions for Learning Classifier SystemsAutomatically Defined Functions for Learning Classifier Systems
Automatically Defined Functions for Learning Classifier Systems
 
Voting Based Learning Classifier System for Multi-Label Classification
Voting Based Learning Classifier System for Multi-Label ClassificationVoting Based Learning Classifier System for Multi-Label Classification
Voting Based Learning Classifier System for Multi-Label Classification
 
2011 Simulated Car Racing Championship @ GECCO-2011
2011 Simulated Car Racing Championship @ GECCO-20112011 Simulated Car Racing Championship @ GECCO-2011
2011 Simulated Car Racing Championship @ GECCO-2011
 
2010 Simulated Car Racing Championship @ CIG-2010
2010 Simulated Car Racing Championship @ CIG-20102010 Simulated Car Racing Championship @ CIG-2010
2010 Simulated Car Racing Championship @ CIG-2010
 
2010 Simulated Car Racing Championship @ GECCO-2010
2010 Simulated Car Racing Championship @ GECCO-20102010 Simulated Car Racing Championship @ GECCO-2010
2010 Simulated Car Racing Championship @ GECCO-2010
 
2010 Simulated Car Racing Championship @ WCCI-2010
2010 Simulated Car Racing Championship @ WCCI-20102010 Simulated Car Racing Championship @ WCCI-2010
2010 Simulated Car Racing Championship @ WCCI-2010
 
Car Setup Optimization Competition @ EvoStar 2010
Car Setup Optimization Competition @ EvoStar 2010Car Setup Optimization Competition @ EvoStar 2010
Car Setup Optimization Competition @ EvoStar 2010
 
2009 Simulate Car Racing Championship
2009 Simulate Car Racing Championship2009 Simulate Car Racing Championship
2009 Simulate Car Racing Championship
 

Recently uploaded

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Recently uploaded (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

GPUs for GEC Competition @ GECCO-2013

  • 1. GECCO 2013 GPUs for GEC. GECCO 2013 GPUs for Genetic and Evolutionary Computation Competition. Daniele Loiacono and Antonino Tumeo
  • 2. Why GPUs?
      - The GPU has evolved into a very flexible and powerful processor:
          - It's programmable using high-level languages
          - It now supports 32-bit and 64-bit IEEE-754 floating-point precision
          - It offers lots of GFLOPS
      - There is a GPU in every PC and workstation
  • 3. This competition…
      - Goal: attract the applications of genetic and evolutionary computation that can maximally exploit the parallelism provided by low-cost consumer graphics cards.
      - Evaluation:
          - 50%: Quality and Performance
          - 30%: Relevance for the EC community
          - 20%: Novelty
      - Panel: Simon Harding, El-Ghazali Talbi, Antonino Tumeo, Jaume Bacardit … and myself
  • 4. Entries
      - "Fast QAP Solver with ACO and Taboo Search on Multiple GPUs with the Move-Cost Adjusted Thread Assignment". Shigeyoshi Tsutsui and Noriyuki Fujimoto
      - "GPOCL: A Massively Parallel GP Implementation in OpenCL". Douglas A. Augusto and Helio J.C. Barbosa
  • 5. Fast QAP Solver with ACO and Taboo Search on Multiple GPUs with the Move-Cost Adjusted Thread Assignment
  • 6. Quadratic Assignment Problem (QAP)
      - One of the hardest combinatorial optimization problems
          - There are many real-world applications:
              - Optimum location allocation of factories in a multinational company
              - Optimum section allocation in a big building
              - …
      - Definition:
          - Given n locations and n facilities, the task is to assign the facilities to the locations to minimize the cost
              - a_ij is the distance matrix for each pair of locations i and j
              - b_ij is the flow matrix for each pair of facilities i and j
        $$f(\phi) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, b_{\phi(i)\phi(j)}$$
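The cost function on slide 6 can be written directly as a double sum. The following is a minimal sketch, assuming the matrices are plain nested lists indexed from 0 and that `phi` is a permutation stored as a list; the name `qap_cost` is illustrative and does not come from the slides.

```python
def qap_cost(a, b, phi):
    """Cost f(phi) = sum over i, j of a[i][j] * b[phi[i]][phi[j]]."""
    n = len(phi)
    return sum(a[i][j] * b[phi[i]][phi[j]]
               for i in range(n)
               for j in range(n))


if __name__ == "__main__":
    # Tiny made-up instance, only to show the call.
    a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
    b = [[0, 5, 1], [5, 0, 2], [1, 2, 0]]
    print(qap_cost(a, b, [0, 1, 2]), qap_cost(a, b, [1, 0, 2]))
```

A full evaluation costs O(n²), which is why the later slides focus on computing only the change in cost caused by a swap.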
  • 7. ACO+TS on a Single GPU
      - We combined ACO and Taboo Search (TS)
      - Flow: start → Initialize pheromone density τ → Construct solutions based on τ → Apply local search (Tabu search) → Update pheromone density τ → Terminate? → end
      - Pheromone density matrix τ
      - Time distribution in a sequential run on the CPU:

        Instance   Construction of solutions   TS        Updating trail
        tai40a     0.007%                      99.992%   0.001%
        tai50a     0.005%                      99.994%   0.000%
        tai60a     0.004%                      99.996%   0.000%
        tai80a     0.002%                      99.997%   0.000%
        tai100a    0.002%                      99.998%   0.000%
        tai50b     0.022%                      99.976%   0.002%
        tai60b     0.017%                      99.982%   0.001%
        tai80b     0.011%                      99.988%   0.001%
        tai100b    0.008%                      99.991%   0.000%
        tai150b    0.005%                      99.995%   0.000%
  • 8. Neighborhood in the QAP
      - A neighbor φ' of φ in QAP is obtained by a swap, e.g. φ = (2, 1, 0, 3) → φ' = (0, 1, 2, 3)
      - The neighborhood size of N(φ) is Nsize = n(n-1)/2
      - To choose the best φ', we need to calculate the costs of all Nsize neighbors
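The swap neighborhood on slide 8 can be enumerated with one pair of positions per neighbor. This is a small sketch under the assumption that a solution is a Python list; the generator name `neighbors` is hypothetical.

```python
from itertools import combinations


def neighbors(phi):
    """Yield ((r, s), phi') for every swap of two positions r < s of phi."""
    for r, s in combinations(range(len(phi)), 2):
        nb = list(phi)                 # copy, so phi itself is left untouched
        nb[r], nb[s] = nb[s], nb[r]    # exchange the r-th and s-th elements
        yield (r, s), nb


if __name__ == "__main__":
    for move, nb in neighbors([2, 1, 0, 3]):
        print(move, nb)
```

For n = 4 this yields n(n-1)/2 = 6 neighbors, and the swap of positions 0 and 2 reproduces the example from the slide.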
  • 9. Computation Cost of a Neighboring Solution
      - Let φ' be a neighbor of φ obtained by exchanging the r-th and s-th elements of φ. The move cost Δ(φ, r, s) = f(φ') - f(φ) can then be obtained in O(n) as:
        $$\begin{aligned}
        \Delta(\phi, r, s) ={}& a_{rr}\,(b_{\phi(s)\phi(s)} - b_{\phi(r)\phi(r)}) + a_{rs}\,(b_{\phi(s)\phi(r)} - b_{\phi(r)\phi(s)}) \\
        &+ a_{sr}\,(b_{\phi(r)\phi(s)} - b_{\phi(s)\phi(r)}) + a_{ss}\,(b_{\phi(r)\phi(r)} - b_{\phi(s)\phi(s)}) \\
        &+ \sum_{k=0,\; k \neq r,s}^{n-1} \Big[ a_{kr}\,(b_{\phi(k)\phi(s)} - b_{\phi(k)\phi(r)}) + a_{ks}\,(b_{\phi(k)\phi(r)} - b_{\phi(k)\phi(s)}) \\
        &\qquad\qquad + a_{rk}\,(b_{\phi(s)\phi(k)} - b_{\phi(r)\phi(k)}) + a_{sk}\,(b_{\phi(r)\phi(k)} - b_{\phi(s)\phi(k)}) \Big]
        \end{aligned}$$
      - Fast update [Taillard 04]: if Δ(φ, r, s) is kept in memory for all pairs r, s, and {u, v} ∩ {r, s} = ∅ holds, Δ(φ', u, v) can be obtained in O(1) as:
        $$\begin{aligned}
        \Delta(\phi', u, v) ={}& \Delta(\phi, u, v) \\
        &+ (a_{ru} - a_{rv} + a_{sv} - a_{su})\,(b_{\phi'(s)\phi'(u)} - b_{\phi'(s)\phi'(v)} + b_{\phi'(r)\phi'(v)} - b_{\phi'(r)\phi'(u)}) \\
        &+ (a_{ur} - a_{vr} + a_{vs} - a_{us})\,(b_{\phi'(u)\phi'(s)} - b_{\phi'(v)\phi'(s)} + b_{\phi'(v)\phi'(r)} - b_{\phi'(u)\phi'(r)})
        \end{aligned}$$
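The two move-cost formulas on slide 9 translate almost term by term into code. The sketch below is a plain sequential version, assuming nested-list matrices and 0-based positions; the function names are illustrative, and `qap_cost` is repeated only so that the block runs on its own.

```python
def qap_cost(a, b, phi):
    """Full cost f(phi), O(n^2); used here only as a reference."""
    n = len(phi)
    return sum(a[i][j] * b[phi[i]][phi[j]] for i in range(n) for j in range(n))


def move_cost(a, b, phi, r, s):
    """Delta(phi, r, s) = f(phi') - f(phi) for swapping positions r and s, O(n)."""
    pr, ps = phi[r], phi[s]
    d = (a[r][r] * (b[ps][ps] - b[pr][pr]) + a[r][s] * (b[ps][pr] - b[pr][ps])
         + a[s][r] * (b[pr][ps] - b[ps][pr]) + a[s][s] * (b[pr][pr] - b[ps][ps]))
    for k in range(len(phi)):
        if k == r or k == s:
            continue
        pk = phi[k]
        d += (a[k][r] * (b[pk][ps] - b[pk][pr]) + a[k][s] * (b[pk][pr] - b[pk][ps])
              + a[r][k] * (b[ps][pk] - b[pr][pk]) + a[s][k] * (b[pr][pk] - b[ps][pk]))
    return d


def fast_move_cost(a, b, phi_new, old_delta_uv, r, s, u, v):
    """Delta(phi', u, v) in O(1), valid only when {u, v} and {r, s} are disjoint.

    phi_new is phi after swapping r and s; old_delta_uv is Delta(phi, u, v).
    """
    p = phi_new
    return (old_delta_uv
            + (a[r][u] - a[r][v] + a[s][v] - a[s][u])
            * (b[p[s]][p[u]] - b[p[s]][p[v]] + b[p[r]][p[v]] - b[p[r]][p[u]])
            + (a[u][r] - a[v][r] + a[v][s] - a[u][s])
            * (b[p[u]][p[s]] - b[p[v]][p[s]] + b[p[v]][p[r]] - b[p[u]][p[r]]))
```

Pairs that share a position with the last move (r, s) still need the O(n) routine, so the work per pair is uneven. That unevenness is what the thread assignment on the following slides deals with.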
  • 10. Parallel computation of move cost: the simplest thread assignment
      - Assign the m agents to blocks: blockIdx.x = 0, 1, ..., m-1
      - Assign the move calculations to threads: threadIdx.x = 0, 1, 2, ..., Nsize-1 in each block, with Nsize = n(n-1)/2
      - Thread index for each pair of positions (example with 14 positions; the axes are labeled u and v on the slide):

        row\col   0   1   2   3   4   5   6   7   8   9  10  11  12
         1        0
         2        1   2
         3        3   4   5
         4        6   7   8   9
         5       10  11  12  13  14
         6       15  16  17  18  19  20
         7       21  22  23  24  25  26  27
         8       28  29  30  31  32  33  34  35
         9       36  37  38  39  40  41  42  43  44
        10       45  46  47  48  49  50  51  52  53  54
        11       55  56  57  58  59  60  61  62  63  64  65
        12       66  67  68  69  70  71  72  73  74  75  76  77
        13       78  79  80  81  82  83  84  85  86  87  88  89  90
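The triangular numbering on slide 10 can be inverted in closed form, which is how a thread could recover its pair from its index. The sketch below assumes that the rows of the table are v and the columns are u (the transcript does not say which axis is which), and the function names are hypothetical.

```python
from math import isqrt


def pair_index(u, v):
    """Linear index of the pair (u, v) with u < v in the triangular layout."""
    return v * (v - 1) // 2 + u


def index_to_pair(k):
    """Inverse mapping: recover (u, v) from a thread index k >= 0."""
    v = (1 + isqrt(8 * k + 1)) // 2    # largest v with v*(v-1)/2 <= k
    u = k - v * (v - 1) // 2
    return u, v


if __name__ == "__main__":
    print(index_to_pair(0), index_to_pair(77), index_to_pair(90))
```

Using the integer square root avoids floating-point rounding, so the mapping stays exact for large n.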
  • 11. Move-Cost Adjusted Thread Assignment (MATA)
      - Figure: computational time versus thread index for warp 0 (threads 0 to 31) and warp 1 (threads 32 to 63), with O(1) and O(n) move-cost calculations. Panel labels: "Delay Caused by Branch Divergence" and "No branch divergence in each warp!"
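One way to obtain warps in which every thread does the same kind of work is to order the pairs by cost class before assigning them to threads. The sketch below illustrates that idea only; it is not taken from the slides and may differ from the authors' MATA layout. It assumes a warp size of 32 and that the pairs sharing a position with the last move (r, s) are the O(n) ones, as slide 9 implies.

```python
WARP_SIZE = 32  # threads per warp, assumed


def is_slow(pair, r, s):
    """A pair that shares a position with the last move needs the O(n) formula."""
    return bool(set(pair) & {r, s})


def cost_uniform_order(n, r, s):
    """Thread-to-pair assignment: O(n) pairs first, padded to a warp boundary."""
    pairs = [(u, v) for v in range(n) for u in range(v)]
    slow = [p for p in pairs if is_slow(p, r, s)]
    fast = [p for p in pairs if not is_slow(p, r, s)]
    padding = [None] * ((-len(slow)) % WARP_SIZE)   # idle threads
    return slow + padding + fast


def warp_kinds(order, r, s):
    """For each warp, the set of cost classes found among its active threads."""
    kinds = []
    for start in range(0, len(order), WARP_SIZE):
        warp = [p for p in order[start:start + WARP_SIZE] if p is not None]
        kinds.append({is_slow(p, r, s) for p in warp})
    return kinds
```

With n = 14 and last move (0, 1) there are 25 O(n) pairs, so 7 idle threads pad the first warp, and every warp then contains a single cost class.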
  • 12. Speedup on a Single GTX480
      - CPU: i7 965, 3.2GHz
      - Speedup per QAP instance (two bars per instance; the legend is not preserved in this transcript):

        Instance   Bar 1   Bar 2
        tai50a     3.7     26.1
        tai60a     3.4     27.7
        tai80a     4.3     20.3
        tai100a    3.4     18.3
        tai50b     3.9     24.9
        tai60b     4.6     35.5
        tai80b     4.2     21.4
        tai100b    5.4     29.5
        Average    4.1     25.5
  • 13. Implementation on Multiple GPUs
      - Figure: CPU with ACO0, ACO1, ACO2 and ACO3; GPU0, GPU1, GPU2 and GPU3; CPU work memory: solutions
  • 14. 4 Types of Island Models
      - We implemented the following 4 types of island models:
          1. IM-INDP: Island model with independent runs
          2. IM-ELIT: Island model with elitist
          3. IM-RING: Island model with ring connected
          4. IM-ELMR: Island model with elitist and massive ring connected
  • 16. IM-ELIT: Island model with elitist
      - Figure: ACO0, ACO1, ACO2 and ACO3, with labels "worst guy", "best guy" and "global best guy"
  • 17. IM-RING: Island model with ring connected
      - Figure: ACO0, ACO1, ACO2 and ACO3 connected in a ring, with labels "worst guy" and "best guy"
  • 18. IM-ELMR: Island model with elitist and massive ring connected
      - Figure: CPU; IM-ELIT + ring of ACO0, ACO1, ACO2 and ACO3
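The figure labels on slides 16 and 17 suggest migration schemes in which good solutions replace bad ones on other islands. The sketch below is one plausible reading, not the authors' implementation. It assumes that in IM-RING each island's best solution replaces the worst solution of the next island, and that in IM-ELIT the global best replaces the worst solution of every island. Populations are lists of hypothetical `(cost, solution)` tuples.

```python
def worst_index(pop):
    """Index of the highest-cost individual in a population."""
    return max(range(len(pop)), key=lambda j: pop[j][0])


def migrate_ring(islands):
    """Assumed IM-RING step: best of island i-1 replaces worst of island i."""
    bests = [min(pop, key=lambda ind: ind[0]) for pop in islands]
    for i, pop in enumerate(islands):
        pop[worst_index(pop)] = bests[i - 1]   # index -1 wraps around the ring


def migrate_elitist(islands):
    """Assumed IM-ELIT step: the global best replaces the worst on each island."""
    global_best = min((ind for pop in islands for ind in pop),
                      key=lambda ind: ind[0])
    for pop in islands:
        pop[worst_index(pop)] = global_best
```

In both functions the bests are collected before any replacement, so one migration step does not depend on the order in which the islands are visited.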
  • 19. Results of Island Models with 4 GPUs
      - Speedup per QAP instance (four bars per instance; the legend is not preserved in this transcript):

        Instance   Bar 1   Bar 2   Bar 3   Bar 4
        tai50a     2.1     2.6     2.9     3.3
        tai60a     1.9     2.2     2.3     2.5
        tai80a     2.4     2.5     2.7     2.9
        tai100a    1.7     2.1     2.2     2.5
        tai50b     1.5     2.3     2.5     3
        tai60b     1.2     1.4     1.9     2.6
        tai80b     1.5     4.7     4.3     6.5
        tai100b    1.4     2.3     2.3     3.2
        Average    1.7     2.5     2.6     3.3
  • 20. Conclusion
      - On a single GPU with "MATA": 25.5 times speedup over the CPU (i7 965, 3.2GHz)
      - On 4 GPUs (GTX480): the IM-ELMR model has 3.3 times speedup over a single GPU
      - As a result, 25.5×3.3 = 84.2 times speedup compared with the CPU computation
  • 21. GPOCL: A Massively Parallel GP Implementation in OpenCL. Douglas A. Augusto (douglas@lncc.br) and Helio J.C. Barbosa (hcbm@lncc.br). Laboratório Nacional de Computação Científica (LNCC), Rio de Janeiro, Brazil
  • 22. GPOCL's Features
      - Fast and efficient C/C++ implementation based on a compact linear tree representation.
      - Massively parallel tree interpretation using OpenCL.
      - It can be executed on virtually any parallel device, comprising different architectures and vendors.
      - It implements three different parallel strategies (fitness-based, population-based, and a mixture of both).
      - To improve diversity, it can evolve loosely-coupled subpopulations (neighborhoods).
      - It has a rich set of command-line options, including primitives' set definition, probabilities of the genetic operators, stopping criteria, minimum and maximum tree sizes, and the configuration of neighborhoods.
      - It is Free Software (http://gpocl.sf.net).
  • 23. Open Computing Language (OpenCL)
      - Open Computing Language, or simply OpenCL, is an open-standard programming language for heterogeneous parallel computing (http://www.khronos.org).
      - It aims at efficiently exploiting the computing power of all processing devices, such as traditional processors (CPU) and accelerators (GPU, FPGA, DSP, Intel's MIC, and so forth).
      - It provides a uniform programming interface, which saves the programmer from writing different codes in different languages when targeting multiple compute architectures, thus providing portability.
      - It is very flexible (low-level language).
  • 24. GPOCL
      GPOCL implements a GP system using a prefix linear tree representation. Its main routine performs the following high-level procedures:
      1. OpenCL initialization: This is the step where the general OpenCL-related tasks are initialized.
      2. Calculating n-dimensional ranges: Defines how much parallel work there will be and how it is distributed among the compute units.
      3. Memory buffers creation: In this phase all global memory regions accessed by the OpenCL kernels are allocated on the device and possibly initialized. The fitness cases are transferred and enough space is reserved for the population and error vectors.
      4. Kernel building: An OpenCL kernel, relative to a given strategy of parallelization, is compiled just-in-time, targeting the compute device.
      5. Evolving: This iterative routine implements the actual genetic programming dynamics.
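A prefix linear tree representation stores a program as a flat sequence of tokens, which can be interpreted without recursion by scanning it from right to left with a stack. The sketch below is a generic illustration in Python, not GPOCL's C/C++ or OpenCL code; the primitive set (`+`, `-`, `*`, `neg`) and the single variable `"x"` are assumptions made for the example.

```python
import operator

# Hypothetical primitive set: name -> (arity, implementation).
PRIMITIVES = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "neg": (1, operator.neg),
}


def interpret(program, x):
    """Evaluate a prefix-ordered token list at the input value x."""
    stack = []
    for token in reversed(program):        # right-to-left scan
        if token in PRIMITIVES:
            arity, func = PRIMITIVES[token]
            # The leftmost operand was pushed last, so it is popped first.
            args = [stack.pop() for _ in range(arity)]
            stack.append(func(*args))
        elif token == "x":
            stack.append(x)                # input variable
        else:
            stack.append(token)            # numeric constant
    return stack.pop()


if __name__ == "__main__":
    # x*x + (-3), written in prefix order.
    print(interpret(["+", "*", "x", "x", "neg", 3], 4))
```

A flat array like this is compact and easy to copy to a device buffer, which fits the population transfer described on the later slides.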
  • 25. Main Evolutionary Algorithm
      Create (randomly) the initial population P;
      Evaluate(P);
      for generation ← 1 to NG do
          Copy the best (elitism) programs of P to the temporary population Ptmp;
          while |Ptmp| < |P| do
              Select and copy from P two fit programs, p1 and p2;
              if [probabilistically] crossover then
                  Recombine p1 and p2, generating p1' and p2';
                  p1 ← p1'; p2 ← p2';
              end
              if [probabilistically] mutation then
                  Apply mutation in p1 and p2, creating p1' and p2';
                  p1 ← p1'; p2 ← p2';
              end
              Insert p1 and p2 into Ptmp;
          end
          P ← Ptmp; then reset Ptmp;
          Evaluate(P);
      end
      return the best program found;
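The generational loop on slide 25 can be sketched with the genetic operators passed in as functions. Several details below are assumptions, not taken from the slides: tournament selection stands in for "select two fit programs", the default probabilities are arbitrary, and `evaluate` is expected to return one error per program (lower is better).

```python
import random


def tournament(population, errors, rng, k=2):
    """Pick k random individuals and return the one with the lowest error."""
    picks = [rng.randrange(len(population)) for _ in range(k)]
    return population[min(picks, key=errors.__getitem__)]


def evolve(population, evaluate, crossover, mutate, generations,
           elite=1, p_cx=0.9, p_mut=0.1, rng=None):
    """Run the generational loop and return (best program, its error)."""
    rng = rng or random.Random(0)
    errors = evaluate(population)
    for _ in range(generations):
        # Elitism: copy the best programs unchanged into the new population.
        order = sorted(range(len(population)), key=errors.__getitem__)
        tmp = [population[i] for i in order[:elite]]
        while len(tmp) < len(population):
            p1 = tournament(population, errors, rng)
            p2 = tournament(population, errors, rng)
            if rng.random() < p_cx:
                p1, p2 = crossover(p1, p2, rng)
            if rng.random() < p_mut:
                p1, p2 = mutate(p1, rng), mutate(p2, rng)
            tmp.extend([p1, p2])
        population = tmp[:len(population)]   # drop a possible extra child
        errors = evaluate(population)
    best = min(range(len(population)), key=errors.__getitem__)
    return population[best], errors[best]
```

The operators are expected to return new objects rather than modify their arguments, since the elite programs are shared by reference between generations.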
  • 26. Evaluate(P)
      The evaluation step itself does not do much; the hard work is done mostly by the OpenCL kernels. Basically, three things happen within Evaluate(P):
      1. Population transfer: All programs of P are transferred to the target compute device.
      2. Kernel execution: For any non-trivial problem, this is the most demanding phase. Here, the entire recently transferred population is evaluated on the compute device by interpreting each program over each fitness case. Fortunately, this step can be both done in parallel and accelerated by GPUs.
      3. Error retrieval: After being computed and accumulated in the previous step, the population's prediction errors need to be transferred to the host so that this information is available to the evolutionary process.
  • 27. Overall Best Parallelization Strategy
      - The population of programs and the fitness cases are parallelized.
      - A mixture of the fitness- and population-based strategies.
      - While different programs are evaluated simultaneously on different compute units (CU), the processing elements (PE) within each CU take care, in parallel, of the whole training data set.
      - Since internally to each CU the PEs will be interpreting the same program, the event of instruction divergence is unlikely.
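The combined strategy on slide 27 maps one program to each compute unit and spreads the fitness cases over that unit's processing elements. The sketch below only simulates this decomposition with sequential Python loops; on a device the two outer loops would run in parallel. Programs are plain callables standing in for interpreted trees, and the strided split of the cases, the absolute-error measure and the `num_pe` parameter are assumptions.

```python
def evaluate(programs, cases, num_pe=4):
    """Return one accumulated absolute error per program.

    programs: list of callables f(x); cases: list of (x, expected) pairs.
    """
    errors = []
    for program in programs:                 # one compute unit per program
        partial = [0.0] * num_pe
        for pe in range(num_pe):             # one processing element each
            for x, expected in cases[pe::num_pe]:   # strided share of the cases
                partial[pe] += abs(program(x) - expected)
        errors.append(sum(partial))          # reduction inside the compute unit
    return errors


if __name__ == "__main__":
    cases = [(x, x * x) for x in range(8)]
    print(evaluate([lambda x: x * x, lambda x: x], cases))
```

Because all processing elements of a compute unit interpret the same program, they follow the same instruction stream and differ only in the fitness case they read.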
  • 28. Some benchmarks on an NVIDIA GTX-285 GPU: an old-generation GPU (released in early 2009)
  • 29. Fitness-based Parallelization Strategy
      - Figure: billion GPop/s as a function of population size and data set size
      - 9.540 billion GPop/s (good performance, but requires a lot of fitness cases)
  • 30. Population-based Parallelization Strategy
      - Figure: billion GPop/s as a function of population size and data set size
      - 0.690 billion GPop/s (bad performance, causes a lot of instruction divergence)
  • 31. Combined Fitness- and Population-based Parallelization Strategy
      - Figure: billion GPop/s as a function of population size and data set size
      - 11.85 billion GPop/s
  • 33. And the winner is... Shigeyoshi Tsutsui (Hannan University) and Noriyuki Fujimoto (Osaka Prefecture University), "Fast QAP Solver with ACO and Taboo Search on Multiple GPUs with the Move-Cost Adjusted Thread Assignment"