Extreme‐Scale Parallel Symmetric Eigensolver for Very Small‐Size Matrices Using A Communication‐Avoiding for Pivot Vectors
Takahiro Katagiri 
(Information Technology Center, The University of Tokyo)
Jun'ichi Iwata and Kazuyuki Uchida 
(Department of Applied Physics, School of Engineering, 
The University of Tokyo)
Thursday, February 20, Room: Salon A, 10:35‐10:55 
MS34 Auto‐tuning Technologies for Extreme‐Scale Solvers ‐ Part I of III
SIAM PP14, Feb.18‐21, 2014, Marriott Portland Downtown Waterfront, Portland, OR., USA   
Outline
• Target Application: RSDFT
• Parallel Algorithm of Symmetric 
Eigensolver for Small Matrices 
• Performance Evaluation with 76,800 
cores of the Fujitsu FX10
• Conclusion
RSDFT (Real Space Density Functional Theory)
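\[
\left[ -\frac{1}{2}\nabla^{2} + v_{ion}(\mathbf{r}) + \int \frac{\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d\mathbf{r}' + \frac{\delta E_{XC}[\rho]}{\delta \rho(\mathbf{r})} \right] \phi_j(\mathbf{r}) = \varepsilon_j \phi_j(\mathbf{r})
\]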
Kohn-Sham equation is solved as a
finite-difference equation
J.-I. Iwata et al., J. Comp. Phys. 229, 2339 (2010).
[Figure: 10,648-atom cell of Si crystal and its electron density]
[Figure: Structural properties of Si crystal. Total energy per atom (eV) vs. volume per atom, for 10,648 atoms and 21,952 atoms]
Requirements of Mathematical Software from RSDFT
• An FFT‐free algorithm.
• Computation of all eigenvalues and eigenvectors of a dense real symmetric matrix.
– Standard eigenproblem.
– Executed O(100) times during the SCF (Self‐Consistent Field) process.
• Re‐orthogonalization of the eigenvectors.
• Because of their computational complexity, the eigensolver and orthogonalization parts become a bottleneck.
– These parts require O(N^3) computations, while the others require O(N^2) computations.
• The matrix and eigenvalues are distributed to obtain parallelism in the parts other than the eigensolver.
– It is difficult to gather the whole data, even if it is small.
Requirements of Mathematical Software from RSDFT (Cont’d)
• Parts of the application other than the eigensolver are also time‐consuming.
RSDFT Processes Breakdown
Process                       Share of whole execution time [%]   Order
SCF                           99.6                                O(N^3)
  SD                          47.2                                O(N^3)
    Subspace Diag.            44.2                                O(N^3)
      MatE                    10.0                                O(N^3), DGEMM
      Eigensolve              19.6                                O(N^3)
      Rot V                   14.6                                O(N^3)
  CG (Conjugate Gradient)     26.0                                O(N^2)
  GS (Gram‐Schmidt Ort.)      25.8                                O(N^3), DGEMM
  Others                       0.6                                ‐
Source: Y. Hasegawa et al.: First‐principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer, SC11 (2011)
The Eigensolve and GS parts will become the bottleneck in large‐scale computation, but the other processes also need to be considered.
• The required memory space also needs to be considered.
– Because of numerical library APIs (e.g., re‐distribution of data), the actual problem size is limited to small sizes by the remaining memory space.
Our Assumption
• Target: the eigensolver part in RSDFT.
• Exa‐scale computing: the total number of nodes is on the order of 1,000,000 (one million).
• Since the matrix is two‐dimensional (2D), the matrix size required on exa‐scale computers reaches the order of
10,000 * sqrt(1,000,000) = 10,000,000 (ten million),
if each node holds a matrix of N=10,000.
• Since most dense solvers have O(N^3) computational complexity, the execution time for a matrix of N=10,000,000 (ten million) is unrealistic for actual applications (in the production‐run phase).
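The size estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes a square process grid of sqrt(nodes) by sqrt(nodes), and the function names are hypothetical, not taken from the talk.

```python
import math

def global_matrix_order(local_n: int, nodes: int) -> int:
    """Order N of a square matrix spread over a square grid of nodes,
    when every node holds a local_n x local_n block (assumed layout)."""
    grid_side = math.isqrt(nodes)
    if grid_side * grid_side != nodes:
        raise ValueError("this sketch assumes a square process grid")
    return local_n * grid_side

def dense_solver_work(n: int) -> int:
    """Leading-order n**3 operation count; the constant factor is omitted."""
    return n ** 3

n_global = global_matrix_order(10_000, 1_000_000)
print("global N:", n_global)
print("n^3 work:", dense_solver_work(n_global))
```

The n^3 count makes the point of the last bullet concrete: the work grows with the cube of the global order, so the per-node block size has to be chosen much smaller.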
Our Assumption (Cont’d)
• We presume that N=1,000 per node is the maximum size. The size at exa‐scale is then on the order of N=1,000,000 (one million).
• The memory used for the matrix per node is only on the order of 8 MB.
– Note: this is for the eigensolver part only.
• This is about the cache size of current CPUs.
– Next‐generation CPUs may have caches on the order of 100 MB.
• For example, the IBM Power8 with e‐DRAM (3D stacked memory) for its L4 cache.
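The 8 MB figure follows from storing an N=1,000 local matrix in double precision. A minimal check, assuming 8-byte elements and a square process grid (both assumptions of this sketch, with hypothetical function names):

```python
import math

BYTES_PER_DOUBLE = 8  # assumption: IEEE double-precision elements

def local_matrix_bytes(local_n: int) -> int:
    """Memory for one dense local_n x local_n block held by a node."""
    return local_n * local_n * BYTES_PER_DOUBLE

def global_order(local_n: int, nodes: int) -> int:
    """Global matrix order on an assumed square grid of nodes."""
    return local_n * math.isqrt(nodes)

print(local_matrix_bytes(1_000) / 1e6, "MB per node")
print(global_order(1_000, 1_000_000), "= global N")
```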
Originalities of Our Eigensolver
1. Non‐blocking (unblocked) computation algorithm
– Since the data fits in cache under our exa‐scale assumption.
2. Communication‐reducing and communication‐avoiding algorithm
– For the tridiagonalization and Householder inverse transformation of symmetric eigensolvers.
– By duplicating Householder vectors.
3. Hybrid MPI‐OpenMP execution
– With the full system of a peta‐scale supercomputer (the Fujitsu FX10) consisting of 4,800 nodes (76,800 cores).
A Classical Householder Algorithm (Standard Eigenproblem $Ax = \lambda x$)
A: symmetric dense matrix
1. Householder Transformation (Tridiagonalization): QAQ = T, where T is a tridiagonal matrix. O(n^3)
2. Bisection: computes all eigenvalues Λ of the tridiagonal matrix T.
3. Inverse Iteration: computes all eigenvectors Y of the tridiagonal matrix T. O(n^2) ~ O(n^3); MRRR: O(n^2)
4. Householder Inverse Transformation: computes all eigenvectors X = QY of the dense matrix A, where Q = H1 H2 … Hn-2. O(n^3)
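Step 1 of the classical algorithm can be illustrated with a serial, unblocked Householder tridiagonalization. The sketch below is a textbook formulation in pure Python, not the authors' parallel implementation; the function name and the demo matrix are hypothetical.

```python
import math

def householder_tridiagonalize(a):
    """Return T = Q^T A Q (tridiagonal) for a symmetric matrix given as a list of rows."""
    n = len(a)
    t = [row[:] for row in a]                      # work on a copy
    for k in range(n - 2):
        # The part of column k below the sub-diagonal defines the reflector.
        x = [t[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(v * v for v in x))
        if norm_x == 0.0:
            continue                               # nothing to eliminate
        u = x[:]
        u[0] += math.copysign(norm_x, x[0])        # sign choice avoids cancellation
        alpha = 2.0 / sum(v * v for v in u)        # H = I - alpha * u u^T
        m = len(u)
        # Apply H from the left (rows k+1 .. n-1).
        for j in range(n):
            s = alpha * sum(u[i] * t[k + 1 + i][j] for i in range(m))
            for i in range(m):
                t[k + 1 + i][j] -= s * u[i]
        # Apply H from the right (columns k+1 .. n-1).
        for i in range(n):
            s = alpha * sum(t[i][k + 1 + j] * u[j] for j in range(m))
            for j in range(m):
                t[i][k + 1 + j] -= s * u[j]
    return t

demo = [[4.0, 1.0, -2.0, 2.0],
        [1.0, 2.0, 0.0, 1.0],
        [-2.0, 0.0, 3.0, -2.0],
        [2.0, 1.0, -2.0, -1.0]]
for row in householder_tridiagonalize(demo):
    print(["%8.4f" % v for v in row])
```

Each reflector is applied from both sides, so the result stays symmetric and similar to A. That is why the eigenvalues computed from T in step 2 are those of A.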
Whole Parallel Processes on the Eigensolver
1. Tridiagonalization: A (2D cyclic‐cyclic distribution) → T.
2. Gather all elements of T, so that every process holds T.
3. Compute the upper and lower limits for the eigenvalues, then the eigenvalues Λ in rising order (1, 2, 3, 4, …).
4. Compute the eigenvectors Y, corresponding to the rising order of the eigenvalues (1, 2, 3, 4, …).
5. Gather all eigenvalues Λ.
6. Householder inverse transformation.
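Data Duplication in Tridiagonalization
[Figure: matrix A distributed over a grid of p processes by q processes. The vectors uk (Householder vector) and xk are duplicated across the processes along one grid dimension (duplication of uk), and the vectors yk along the other (duplication of yk).]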
Transposed yk in Tridiagonalization (The case of p < q): With Rectangle Processor Grid [Katagiri and Itoh, 2010]
[Figure: example with p = 2 and q = 4. The root processes multi‐cast yk with MPI_ALLREDUCE, which yields the duplication of yk. Communication is avoided by using the duplications.]
Parallel Householder Inverse Transformation
do k = n-2, 1, -1
  Gather the vector $u_k$ and the scalar $\alpha_k$ by using multiple MPI_BCASTs.
  do i = nstart, nend
    $\sigma_i = \alpha_k u_k^T A^{(k)}_{k:n,i}$
    $A^{(k)}_{k:n,i} = A^{(k)}_{k:n,i} - \sigma_i u_k$
  enddo
enddo
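The inverse-transformation loop above can be sketched serially in Python. This is a hedged illustration: there is no MPI here, so the "gather" of each vector and scalar is replaced by reading them from a list, and the function name and the (offset, u, alpha) layout are assumptions of this sketch. In the parallel version each process would run the inner loop only over its own columns nstart..nend.

```python
def apply_reflectors(columns, reflectors):
    """Overwrite each column y with H_1 H_2 ... H_m y, where H_k = I - alpha_k u_k u_k^T.

    columns:    list of column vectors (lists of floats), updated in place
    reflectors: list of (offset, u, alpha); u acts on rows offset .. offset+len(u)-1
    """
    for offset, u, alpha in reversed(reflectors):      # k = m, ..., 1
        for col in columns:                            # i = nstart, ..., nend
            # sigma_i = alpha_k * u_k^T * (affected part of column i)
            sigma = alpha * sum(u[r] * col[offset + r] for r in range(len(u)))
            # column i <- column i - sigma_i * u_k
            for r in range(len(u)):
                col[offset + r] -= sigma * u[r]
    return columns

u = [1.0, 2.0]
alpha = 2.0 / sum(v * v for v in u)     # makes H an exact reflection
cols = [[3.0, 4.0, 5.0]]
apply_reflectors(cols, [(1, u, alpha)])
print(cols)
```

Every column needs the same u_k and alpha_k in step k, which is why gathering (or already holding a duplicate of) the Householder vector dominates the communication in this phase.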
  
Gathering vector uk for Inverse Transformation: Non‐packing messages for gathering uk
[Figure: example with p = 2 and q = 4 showing the duplication of uk. The vector uk is gathered by two multi‐casts, ① MPI_BCAST and ② MPI_BCAST. Communication is avoided by using the duplications.]
Gathering vector uk for Inverse Transformation: Packing messages for gathering uk
[Figure: example with p = 2 and q = 4 showing the duplication of uk, gathered by two multi‐casts, ① MPI_BCAST and ② MPI_BCAST. The vectors uk and uk+1 are packed so that the two vectors are sent by one communication (communication blocking, here with communication blocking length = 2). Communication is avoided and reduced by packing the messages.]
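The effect of the communication blocking length can be sketched with a simple latency-bandwidth cost model. The model, the parameter values, and the function names below are assumptions made for illustration only; they are not measurements or code from the talk, and the model ignores that the Householder vectors shrink as k grows.

```python
import math

def pack_vectors(vectors, block_length):
    """Group consecutive vectors into blocks; each block stands for one message."""
    return [vectors[i:i + block_length] for i in range(0, len(vectors), block_length)]

def modelled_time(num_vectors, words_per_vector, block_length, latency, time_per_word):
    """Latency-bandwidth estimate: one latency per message plus a per-word cost."""
    messages = math.ceil(num_vectors / block_length)
    return messages * latency + num_vectors * words_per_vector * time_per_word

vectors = [[float(k)] * 4 for k in range(6)]        # six dummy vectors u_k
for block_length in (1, 2, 3):
    blocks = pack_vectors(vectors, block_length)
    estimate = modelled_time(6, 4, block_length, latency=1.0e-5, time_per_word=1.0e-9)
    print(block_length, len(blocks), estimate)
```

Under this model the transferred volume is unchanged and only the number of message start-ups drops, which is the regime where very small matrices benefit most from packing.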
Oakleaf‐FX (ITC, U.Tokyo), The Fujitsu PRIMEHPC FX10: 4,800 Nodes (76,800 Cores)
Contents                                Specifications
Whole system
  Total performance                     1.135 PFLOPS
  Total memory                          150 TB
  Total number of nodes                 4,800
  Interconnect                          Tofu (6‐dimensional mesh/torus)
  Local file system                     1.1 PB
  Shared file system                    2.1 PB
Node
  Theoretical peak performance          236.5 GFLOPS
  Number of processors (cores)          16
  Main memory                           32 GB
Processor
  Processor name                        SPARC64 IX‐fx
  Frequency                             1.848 GHz
  Theoretical peak performance (core)   14.78 GFLOPS
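The listed peak figures are mutually consistent, which a short calculation shows. The sketch uses only the numbers from the specification above; the function names are hypothetical.

```python
def node_peak_gflops(core_gflops: float, cores_per_node: int) -> float:
    """Node peak as the per-core peak times the number of cores."""
    return core_gflops * cores_per_node

def system_peak_pflops(node_gflops: float, nodes: int) -> float:
    """System peak in PFLOPS from the node peak in GFLOPS."""
    return node_gflops * nodes / 1.0e6

node = node_peak_gflops(14.78, 16)
print("node peak [GFLOPS]:", round(node, 1))
print("system peak [PFLOPS]:", round(system_peak_pflops(node, 4800), 3))
print("total cores:", 16 * 4800)
```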
COMMUNICATION AVOIDING EFFECT
Householder Inverse Transformation (4096 Nodes (65,536 Cores), 64x64), N=38,400, Hybrid
[Figure: stacked bar chart of time in seconds (axis 0 to 90) for each communication implementation: MPI_BCAST, Binary Tree, MPI_Isend (non‐blocking MPI), and Block MPI_BCAST, grouped into non‐packing sending and packing sending. Each bar is split into Other, HIT Ker, and Send Piv. Annotated speedup: 1.57x.]
The best parameter: #Processes = 4096, #Threads = 16/node, Comm. Block = 12.
HYBRID MPI‐OPENMP EFFECT
Pure MPI vs. Hybrid MPI‐OpenMP (64 Nodes (1024 Cores)), N=4800, Total Time
[Figure: stacked bar chart of time in seconds (axis 0 to 3.5) for the process organizations 16x64 (Pure MPI) and 8x8 (Hybrid MPI; 64 MPI processes, 16 OMP threads per MPI process). Each bar is split into Tridiagonalization, Re‐distribution, Calculating Eigenvectors, and Householder Inv. Annotated speedup: 1.61x.]
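Pure MPI vs. Hybrid MPI‐OpenMP (64 Nodes (1024 Cores)), N=4800, Tridiagonalization
[Figure: stacked bar chart of time in seconds (axis 0 to 2.5) for the process organizations 16x64 (Pure MPI) and 8x8 (Hybrid MPI). Components: Other, Update, MatVec, MatVec Reduce, Send xt, Send yt, and Send Piv, grouped into communication and computation. Annotated communication/computation shares: 27.9% and 72.1% on one bar, 46.1% and 53.9% on the other; 18.2 points reduction.]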
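Pure MPI vs. Hybrid MPI‐OpenMP (64 Nodes (1024 Cores)), N=4800, Householder Inverse Transformation
[Figure: stacked bar chart of time in seconds (axis 0 to 0.7) for the process organizations 16x64 (Pure MPI) and 8x8 (Hybrid MPI). Components: Other, HIT Ker, and Send Piv, grouped into communication and computation. Annotated communication/computation shares: 15.6% and 84.4% on one bar, 44.6% and 55.4% on the other; 29 points reduction.]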
FX10 76800 CORES (4800 NODES) RESULTS
Hybrid MPI‐OpenMP Execution in 4800 nodes (76,800 Cores) (40x120)
[Figure: stacked bar chart of time in seconds (axis 0 to 1600) for N=41568, N=83138, and N=166276. Components: Tridiag, Re‐dist, Calculating Eigenvec, and Householder Inv. Data labels: 31.8, 83.3, 429.9, 34.3, 180.1, 904.0. HIT comm. block = 6, 4, and 2. Annotated ratios: 2.61x, 5.24x, 5.16x, 5.01x, 3.97x, 5.05x. Annotations: "Inner L1 Cache Size" and "Only 4x increase with 2x problem size in O(N^3) algorithm".]
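For an O(N^3) algorithm on a fixed number of processes, doubling N would ideally multiply the time by 8. One way to read the annotated ratios is to convert each into an effective scaling exponent, log2(ratio). The helper below is an illustrative sketch with a hypothetical name; it does not assume which ratio belongs to which phase.

```python
import math

def effective_exponent(time_ratio: float, size_ratio: float = 2.0) -> float:
    """Exponent e such that time grows like N**e when N grows by size_ratio."""
    return math.log(time_ratio) / math.log(size_ratio)

# Ratios annotated on the chart, in ascending order.
for ratio in (2.61, 3.97, 5.01, 5.05, 5.16, 5.24):
    print(f"{ratio:.2f}x -> effective exponent {effective_exponent(ratio):.2f}")
```

An exponent below 3 means the measured time grew more slowly than the cubic operation count when the problem size doubled.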
Execution Time in Pure MPI between ScaLAPACK PDSYEVD and Ours
ScaLAPACK (version 1.8) on the Fujitsu FX10. Fujitsu Optimized BLAS is used.
The best block size for each ScaLAPACK execution is chosen from 1, 8, 16, 32, 64, 128, and 256.
Time in seconds (lower is better):
Problem size (process grid), cores    ScaLAPACK   Ours
N=4800 (8x8), 64 cores                4.26        1.79
N=9600 (16x16), 256 cores             10.96       4.61
N=19200 (32x32), 1024 cores           25.76       15.52
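The relative speedup over ScaLAPACK can be computed directly from the six data labels. The sketch assumes the labels are listed in legend order, that is, the first three values belong to ScaLAPACK and the last three to the authors' solver; the variable names are hypothetical.

```python
# Assumption: data labels appear in legend order (ScaLAPACK first, then "Ours").
sizes = ["N=4800 (8x8)", "N=9600 (16x16)", "N=19200 (32x32)"]
scalapack_seconds = [4.26, 10.96, 25.76]
ours_seconds = [1.79, 4.61, 15.52]

# Speedup = reference time divided by the time of the proposed solver.
speedups = [ref / new for ref, new in zip(scalapack_seconds, ours_seconds)]
for size, speedup in zip(sizes, speedups):
    print(f"{size}: {speedup:.2f}x")
```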
Conclusion
• Our eigensolver is effective for very small matrices because it uses communication‐reducing and communication‐avoiding techniques:
– By duplicating Householder vectors in the tridiagonalization and Householder inverse transformation phases.
– By using reduced communications for multiple sending with 2D splitting of the process grid.
– By packing messages in the Householder inverse transformation part.
• Selecting the implementation of the communication processes is the target of auto‐tuning (AT).
– The best implementation depends on the process grid, the number of processors, and the block size for data packing.
Conclusion (Cont’d)
• One drawback is the increase in memory space.
– The increase is $O(N^2/p)$, where the process grid is p * q.
– Since the memory space for the matrix is within the cache size, the increase in memory space can be ignored.
• Comparison with new blocking algorithms is future work.
– 2‐step method with block Householder tridiagonalization.
• Eigen‐K (Riken)
• ELPA (Technische Universität München)
• A new implementation of PLASMA and MAGMA
Acknowledgements
• Computational resources on the Fujitsu FX10 were awarded by the “Large‐scale HPC Challenge” Project, Information Technology Center, The University of Tokyo.
This topic was submitted to Parallel Computing (as of December 2013).

Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardsticksaastr
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfhenrik385807
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMoumonDas2
 

Extreme‐Scale Parallel Symmetric Eigensolver for Very Small‐Size Matrices Using A Communication-Avoiding for Pivot Vectors