2018. 8. 7
mkatouda@aoni.waseda.jp
• 8/7 :
– 1 :
•
•
•
– 2 :
•
–
–
•
– Hartree-Fock
– Post Hartree-Fock : MP2
– : (TD-DFT)
– 3 :
•
•
2018.8.7
• 8/8 :
– 4 :
•
•
• PS2
•
– 5 :
•
• ( )
•
2018.8.7
1
2018.8.7
• =
–
( 1000 )
• (*)
– 50 TFLOPs
– TFLOPs (tera floating-point operations per second): 1 TFLOPs = 10^12 floating-point operations per second
– 1 50
(*) 2018 8
2018.8.7
2018.8.7
1923
1946 ENIAC
1976 CRAY-1
2002
2011
11 PFLOPs
160 MFLOPs
40 TFLOPs
6250
25
2018.8.7
80000
100
)
Top 500 List
•
– https://www.top500.org/
• LINPACK
–
•
2018.8.7
Top 500 List
2018.8.7
Rank | System | Country | Year | Rmax (PFLOPs) | Rpeak (PFLOPs)
1 | Summit | USA | 2018 | 122.300 | 187.659
2 | Sunway TaihuLight | China | 2016 | 93.015 | 125.436
3 | Sierra | USA | 2018 | 71.610 | 119.194
4 | Tianhe-2A | China | 2013 | 61.445 | 100.679
5 | AI Bridging Cloud Infrastructure | Japan | 2018 | 19.880 | 32.577
6 | Piz Daint | Switzerland | 2016 | 19.590 | 25.326
7 | Titan | USA | 2012 | 17.590 | 27.112
8 | Sequoia | USA | 2013 | 17.173 | 20.133
9 | Trinity | USA | 2015 | 14.137 | 43.903
10 | Cori (NERSC) | USA | 2016 | 14.014 | 27.881
11 | Nurion (Korea Institute of Science and Technology Information) | South Korea | 2018 | 13.929 | 25.706
12 | Oakforest-PACS | Japan | 2016 | 13.555 | 24.914
16 | K computer | Japan | 2011 | 10.510 | 11.280
100PFLOPs
CPU GPU
GPU
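Summit
• Top500: No. 1 (June 2018)
• HPCG: No. 1 (June 2018)
• CPU: IBM POWER9 x 2
• CPU peak performance: 1.08 TFLOPs (24.56 GF x 22 cores x 2)
• CPU memory: 256 GB
• GPU: NVIDIA Tesla V100 x 6
• GPU peak performance: 46.8 TFLOPs (7.8 TF x 6)
• GPU memory: 96 GB
• Racks: 256
• Nodes: 4608
• Interconnect: InfiniBand EDR
• LINPACK performance: 122.3 PFLOPs
• Total memory: 10 PB
• Storage: 250 PB
Source: TOP500 website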
2018.8.7
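K computer
• Top500: No. 1 on two consecutive lists in 2011
• Graph500: No. 1 in June 2014, and No. 1 on six consecutive lists from July 2015
• HPCG: No. 2 on four consecutive lists from November 2014; No. 1 on three consecutive lists from November 2016
• CPU: SPARC64™ VIIIfx, 2 GHz
• CPU peak performance: 128 GFLOPs (16 GF x 8 cores)
• Memory per node: 16 GB
• Racks: 864
• Nodes: 82,944
• Interconnect: Tofu (6D mesh/torus)
• Peak performance: 10.62 PFLOPs
• Total memory: 1.26 PB
• File system: Fujitsu Exabyte File System (FEFS)
• Storage: 30 PB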
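The peak figures quoted on these specification slides follow from simple multiplication. The sketch below reproduces that arithmetic; the function name `peak_flops` is a hypothetical helper, and the reading "one CPU per node" for the 82,944-node system is an assumption based on the per-CPU figure given on the slide.

```python
# Theoretical peak = (FLOPs per core) x (cores per CPU) x (CPUs per node) x (nodes).
GIGA, TERA, PETA = 1e9, 1e12, 1e15

def peak_flops(gflops_per_core, cores_per_cpu, cpus_per_node, nodes=1):
    """Return the theoretical peak in floating-point operations per second."""
    return gflops_per_core * GIGA * cores_per_cpu * cpus_per_node * nodes

# SPARC64 VIIIfx system: 16 GF x 8 cores per CPU, assumed one CPU per node, 82,944 nodes.
k_node = peak_flops(16.0, 8, 1)
k_system = peak_flops(16.0, 8, 1, nodes=82_944)

# CPU part of one Summit node: 24.56 GF x 22 cores x 2 CPUs.
summit_cpu_node = peak_flops(24.56, 22, 2)

print(f"Node peak (SPARC64 VIIIfx): {k_node / GIGA:.0f} GFLOPs")
print(f"System peak (82,944 nodes): {k_system / PETA:.2f} PFLOPs")
print(f"Summit CPU peak per node:   {summit_cpu_node / TERA:.2f} TFLOPs")
```

The results agree with the 128 GFLOPs, 10.62 PFLOPs and 1.08 TFLOPs values listed on the slides.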
2018.8.7
• R-CCS
– http://www.r-ccs.riken.jp/jp/outreach/videogallery.html
•
– https://www.youtube.com/watch?v=_ze51XkKd_I
2018.8.7
2018.8.7
Mn4CaO5
OEC
Mn1
Mn2
Mn3
Mn4
Ca O1
O2
O3
O4
O5
PSII Chloroplast
stroma
Thylakoid
lumen
Thylakoid
membrane
$\hat{H}\Psi = E\Psi$, $E = mc^2$, $ma = F$, $\nabla \times E = -\dfrac{\partial B}{\partial t}$
2018.8.7
Length scales: 10^-10 m, 10^-8 m, 10^0 m, 10^2 m, 10^7 m, 10^21 m
• R-CCS
– http://www.r-ccs.riken.jp/jp/outreach/videogallery.html
•
– https://www.youtube.com/watch?v=JqNj3YOfyMY&feature=youtu.be
•
– https://www.youtube.com/watch?v=2JU2LjPDrQY
•
– https://www.youtube.com/watch?v=Tx1RHU7Zw2c
• UT-Heart
– https://www.youtube.com/watch?v=tBdFv28EEq0
•
– https://www.youtube.com/watch?v=w6G5TMTE-z4
•
– https://www.youtube.com/watch?v=DqeEgG52AZY
•
– https://www.youtube.com/watch?v=6nr13DMgU3A
• -
– https://www.youtube.com/watch?v=1F6L8rgJT8w
2018.8.7
2018.8.7
core core
core core
CPU
core core
core core
CPU
core core
core core
CPU
core core
core core
CPU
• CPU
• -
:
: 8
•
•
–
–
2018.8.7
2018.8.7
1 1 1 1
1
2
3
4
: 1/4
•
( 1/4 )
•
=
2018.8.7
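Serial part 1%, parallel part 99% (single-process run time = 100):
• 100 processes: run time 1 + 0.99 = 1.99, speedup about 50x
• 1000 processes: run time 1 + 0.099 = 1.099, speedup about 91x
Serial part 0.1%, parallel part 99.9% (single-process run time = 100):
• 100 processes: run time 0.1 + 0.999 = 1.099, speedup about 91x
• 1000 processes: run time 0.1 + 0.0999 = 0.1999, speedup about 500x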
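The speedup numbers on these two slides are instances of Amdahl's law. The following minimal sketch reproduces them; the function name `amdahl_speedup` is illustrative and not taken from the slides.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Speedup on n_procs processes when serial_fraction of the work cannot be parallelized."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)

for serial in (0.01, 0.001):
    for n in (100, 1000):
        speedup = amdahl_speedup(serial, n)
        print(f"serial part {serial:.1%}, {n:5d} processes: speedup {speedup:6.1f}x")
```

Even a 1% serial part caps the speedup below 100x no matter how many processes are added, which is why reducing the non-parallel part matters more than adding hardware.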
•
•
• OpenMP (CPU ), OpenACC (GPU)
CPU
• MPI( ), CUDA (GPU)
OpenMP, OpenACC
• : BLAS, LAPACK, FFTW (
, GPU)
x
• ( )
2018.8.7
MPI/OpenMP
2018.8.7
MPI OpenMP
MPI
core core
core core
CPU
core core
core core
CPU
core core
core core
CPU
core core
core core
CPU
core core
core core
CPU
core core
core core
CPU
MPI :
MPI/OpenMP :
MPI/OpenMP
•
–
• OpenMP
• MPI
– : OpenMP
MPI
•
–
2018.8.7
2
2018.8.7
“ : quantum chemistry
”
Wikipedia
2018.8.7
•
• Schrödinger Dirac
•
Schrödinger Dirac
2018.8.7
"The fundamental laws necessary for the mathematical treatment of large parts of physics
and the whole chemistry are thus fully known, and the difficulty lies only in the fact that
application of these laws leads to equations that are too complex to be solved.”
- Paul Dirac, Proc. Roy. Soc. London , A123, 714 1929
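$$\hat{H}\Psi = E\Psi$$
$$\hat{H} = -\sum_{A}^{N}\frac{\nabla_A^2}{2M_A} - \sum_{i}^{n}\frac{\nabla_i^2}{2} + \sum_{A>B}\frac{Z_A Z_B}{R_{AB}} - \sum_{A,i}\frac{Z_A}{r_{Ai}} + \sum_{i>j}\frac{1}{r_{ij}}$$
$$\Psi \equiv \Psi(r_1, r_2, \ldots, r_n; R_1, R_2, \ldots, R_N)$$
(Erwin Schrödinger)
[Figure: interactions among N nuclei (A, B) and n electrons (i, j): electron-electron $e^2/r_{ij}$, nucleus-electron $Z_A e^2/r_{iA}$ and $Z_B e^2/r_{jB}$, nucleus-nucleus $Z_A Z_B e^2/R_{AB}$]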
• Schrödinger
Dirac
– (Born-Oppenheimer )
–
– 1 Slater
– 1
– ( )
–
2018.8.7
• Hartree-Fock HF
–
–
–
• (Density functional theory: DFT)
–
– 1
– HF
• (Post-HF )
– HF
–
• Møller-Plesset (MP)
• (Coupled-Cluster(CC) )
• (CI)
2018.8.7
2018.8.7
•
•
• Cost proportional to the cube of N: O(N^3)
• …
Method | Hartree-Fock (HF) | DFT | MP2 | CCSD | CCSD(T)
Cost | O(N^3) | O(N^3) | O(N^5) | O(N^6) | O(N^7)
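To make the scaling exponents concrete, the sketch below computes how much the cost grows when the system size N is multiplied by a given factor. The dictionary and function names are illustrative; only the exponents come from the slide, and prefactors are ignored.

```python
# Formal scaling exponents listed on the slide.
SCALING_EXPONENT = {"HF": 3, "DFT": 3, "MP2": 5, "CCSD": 6, "CCSD(T)": 7}

def cost_ratio(method, size_factor):
    """Factor by which the cost grows when the system size grows by size_factor."""
    return size_factor ** SCALING_EXPONENT[method]

for method in SCALING_EXPONENT:
    print(f"{method:8s} doubling N multiplies the cost by {cost_ratio(method, 2):4d}")
```

Doubling the system size costs 8x for HF or DFT but 128x for CCSD(T), which is the motivation for the parallel and reduced-scaling algorithms discussed later.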
•
• Gaussian, Q-Chem, Turbomole, MOLPRO, Molcas, ADF
•
–
• ORCA, Firefly, NTChem
– ( )
• GAMESS, DIRAC, ABINIT-MP
– (GPL, Apache license, 3 BSD )
• NWChem, ACES III, Psi4, PySCF, SMASH, OpenFMO,
ProteinDF, PAICS
2018.8.7
:
SMASH (Scalable Molecular Analysis Solver
for High-performance computing systems)
• https://sourceforge.net/projects/smash-qc/
• (Apache)
• ( )
•
• MPI/OpenMP
2018.8.7
NTChem
• http://molsc.riken.jp/ntchem_j.html
•
•
( )
• 2
• ( 3 )
• MPI/OpenMP
2018.8.7
HF & DFT
:
LDA, GGA, GGA, LC-GGA
(1+2 )
TDDFT
SCF
:
DIIS, 2 ,
SCF
,
(PDM, )
Order-N (GFC)
Resolution of Identity(RI)
Dual-level DFT
RI-MP2
(ONIOM, QM/MM)
(CC, MP2)
(ECP, MCP)
DKn, RESC, RA, –
NMR, EPR,
NTChem
2018.8.7
Hartree-Fock (Roothaan)
• Schrodinger
–
• 1 (Hartree-Fock )
– 1
– y ( ) :
Roothaan
2018.8.7
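$$\hat{H}\Psi(r_1, r_2, \ldots, r_n) = E\Psi(r_1, r_2, \ldots, r_n)$$
$$\Psi(r_1, r_2, \ldots, r_n) \approx \psi(r_1)\psi(r_2)\cdots\psi(r_n)$$
$$\hat{H}_{\mathrm{eff}}\,\psi_p(r_1) = \varepsilon_p\,\psi_p(r_1)$$
$$\psi_p(r) = \sum_{\mu} C_{\mu p}\,\varphi_\mu(r)$$
ψ: Molecular orbital (MO)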
• ( )
• =
•
•
– :
•
•
– :
•
•
–
•
• 1 (= )
•
• , Wavelet ,
2018.8.7
$$\chi^{S} = N x^{l} y^{m} z^{n} \exp(-\zeta r_1)$$
$$\chi^{G} = N x^{l} y^{m} z^{n} \exp(-\sigma r_1^{2})$$
Hartree-Fock
: ( ), , ,
S 1 Hcore
Fock F
Fock C ε
D
$$F_{\mu\nu} = H_{\mu\nu} + \sum_{\lambda\sigma} D_{\lambda\sigma}\left[(\mu\nu|\lambda\sigma) - \tfrac{1}{2}(\mu\sigma|\lambda\nu)\right]$$
Hartree-Fock
(μν|λσ)
( 90% )
Fock
5%
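The Fock matrix formula on this slide can be sketched as a direct four-index loop. This is only an illustration in plain Python with made-up numbers for a two-function basis; `build_fock` and all input values are hypothetical, and production codes exploit integral symmetry and screening instead of looping over every index.

```python
def build_fock(h, d, eri):
    """F[m][v] = H[m][v] + sum_{l,s} D[l][s] * ((mv|ls) - 0.5 * (ms|lv))."""
    n = len(h)
    f = [[h[m][v] for v in range(n)] for m in range(n)]
    for m in range(n):
        for v in range(n):
            for l in range(n):
                for s in range(n):
                    coulomb = eri[m][v][l][s]    # (mu nu | lambda sigma)
                    exchange = eri[m][s][l][v]   # (mu sigma | lambda nu)
                    f[m][v] += d[l][s] * (coulomb - 0.5 * exchange)
    return f

# Toy input (illustrative values only): two basis functions, two electrons in function 0.
n = 2
h = [[-1.5, -0.2], [-0.2, -1.0]]
d = [[2.0, 0.0], [0.0, 0.0]]
eri = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
eri[0][0][0][0] = 0.7
eri[1][1][1][1] = 0.6
eri[0][0][1][1] = eri[1][1][0][0] = 0.5

fock = build_fock(h, d, eri)
for row in fock:
    print(row)
```

The loop touches N^4 integrals, which is why the two-electron integral step dominates the run time and is the natural target for parallelization.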
2018.8.7
D
(Self-consistent field: SCF)
D
Hartree-Fock
• Fock
(90% )
• Fock
(μν|λσ)
•
–
–
2018.8.7
H00rs
mn( )R R1 2
p
q
p L L= +B D
q L L= +A C
p+q n= +µ
H10rs
mn( )R R1 2
H20rs
mn( )R R1 2
H30rs
mn( )R R1 2
H40rs
mn( )R R1 2
H01rs
mn( )R R1 2
H02rs
mn( )R R1 2
H03rs
mn( )R R1 2
H04rs
mn( )R R1 2
H11rs
mn( )R R1 2
H12rs
mn( )R R1 2
H13rs
mn( )R R1 2
H21rs
mn( )R R1 2
H22rs
mn( )R R1 2
H31rs
mn( )R R1 2
ACE-RR
ACE
(pp|pp) :
LA=LB=LC=LD=1, m=0, n=1
1 2 1 2 1 2
2 1 2 2 1 2 2 1 2
1 1 1
( ) ( ) ( )
1 1 1
( ) ( ) ( )
pqrs pq rs p q rs
mn R R mn R R mn R R
pqrs pq rs p q rs
mnM R R mnM R R mnM R R
H H H
h h h
- + -
- + -
= -
= -
3
A B C D AB CD ABCD ABCD
4 3 4 3| C { } { }S S H
N
N Nl µ n x lµ nx lµnxf f f fé ù =ë û å
(accompanying coordinate expansion: ACE)
ACE-RR
•
(recurrence relation: RR)
• ACE
• ACE
:
2018.8.7
ACE-RR
(Pople-Hehre,
Dupuis-Rys-King)
Algorithm | STO-3G | STO-6G | 3-21G | 6-31G
ACE-b3k3-RR (present) | 134.4 (1.00) | 914.6 (1.00) | 409.4 (1.00) | 697.2 (1.00)
Pople and Hehre | 154.6 (1.15) | 1161.7 (1.27) | 437.9 (1.07) | 776.7 (1.11)
Dupuis, Rys, and King | 513.0 (3.82) | 6447.9 (7.05) | 617.1 (1.51) | 1532.7 (2.20)
Values are times in seconds; ratios to ACE-b3k3-RR are given in parentheses. Molecule: C47H51NO14.
MK, M. Kobayashi, H. Nakai and S. Nagase, J. Theor. Comput. Chem., 2005, 4, 139.
Hartree-Fock
2018.8.7
Fock MPI
icnt = 0
! The triple loop over m, n, r is distributed across MPI ranks
do m = 1, nao
  do n = 1, m
    do r = 1, m
      icnt = icnt + 1
      if (mod(icnt, nproc) /= myrank) cycle   ! skip tasks owned by other ranks
      do s = 1, smax
        ! Evaluation of AO integrals (mn|rs)
        ! Update Fock matrix blocks Fmn, Frs, Fmr, Fms, Fnr, Fns
        ! using (mn|rs) and D matrix blocks Drs, Dmn, Dns, Dnr, Dms, Dmr
      enddo
    enddo
  enddo
enddo
call mpi_allreduce(F)   ! Network communication: O(N^2)
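The counter-based distribution in the loop above can be checked without MPI by simulating every rank in turn. The sketch below is such a simulation; `assigned_tasks` is a hypothetical helper and the sizes (6 atomic orbitals, 4 ranks) are arbitrary.

```python
def assigned_tasks(n_ao, n_proc, my_rank):
    """Return the (m, n, r) loop indices that my_rank keeps under the mod-counter rule."""
    tasks, counter = [], 0
    for m in range(1, n_ao + 1):
        for n in range(1, m + 1):
            for r in range(1, m + 1):
                counter += 1
                if counter % n_proc != my_rank:
                    continue  # another rank owns this task
                tasks.append((m, n, r))
    return tasks

n_ao, n_proc = 6, 4
per_rank = [assigned_tasks(n_ao, n_proc, rank) for rank in range(n_proc)]
total = sum(len(tasks) for tasks in per_rank)
print("tasks per rank:", [len(tasks) for tasks in per_rank], "total:", total)
```

Every task is executed by exactly one rank and the task counts differ by at most one. The actual work per task is not uniform, however, which is one reason the hybrid version uses dynamic OpenMP scheduling inside each rank.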
$$F_{\mu\nu} = H_{\mu\nu} + \sum_{\lambda\sigma} D_{\lambda\sigma}\left[(\mu\nu|\lambda\sigma) - \tfrac{1}{2}(\mu\sigma|\lambda\nu)\right]$$
Fock
Fock
• :
CPU
( )
⇒
2018.8.7
Fock MPI/OpenMP
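!$OMP parallel do schedule(dynamic,1) reduction(+:F)
! OpenMP parallelization: the outermost loop is distributed over OpenMP threads
do m = nao, 1, -1
  do n = 1, m
    mn = m*(m+1)/2 + n   ! compound pair index (assumed definition; not shown on the slide)
    ! MPI parallelization: the third loop is distributed over MPI ranks
    rstart = mod(mn + mpi_rank, nproc) + 1
    do r = rstart, m, nproc
      do s = 1, r
        ! Evaluation of AO integrals (mn|rs)
        ! Update Fock matrix blocks Fmn, Frs, Fmr, Fms, Fnr, Fns
        ! using (mn|rs) and D matrix blocks Drs, Dmn, Dns, Dnr, Dms, Dmr
      enddo
    enddo
  enddo
enddo
call mpi_allreduce(F)   ! Network communication: O(N^2)
K. Ishimura et al., J. Chem. Theory Comput. 2010, 6, 1075.
$$F_{\mu\nu} = H_{\mu\nu} + \sum_{\lambda\sigma} D_{\lambda\sigma}\left[(\mu\nu|\lambda\sigma) - \tfrac{1}{2}(\mu\sigma|\lambda\nu)\right]$$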
2018.8.7
•
• CPU
core core
core core
CPU
Hartree-Fock
2018.8.7
C60@C60H28 RHF/cc-pVDZ (1820 )
MPI/OpenMP
MPI/OpenMP (16384 )
Fock : 16.1 [ ]
Fock : 10.8 [ ]
(OpenMP ) 0
4096
8192
12288
16384
0 4096 8192 12288 16384
CPU
MPI/OpenMP Total
MPI/OpenMP SCF Fock
Flat MPI SCF Total
Flat MPI SCF Fock
Ideal
T. Nakajima, MK, M. Kamiya and Y. Nakatsuka,
Int. J. Quantum Chem., 2015, 115, 349–359.
(Møller-Plesset
(MP2) DFT )
2018.8.7
Møller-Plesset (MP2)
• Møller-Plesset : Hartree-Fock HF
– 0
–
• Møller-Plesset : 2
2018.8.7
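$$\hat{H} = \hat{H}_0 + \lambda\hat{V}$$
$$\Psi = \Psi^{(0)} + \Psi^{(1)} + \Psi^{(2)} + \Psi^{(3)} + \cdots$$
$$E = E^{(0)} + E^{(1)} + E^{(2)} + E^{(3)} + \cdots$$
$$E^{(0)} + E^{(1)} = \langle\Psi^{(0)}|\hat{H}_0|\Psi^{(0)}\rangle + \langle\Psi^{(0)}|\hat{V}|\Psi^{(0)}\rangle = E_{\mathrm{HF}}$$
$$E^{(2)} = \langle\Psi^{(0)}|\hat{V}|\Psi^{(1)}\rangle, \qquad E^{(3)} = \langle\Psi^{(0)}|\hat{V}|\Psi^{(2)}\rangle$$
$$E \approx E_{\mathrm{HF}} + E^{(2)}$$
$$E^{(2)} = -\sum_{ij}^{\mathrm{occ}}\sum_{ab}^{\mathrm{vir}} \frac{(ia|jb)\left[2(ia|jb) - (ib|ja)\right]}{\varepsilon_a + \varepsilon_b - \varepsilon_i - \varepsilon_j}$$
$$(ia|jb) = \sum_{\mu}\sum_{\nu}\sum_{\lambda}\sum_{\sigma} C_{\mu i} C_{\nu a} C_{\lambda j} C_{\sigma b}\,(\mu\nu|\lambda\sigma)$$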
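The closed-shell second-order energy formula on this slide translates directly into a loop over occupied pairs (i, j) and virtual pairs (a, b). The sketch below assumes the MO integrals (ia|jb) are already available as a nested list indexed by full MO indices; `mp2_energy` and the two-orbital toy input are hypothetical.

```python
def mp2_energy(eri_mo, eps, n_occ):
    """E(2) = -sum_{ij,ab} (ia|jb) [2 (ia|jb) - (ib|ja)] / (e_a + e_b - e_i - e_j)."""
    n_mo = len(eps)
    e2 = 0.0
    for i in range(n_occ):
        for j in range(n_occ):
            for a in range(n_occ, n_mo):
                for b in range(n_occ, n_mo):
                    iajb = eri_mo[i][a][j][b]
                    ibja = eri_mo[i][b][j][a]
                    denom = eps[a] + eps[b] - eps[i] - eps[j]
                    e2 -= iajb * (2.0 * iajb - ibja) / denom
    return e2

# Toy model (illustrative values): one occupied MO (index 0), one virtual MO (index 1).
n_mo = 2
eri_mo = [[[[0.0] * n_mo for _ in range(n_mo)] for _ in range(n_mo)] for _ in range(n_mo)]
eri_mo[0][1][0][1] = 0.2          # (ia|ia) with i = 0, a = 1
eps = [-0.5, 0.5]                 # orbital energies

e2 = mp2_energy(eri_mo, eps, n_occ=1)
print(f"E(2) = {e2:.4f}")
```

For this toy input the sum has a single term, -0.2 x (0.4 - 0.2) / 2.0 = -0.02. The energy loop itself is cheap; the expensive part in practice is producing the (ia|jb) integrals.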
Møller-Plesset (MP2)
•
• HF DFT
( )
• Size-consistency Size-extensivity
• (HF )
– HF
•
• HF DFT
– MP2 O(N5) VS HF DFT O(N3)
• O(N3 -O(N4
2018.8.7
MP2
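$$E^{(2)} = \sum_{ijab} \frac{(ia|jb)\left[2(ia|jb) - (ib|ja)\right]}{\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}$$
$$(ia|jb) = \sum_{\mu}\sum_{\nu}\sum_{\lambda}\sum_{\sigma} C_{\mu i} C_{\nu a} C_{\lambda j} C_{\sigma b}\,(\mu\nu|\lambda\sigma)$$
$$(\mu\nu|\lambda\sigma) = \iint \varphi_\mu(r_1)\varphi_\nu(r_1)\,\frac{1}{|r_1 - r_2|}\,\varphi_\lambda(r_2)\varphi_\sigma(r_2)\,dr_1\,dr_2$$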
2018.8.7
Hartree-Fock Cµi εi
MP2
(1) AO4
(2) MO : O(N5)
O(N4)
: O(N4)
: O(N4)
Resolution-of-identity MP2 (RI-MP2)
2018.8.7
• MP2 4
Resolution-of-identity (RI)
• MP2 ( )
5-10 ( O(N5) )
•
: O(N3)
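Conventional MP2:
$$(ia|jb) = \sum_{\mu}\sum_{\nu}\sum_{\lambda}\sum_{\sigma} C_{\mu i} C_{\nu a} C_{\lambda j} C_{\sigma b}\,(\mu\nu|\lambda\sigma)$$
RI-MP2:
$$(ia|jb) = \sum_{n} B^{n}_{ia} B^{n}_{jb}, \qquad B^{n}_{ia} = \sum_{l}\sum_{\nu}\sum_{\mu} (l|n)^{-1/2}\, C_{\nu a} C_{\mu i}\,(\mu\nu|l)$$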
O(N5)
RI
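Input from Hartree-Fock: $C_{\mu i}$, $\varepsilon_i$
(1) AO 3-index integrals:
$$(\mu\nu|l) = \iint \chi_\mu(r_1)\chi_\nu(r_1)\, r_{12}^{-1}\, \xi_l(r_2)\,dr_1\,dr_2$$
(2) MO transformation:
$$B^{n}_{ia} = \sum_{l}\sum_{\nu}\sum_{\mu} (l|n)^{-1/2}\, C_{\nu a} C_{\mu i}\,(\mu\nu|l)$$
(3) MO 4-index integrals:
$$(ia|jb) = \sum_{n} B^{n}_{ia} B^{n}_{jb}$$
MP2 energy:
$$E^{(2)} = \sum_{ijab} \frac{(ia|jb)\left[2(ia|jb) - (ib|ja)\right]}{\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}$$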
: O(N5)
O(N3)
-
-
: O(N4)
: O(N4)
: O(N3)
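Step (3) of the RI-MP2 algorithm assembles the four-index integrals from the three-index quantities B as a plain matrix product, which is why it maps onto DGEMM. The sketch below shows that contraction in pure Python; `ri_integrals` is a hypothetical helper, the pair index p stands for a compound (i, a) index, and the numbers are made up.

```python
def ri_integrals(b):
    """Given b[n][p] (n: auxiliary function, p: compound ia index),
    return v[p][q] = sum_n b[n][p] * b[n][q], i.e. (ia|jb) in the RI form."""
    n_aux, n_pair = len(b), len(b[0])
    return [
        [sum(b[n][p] * b[n][q] for n in range(n_aux)) for q in range(n_pair)]
        for p in range(n_pair)
    ]

# Toy B matrix: 2 auxiliary functions, 2 occupied-virtual pairs (illustrative values).
b = [[1.0, 2.0],
     [0.5, -1.0]]

v = ri_integrals(b)
for row in v:
    print(row)
```

In matrix notation this is V = B^T B, so an optimized BLAS routine can replace the explicit loops and run close to peak floating-point performance.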
2018.8.7
RI-MP2
2018.8.7
! Parallelization over occupied orbitals
IoccBgn = Myrank*Nchunk + 1
IoccEnd = (Myrank+1)*Nchunk
Do Iocc = IoccBgn, IoccEnd   ! MPI parallelization
  <RI-MP2 calculations>
End Do
! Parallelization over virtual orbitals
IvirBgn = Myrank*Nchunk + 1
IvirEnd = (Myrank+1)*Nchunk
Do Ivir = IvirBgn, IvirEnd   ! MPI parallelization
  <RI-MP2 calculations>
End Do
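The chunked loops above assume the number of orbitals divides evenly among the ranks. A minimal sketch of a block partition that also handles a remainder is given below; `block_range` is a hypothetical helper using 0-based ranks and 1-based inclusive orbital indices, matching Fortran loop bounds.

```python
def block_range(n_items, n_proc, my_rank):
    """Return the 1-based inclusive (begin, end) of the contiguous block owned by my_rank."""
    base, extra = divmod(n_items, n_proc)
    # The first `extra` ranks each take one additional item.
    begin = my_rank * base + min(my_rank, extra) + 1
    end = begin + base - 1 + (1 if my_rank < extra else 0)
    return begin, end

n_orbitals, n_proc = 10, 4
ranges = [block_range(n_orbitals, n_proc, rank) for rank in range(n_proc)]
print(ranges)
```

Because there are many more virtual than occupied orbitals in typical calculations, distributing the virtual index lets the scheme use more MPI processes before the blocks become empty.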
(ia | jb) = Bn
ia
Bn
jb
n
∑
:
MPI
:
MPI
(ia | jb) = Bn
ia
Bn
jb
n
∑
RI-MP2
MPI/OpenMP
MK and T. Nakajima, J. Chem. Theory Comput., 9, 5373 (2013).
RI-MP2
2018.8.7
0
1
2
3
4
N-1
:
MPI
MPI ( =1 )
Evaluation of 3c-2e ERIs: $B^{n}_{ia} = \sum_{l}\sum_{\nu}\sum_{\mu} (l|n)^{-1/2}\, C_{\nu a} C_{\mu i}\,(\mu\nu|l)$
Loop bProc
  Sending $B^{n}_{jb}$ to process Myrank - bProc
  Receiving $B^{n}_{jb}$ from process Myrank + bProc
  Loop a ∈ Myrank (MPI parallel)
    Evaluation of 4c-2e MO ERIs $(ia|jb) = \sum_{n} B^{n}_{ia} B^{n}_{jb}$ by BLAS DGEMM (OpenMP parallel)
    Evaluation of MP2 correlation energy $E^{(2)}$ (OpenMP parallel)
  End Loop a
End Loop bProc
MPI ( )
MK et al. J. Chem. Theory Comput., 2013, 9, 5373.
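The send/receive pattern in the bProc loop can be checked with a small simulation that needs no MPI. The sketch below assumes ranks are numbered 0 to P-1 with wrap-around and that step 0 means a rank works on its own block; `ring_schedule` is a hypothetical helper.

```python
def ring_schedule(n_proc):
    """For every rank, list which rank's B block it holds at each step of the bProc loop."""
    schedule = {rank: [] for rank in range(n_proc)}
    for step in range(n_proc):
        for rank in range(n_proc):
            source = (rank + step) % n_proc        # block received from Myrank + bProc
            destination = (rank - step) % n_proc   # own block sent to Myrank - bProc
            # Consistency: the rank we receive from is sending to us in this step.
            assert (source - step) % n_proc == rank
            assert (destination + step) % n_proc == rank
            schedule[rank].append(source)
    return schedule

schedule = ring_schedule(5)
for rank, sources in schedule.items():
    print(f"rank {rank} processes blocks from ranks {sources}")
```

After P steps every rank has seen every block exactly once, and in each step all ranks receive from distinct partners, so no block has to be replicated in memory.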
Tsubame
RI-MP2
2018.8.7
MPI
Tsubame
MPI
Tsubame
[Figure: MPI process layouts: a two-dimensional grid of ranks labeled i/j (i = 0 ... N-1, j = 0 ... 3) and a one-dimensional layout of ranks 0 ... N-1]
MPI MPI
( 1 )
MPI ( )
MK et al. J. Chem. Theory Comput., 2013, 9, 5373.
MPI ( )
MK et al. J. Comput. Chem., 2016, 37, 2623.
MPI ( 2 )
Tsubame
RI-MP2
2018.8.7
GPU
GPU
[Figure: two-dimensional MPI process grid with ranks labeled i/j (i = 0 ... N-1, j = 0 ... 3)]
MPI
( 1 )
MPI ( )
MPI ( 2 )
GPU0
GPU1
GPU2
GPU3
RI-MP2 CUDA CPU-GPU
2018.8.7
Loop bProc (MPI parallel)
  [CPU] Sending $B^{n}_{jb}$ to process Myrank - bProc + 1
  [CPU] Sending $B^{n}_{jb}$ from CPU to GPU
  [CPU] Receiving $B^{n}_{jb}$ from process Myrank + bProc + 1
  [CPU] Sending $B^{n}_{jb}$ from CPU to GPU
  Loop a ∈ Myrank (MPI parallel)
    [GPU] Evaluation of 4c-2e integral $(ia|jb)_P = \sum_{n \in \mathrm{Myrank}} B^{n}_{ia} B^{n}_{jb}$ (cuBLAS DGEMM)
    [CPU] Receiving 4c-2e integral $(ia|jb)_P$ from GPU to CPU
    [CPU] Allreduce 4c-2e integral $(ia|jb) = \sum_{P} (ia|jb)_P$
    [CPU] Evaluation of MP2 correlation energy $E^{(2)}$ (OpenMP parallel)
  End Loop a
End Loop bProc
[CPU] and [GPU] mark where each step runs.
• GPU CUDA
•
- cuBLAS
GPU
• CPU-GPU PCI
• pinned memory PCI
1
2018.8.7
• P2P •
MPI
MPI
MPI
P2P
[Figure: elapsed time (axis 0 to 1,800) for P2P versus ring communication, broken down into the following steps]
Others
Step 3: EMP2 corr. eval.
Step 3: 4c MO 2-ERI eval.
Step 3: 3/3k 2c 2-ERI comm.
Step 2: 3/3 2c 2-ERI tran.
Step 2: 2/3 3c 2-ERI comm.
Step 2: 2c 2-ERI comm.
Step 2: 2c 2-ERI eval.
Step 1: 2/3 3c 2-ERI comm.
Step 1: 2/3 3c 2-ERI tran.
Step 1: 1/3 3c 2-ERI tran.
Step 1: 3c 2-ERI
2018.8.7
2 (C96H24)2 RI-MP2/cc-pVTZ
(6432 , 600 , 5832 ) 2048
2 MPI & MPI/OpenMP
RI-MP2
2018.8.7
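Nodes | CPU cores | Time [s] | Speedup | Performance [PFLOPs] | Efficiency [%]
8911 | 71288 | 2692 | 8911 | 0.7 | 62
17822 | 142576 | 1627 | 14742 | 1.2 | 54
35644 | 285152 | 1095 | 21906 | 2.0 | 44
44555 | 356440 | 955 | 25112 | 2.4 | 42
53466 | 427728 | 881 | 27209 | 2.6 | 37
71288 | 570304 | 783 | 30656 | 2.9 | 32
80199 | 641592 | 759 | 31612 | 3.1 | 30
System: (C150H30)2 dimer, RI-MP2/cc-pVTZ (9840 atomic orbitals, 930 occupied orbitals, 8910 virtual orbitals)
With 80,199 nodes: 3.1 PFLOPs (30% of peak)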
[Figure: speedup versus number of nodes (axis ticks 0 to 71,288)]
[Figure: elapsed time [sec] (axis 0 to 2,500) for 1xCPU, 1xCPU and 1xGPU, 1xCPU and 2xGPUs, and 2xCPUs and 4xGPUs, broken down into the following steps]
Others
Step 3: EMP2 corr. comm.
Step 3: 4c MO 2-ERI eval.
Step 3: 3/3k 2c 2-ERI comm.
Step 2: 3/3 2c 2-ERI tran.
Step 2: 2/3 3c 2-ERI comm.
Step 2: 2c 2-ERI Inv.
Step 2: 2c 2-ERI eval.
Step 1: 2/3 3c 2-ERI comm.
Step 1: 2/3 3c 2-ERI tran.
Step 1: 1/3 3c 2-ERI tran.
Step 1: 3c 2-ERI eval.
- GPU
2018.8.7
4 GPU
GPU
CPU: Intel Xeon E5-2690 v2 (10 core) x 2, GPU: NVIDIA Tesla K40 x 4
(C24H12)2 RI-MP2/cc-pVTZ (888 , 78 , 810 , 2304 )
x2.2 X2.5
X4.8
TSUBAME2.5 RI-MP2
CPU VS. CPU/GPU
2018.8.7
0
128
256
384
512
0 128 256 384 512
CPU/GPU
CPU
Ideal
TSUBAME2.5 64-512 , CPU: Intel Xeon 5670 (6 ) x 2, GPU: NVIDIA Tesla K20X (3GPU/ )
CPU: 1MPI & 12 / , GPU: 3 MPI & 4 /
C96H24 RI-MP2/cc-pVTZ (3212 , 300 , 2916 , 8496 )
GPU
0
400
800
1200
64 128 256 512
[]
CPU/GPU
CPU
x4.1
x6.4
x6.6
x5.7
TSUBAME2.5 RI-MP2
CPU VS. CPU/GPU
2018.8.7
TSUBAME2.5 1349 , CPU: Intel Xeon 5670 (6 ) x 2, GPU: NVIDIA Tesla K20X (3GPU/ )
CPU: 1349 MPI & 12 , GPU: 4047 MPI (3 MPI / ) & 4
(C96H24)2 RI-MP2/cc-pVTZ (6432 , 600 , 5832 , 16992 )
0
500
1000
1500
2000
2500
3000
CPU CPU/GPU
[]
Others
EMP2 corr.
4c Ints comm.
4c Ints
3/3k 2cints comm
3/3 tran3c2 tran
2/3 tran3c2 comm
RIInt2_Inv2c
RIInt2c comm
RIInt2c calc
2/3 tran3c1 comm
2/3 tran3c1
1/3 tran3c1
3c-RIInt comm
3c-RIInt
x4.94c ints
- GPU
2047
87.5 TFLOPs
419
514.7 TFLOPs
RI-MP2
• RI-MP2
• MP2
• MP2
RI-MP2
• RI-MP2
2018.8.7
RI-MP2
• 1997 Weigend *
• MP2 4 MO
Resolution-of-identity (RI)
• MP2
•
• Turbomole Q-Chem ORCA
* F. Weigend, M. Häser, Theor. Chem. Acc., 97, 331 (1997).
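$$\chi_\mu(r_1)\chi_\nu(r_1) \approx \sum_{l} c_{\mu\nu,l}\,\xi_l(r_1)$$
$$(\mu\nu|\lambda\sigma) = \sum_{lm} (\mu\nu|l)\,(l|m)^{-1}\,(m|\lambda\sigma)$$
$$(\mu\nu|\lambda\sigma)^{(x)} = \sum_{lm} (\mu\nu|l)^{(x)}\,(l|m)^{-1}\,(m|\lambda\sigma) + \sum_{lm} (\mu\nu|l)\,(l|m)^{-1}\,(m|\lambda\sigma)^{(x)} - \sum_{lmno} (\mu\nu|l)\,(l|m)^{-1}\,(m|n)^{(x)}\,(n|o)^{-1}\,(o|\lambda\sigma)$$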
RI-MP2
RI-MP2
2
γ lm
MP2-NS
= Γia
l
Bia
n
m n( )
−1/2
nia
∑Γia
l
= 2Tij
ab
−Tji
ab
( )Bjb
n
n l( )
−1/2
jbn
∑
Pab
(2)
= 2Tij
ac
−Tji
ac
( )Tij
bc
ijc
∑
1
Γµνλσ
MP2-S
= 1
2
Pµν
RHF
+ Pµν
(2)
( )Pλσ
RHF
− 1
2
1
2
Pµλ
RHF
+ Pµλ
(2)
( )Pνσ
RHF
Bia
n
= l n( )
−1/2
Cνa Cµi µν l( )
µ
∑
ν
∑
l
∑ ia jb( )= Bia
n
Bjb
n
n
∑
3
Pij
(2)
= − 2Tik
ab
−Tki
ab
( )kab
∑ Tjk
ab
Tij
ab
=
(ia | jb)
εi
+ ε j
− εa
− εb
dEMP2
dx
= Pµν
MP2
Hµν
(x)
+ Wµν
MP2
Sµν
(x)
µν
∑ +
1
2
Γµνλσ
MP2-S
µν λσ( )
(x)
µν
∑ + 2 γ lm
MP2-NS
l m( )
(x)
lm
∑µν
∑ + 4 Γµν
l,MP2-NS
µν l( )
(x)
µν
∑
4
Lai
= 2 Γib
l
ab l( )bl
∑ − 2 Γ ja
l
ij l( )+ Aaibc
Pbc
(2)
bc
∑ + Aaijk
Pjk
(2)
jk
∑
jl
∑MP2
MP2
CPHF
occ.-occ. vir.-vir.
Pai
(2)
= Zai
occ.-vir.
Pµν
(2)
= Ppq
(2)
Cµp
Cνq
pq
∑Pµν
MP2
= Pµν
HF
+ Pµν
(2)
Γµν
l,MP2-NS
= Γia
l
Cµi
Cνa
ia
∑
Aaipq
Ppq
(2)
pq
∑ + εa
− εi( )Zai
= Lai
Wij
(2)
= 2 Γia
l
ja l( )
al
∑ −
1
2
Pij
(2)
εi + εj( )+ AijpqPpq
(2)
pq
∑
occ.-ccc. vir.-vir.
Wab
(2)
= −2 Γia
n
ib n( )
an
∑ −
1
2
Pab
(2)
εa + εb( ) Wai
(2)
= −2 Γ ja
n
ij n( )
ja
∑ − Pai
(2)
εi
occ.-vir.
Apqrs
= 4 pq rs( )− ps rq( )− pr sq( )
RI
2018.8.7
Evaluate 2p DM Yia
n
+ = 2Tij
ab
!Tji
ab
( )Bjb
n
jb
"
End Loop bProc
Evaluate 1p DM Pab
(2)
+ = 2Tij
ac
! Tji
ac
( )Tij
bc
ij
"
Evaluate 2p DM !ia
l
= Yia
n
n l( )
"1/2
n
#
Store !ia
l
to distributed memory
Evaluate 2p DM Xln
+ = !ia
l
Bia
n
ia
"
End Loop a
Allreduce E(2)
, Pij
(2)
, Pab
(2)
, and Xln
Evaluate 2p DM ! ln
= Xlm
m n( )
"1/2
n
#
Evaluate non-separable part of MP2 gradient
dEMP2
dx
+ = 2 ! lm
l m( )
(x)
lm
!
[Step 1: Evaluate 3c integrals]
Evaluate Bia
n
= l n( )
!1/2
C!a Cµi µ! l( )
µ
"!
"l
"
Store Bia
n
to distributed memory
[Step 2: Evaluate 1p & 2p MP2 density Matrix (DM) and
part of gradient]
Loop a Myrank (MPI parallel)
Loop bProc = 1, NProc
Sending
jb
nB to Myrank- bProc process
Receiving
jb
nB from Myrank+ bProc process
Evaluate 4c-2e integral (ia | jb) = Bia
n
Bjb
n
n
!
Evaluate Tij
ab
=
(ia | jb)
!i
+ ! j
! !a
! !b
Evaluate MP2 correlation energy E(2)
Evaluate 1p MP2 DM Pij
(2)
+ = ! 2Tik
ab
!Tki
ab
( )k
! Tjk
ab
MPI/OpenMP
RI-MP2 (1)
• MPI :
• 2 :
MPI
• OpenMP : MPI
• - BLAS DGEMM
: MPI
: OpenMP
MPI/OpenMP
:
3
2018.8.7
O(N5) BLAS DGEMM
[Step 3: Evaluate part of MP2 Lagrangian and gradient]
Loop bProc = 1, NProc
Sending !ia
l
to Myrank- bProc process
Receiving !ia
l
from Myrank+ bProc process
End Loop bProc
Loop L Myrank (MPI Parallel)
Evaluate MP2 Lagrangian Laq
3
+ = !ia
l
C"q
Cµi
µ" l( )µ
#
"
#
i
#
Evaluate MP2 Lagrangian Liq
4
+ = !ia
l
C!q
Cµa
µ! l( )µ
"!
"i
"
Evaluate 1p energy weighted DM
Wij
(2)
[I]+ = 2 !ia
l
ja l( )i
"
Evaluate non-separable part of MP2 gradient
dEMP2
dx
+ = 4 !µ!
l
µ! l( )
(x)
µ!
"
End Loop L
Allreaduce Laq
3
, Liq
4
, and , Wij
(2)
[I]
[Step 4: Evaluate part of MP2 Lagrangian]
Lai
1,2
= Cµi
C! j
2 µ! "#( )$ µ" !#( )%
&
'
( )P"#
(2)
"#
*
!
*
µ
*
[Step 5: Iteratively Solve CPHF equation]
Loop iter
Evaluate
Gai
= Cµa
C!i
4 µ! "#( )! µ! "#( )! µ! "#( )"
#
$
%
&&P!"
(2)
!"
'#
'µ
'
Solve CPHF equation Gai
+ !a
" !i( )Pai
(2)
= Lai
End Loop iter
[Step 6: Evaluate part of MP2 energy weighted DM]
Wij
(2)
[III] = Cµi
C! j
2 µ! "#( )! µ! "#( )"
#
$
%
P!"
(2)
!"
&#
&µ
&
[Step 7: Evaluate separable part of MP2 gradient]
dEMP2
dx
+ = Pµ!
MP2
Hµ!
(x)
+ Wµ!
MP2
Sµ!
(x)
µ!
! +
1
2
"µ!"#
MP2-S
µ! "#( )
(x)
µ!
!µ!
!
2 Fock MPI/OpenMP
AO
MPI/OpenMP
RI-MP2 : (2)
3 2
1 1
: MPI
: OpenMP
MPI/OpenMP
:
2018.8.7
RI-MP2
2018.8.7
(PDB ID: 1L2Y) RI-MP2/def2-SVP
(304 , 2906
512-4,096
CPU [ ] [%]
512 4096
1024 8192
2048 16384
4096 32768
8192
12288 98304
0
4096
8192
12288
0 4096 8192 12288
Speedups
Ideal
1 12,288
6
MK and T. Nakajima, J. Comput. Chem., 2017, 38, 489.
RI-MP2
2018.8.7
RI-MP2/def2-SVP
(304 , 2906
512-4,096
3,072 RI-MP2
37 27 ( 86 )
: Trp-cage 1L2Y PDB
( )
:
MK and T. Nakajima, J. Comput.
Chem., 2017, 38, 489.
(LR-TDDFT)
•
NTChem, GELLAN, Smash,
NWChem
•
•
(LR-TDDFT)
2018.8.7
(TDDFT)
•
• 1
2018.8.7
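DFT (Kohn-Sham equation), with density $\rho(r)$ in place of the wavefunction $\Psi(r_1, r_2, \ldots, r_N)$:
$$\left[-\frac{1}{2}\nabla^2 + \upsilon_{ne}(R, r) + \int \frac{\rho(r')}{|r - r'|}\,dr' + \upsilon_{xc}(r)\right]\psi_i(r) = \varepsilon_i\,\psi_i(r)$$
TDDFT (time-dependent Kohn-Sham equation), with density $\rho(r, t)$ in place of the wavefunction $\Psi(r_1, r_2, \ldots, r_N, t)$:
$$\left[-\frac{1}{2}\nabla^2 + \upsilon_{ne}(R, r) + \int \frac{\rho(r', t)}{|r - r'|}\,dr' + \upsilon_{xc}(r, t) + \upsilon(r, t)\right]\psi(r, t) = i\,\frac{\partial}{\partial t}\psi(r, t)$$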
(LR-TDDFT)
• TDDFT 1
LR-TDDFT
• ( O(N5-) )
O(N3)
•
1000
2018.8.7
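$$\begin{pmatrix} A & B \\ B^{*} & A^{*} \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \omega \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix}$$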
)||()|()|(
)||()|()|(
)(
,
,
ttstsssttsttssss
ttstsssttsstttss
tsstss
d
eeddd
jbwiacibjacjbiaB
bjwiacijbacbjia
A
DFTHFbjai
DFTHF
iaabijbjai
+-=
+-
+-=
Casida
w:
LR-TDDFT
2018.8.7
t
AO
g+=(A+B)t, g-=(A-B)t
(AR-BR)1/2 (AR+BR) (AR-BR)1/2 ZR=w2ZR
WL=(A+B)R-wL, WR=(A-B)L-wR
t
W
W
90%
: O(N4)
SCF MPI/OpenMP
MPI/OpenMP
OpenMP
2, 3BLAS
t
AR+BR=t+g+, AR-BR=t+ g
-
AO Davidson
ZR:
L=|X-Y>
&
R =|X+Y>
OpenMP
Bµν
q!
"
#
$
= 2 µν λσ( )−cx
µλ νσ( )+cx
LR
µλ νσ( )
LR
+ fµν ,λσ
xc!
"&
#
$'tλσ
q!
"
#
$
MK and T. Nakajima, J. Comput.
Chem., 2017, 38, 489.
Davidson
2018.8.7
[Figure: layout of the trial vectors t and the products g+ as blocks numbered 1 to 31, indexed by single excitations (i→a)]
Davidson
2018.8.7
[Figure: accumulation (+=) of the (i→a) blocks of t and g+ (blocks numbered 1 to 31) into AR+BR]
Davidson
2018.8.7
[Figure: MPI Broadcast of the (i→a) blocks of t (blocks numbered 1 to 31), followed by accumulation (+=) with g+ into AR+BR]
Davidson
2018.8.7
[Figure: next step of the broadcast of the (i→a) blocks of t (blocks numbered 1 to 31), followed by accumulation (+=) with g+ into AR+BR]
AO
MPI/OpenMP
!$OMP parallel do schedule(dynamic,1) reduction(+:B)
do m = nao, 1, -1
  do n = 1, m
    mn = m*(m+1)/2 + n   ! compound pair index (assumed definition; not shown on the slide)
    ! MPI parallel
    rstart = mod(mn + mpi_rank, nproc) + 1
    do r = rstart, m, nproc
      do s = 1, r
        ! Evaluation of AO integrals (mn|rs)
        ! Update Fock-like matrix blocks B^q_mn, B^q_rs, B^q_mr, B^q_ms, B^q_nr, B^q_ns
        ! using (mn|rs) and trial vector matrix t^q blocks
        ! t^q_rs, t^q_mn, t^q_ns, t^q_nr, t^q_ms, t^q_mr
      enddo
    enddo
  enddo
enddo
call mpi_allreduce(B)   ! Network communication: O(N^2)
OpenMP AO
MPI AO
Kµσν !σ
q"
#
$
%
= 2δσ !σ
µν λρ( )−cx
µλ νρ( )+cx
LR
µλ νρ( )
LR"
#'
$
%(tλσρ !σ
q"
#
$
%
s: {a, b}
µ, n:
q:
2018.8.7
Davidson
-
• - -
• 2 3 BLAS
• - DGEMM
-
DGEMM - DGEMV
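Block version (matrix-matrix operations):
$Q = (q_1 \cdots q_{jbgn})$
$R = (a_{jbgn+1} \cdots a_n)$
$S = Q^{T} R$
$R = R - Q S$
Loop j = jbgn, n
  $T = (r_{jbgn} \cdots r_{j-1})$
  $u = T^{T} r_j$
  $r_j = r_j - T u$
  $r_j = r_j / |r_j|$
End Loop j
Vector version:
Loop j = 1, n
  $q_j = a_j$
  Loop i = 1, j-1
    $q_j = q_j - (q_j \cdot q_i)\, q_i$
  End Loop i
  $q_j = q_j / |q_j|$
End Loop j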
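The vector-by-vector orthonormalization loop above can be sketched in a few lines. The version below is the modified Gram-Schmidt variant (each projection uses the partially orthogonalized vector), written in plain Python; `gram_schmidt` and the test vectors are illustrative, and linearly independent input is assumed.

```python
import math

def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors with modified Gram-Schmidt."""
    basis = []
    for a in vectors:
        q = list(a)
        for p in basis:
            overlap = dot(q, p)  # projection of the current q onto an earlier basis vector
            q = [x - overlap * y for x, y in zip(q, p)]
        norm = math.sqrt(dot(q, q))
        basis.append([x / norm for x in q])
    return basis

basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for q in basis:
    print([round(x, 4) for x in q])
```

This form is dominated by vector-vector operations; grouping the already orthonormal vectors into a matrix Q, as in the block version, turns most of the work into matrix-matrix products that BLAS can execute far more efficiently.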
jbgn
n
2018.8.7
LR-TDDFT
(PDB ID: 1CRN) TDDFT B3LYP/def-2SVP,
=20 (642 , 6177 , 1260 , 4917 )
7680 2353 218
2018.8.7
CPU [ ]
768 6144
1536 12288
2304 18432
3072 24576
1536
3072
4608
6144
7680
0 1536 3072 4608 6144 7680
3
2018.8.7
2018.8.7
[Figure: computational cost versus system size N for conventional O(N^3) methods (Hartree-Fock, DFT) and for linear-scaling O(N) methods]
2018.8.7
•
– RSDFT ( )
– ProteinDF, NTChem, SMASH ( )
•
– purification
•
–
–
–
2018.8.7
• Rene Descartes 1596 – 1650
–
–
–
• 1
–
• (Discours de la méthode) 2
–
(The second was to divide each of the difficulties I examined into as many parts as possible and as would be required to resolve them better.)
2018.8.7
:
•
•
•
2018.8.7
• (
)
•
•
•
•
•
•
•
– GAMESS, ABINIT-MP, OpenFMO, PAICS
•
– GAMESS
•
– GAMESS
•
2018.8.7
(FMO)
$$E = \sum_{I} E_I + \sum_{I>J} \left(E_{IJ} - E_I - E_J\right)$$
$E_I$: energy of fragment (monomer) I, for I = 1, ..., n
$E_{IJ}$: energy of fragment pair (dimer) IJ
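The two-body fragment energy expression can be evaluated with a few lines once the monomer and dimer energies are known. The sketch below uses made-up energies for three fragments; `fmo2_energy` is a hypothetical helper, and in a real calculation each energy would come from a separate quantum chemical run that can be executed in parallel.

```python
def fmo2_energy(monomer, dimer):
    """E = sum_I E_I + sum_{I>J} (E_IJ - E_I - E_J)."""
    total = sum(monomer.values())
    for (i, j), e_ij in dimer.items():
        total += e_ij - monomer[i] - monomer[j]
    return total

# Illustrative fragment energies (arbitrary units).
monomer = {1: -10.0, 2: -20.0, 3: -30.0}
dimer = {(2, 1): -30.5, (3, 1): -40.1, (3, 2): -50.2}

energy = fmo2_energy(monomer, dimer)
print(f"E = {energy:.1f}")
```

Here the pair corrections are -0.5, -0.1 and -0.2, giving a total of -60.8. Since every monomer and dimer term is independent, the terms form a natural unit of coarse-grained parallel work.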
2018.8.7
FMO-RI-MP2
1 1 † 1
( | ) ( )ln nm
n
l m L L- - -
= å
1
( | ) ( | )( | ) ( | )
lm
l l m mµn ls µn ls-
» å
1 1 1 ,( ) ( ) ( ) m
m
m cµ n µnc c » år r r
Resolution-of-identity (RI)
FMO-MP2
⇒
RI
Resolution-of-identity (RI) FMO-RI-MP2
3 ⇒
o:
v:
n:
N :
3
⇒
( ) i a j b
n n
n
i a j b B B
a a a a
a a a a
= å
1
( | )nl
i a
n i a
l
B L C l C
a a
a a
µ n
µ n
µn-
= å å å1
( | )nl
i a
n a
l
B L l C
a a
a
n
n
µn-
= å å
2018.8.7
MK Theor. Chem. Acc., 2011, 130, 449–453.
FMO RI-MP2
:
968 15,719
2 /1
(484 )
FMO2 RI-MP2/6-31G*
12288 86016 0.82
GAMESS-FMO-MP2 MPI/OpenMP
( )
2018.8.7
MK, T. Nakajima, and S. Nagase
Proceedings of JSST 2012 338-343
(Divide-and-Conquer: DC)
⇒
W. Yang, Phys. Rev. Lett., 66, 1438 (1991).
2018.8.7
W. Yang
• DC-DFT : W. Yang DFT O(N)
• DC-semi-empirical MO : W. Yang
( )
• DC-Hartree Fock : W. Yang DC-DFT ( ),
• DC-MP2 : ,
• DC-coupled cluster ,
• DC-TD-DFT : ,
• DC-DFTB (density functional tight binding) : ( ),
, ,
2018.8.7
DC-MP2
MP2
µ S(a)
MP2
MP2
µ S(a)
MP2
( )
occ( ) vir( )
corr
, , ( )
2i iajb ibja
i j a b S
E C a j b t t
a a
a a a a a a a
µ
µ a
µ
Î
é ù= -ë ûå å å ! !
subsystem
corr corrE Ea
a
» å
a MO i
( )
iajb
i j a b
i a j b
t
a a a a
a
a a a a
e e e e
=
+ - -
!
M. Kobayashi, Y. Imamura, and H. Nakai, J. Chem. Phys., 127. 074103, (2007).
2018.8.7
DC-MP2
SCF MP2
nDC-HF
SCF
HF
⇒
nDC-MP2
SCF HF
DC-MP2
n
MP2 HF
n
MP2≫HF
C60H62 DC-MP2/6-31G
M. Kobayashi and H. Nakai, Int. J. Quantum. Chem., 109, 2227 (2009).
2018.8.7
DC-MP2
1
corrEa =
corr
n
Ea =
subsystem
corr corrE Ea
a
» å
0
n-10
n-1
2018.8.7
MK, M. Kobayashi, H. Nakai, and S. Nagase, J. Comput. Chem., 2011, 32, 2756.
DC-MP2
1
CALL GDDI_SCOPE(DDI_GROUP):
CALL GDDICOUNT(-1, MYJOB):
EMP2TOT← 0
Loop isub=1, nsub: GDDI
CALL GDDICOUNT(0, MYJOB): GDDI
If (MYJOB = TRUE) Then
MP2 EMP2SUB
EMP2TOT← EMP2TOT + EMP2SUB MP2 EMP2TOT
End If
End Loop
CALL GDDICOUNT(1, MYJOB):
CALL GDDI_SCOPE(DDI_MASTERS):
CALL DDI_GSUMF(EMP2TOT ): MP2 EMP2TOT
CALL GDDI_SCOPE(DDI_WORLD):
MK, M. Kobayashi, H. Nakai, and S. Nagase, J. Comput. Chem., 2011, 32, 2756.
2018.8.7
DC-MP2
64
: T2K-Tsukuba, : 6-31G**
b- 20
2018.8.7
Density functional tight-binding (DFTB )
• DFT
•
• 4
•
( O(N3))
DC-DFTB : DC
2018.8.7
1 :
• :
– , ,
– : (2015 2 )
– ISBN: 978-4-13-063455-7
–
• HPC 1
– Jaewoon Jung
– : (2017 4 )
– ISBN: 978-4-87259-586-4
– HPC
• HPC 2
–
– : (2017 3 )
– ISBN: 978-4-87259-587-1
– HPC
2018.8.7
2 : 3 :
• ― ( )
–
– : (2002 7 )
– ISBN: 978-4062573757
–
• 17
– ,
– : (2002 2 )
– ISBN: 978-4000110471
–
• (KS )
–
– : (2002 2 )
– ISBN: 978-4061543881
– Q&A
5 :
• !: 100
–
– : (2014 2 )
– ISBN: 978-4098251988
–
2018.8.7

development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Exploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfExploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfrohankumarsinghrore1
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY1301aanya
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxRenuJangid3
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 

Último (20)

Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Exploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfExploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdf
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 

Materials Science and Supercomputers: Fundamentals

  • 2. • 8/7 : – 1 : • • • – 2 : • – – • – Hartree-Fock – Post Hartree-Fock : MP2 – : (TD-DFT) – 3 : • • 2018.8.7
  • 3. • 8/8 : – 4 : • • • PS2 • – 5 : • • ( ) • 2018.8.7
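  • 5. • = – ( 1000 ) • (*) – 50 TFLOPs – TFLOPS (tera floating-point operations per second): 1 TFLOPS = 10^12 floating-point operations per second – 50 TFLOPs = 5 × 10^13 floating-point operations per second (*) As of August 2018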
  • 6. 1923; 1946 ENIAC; 1976 CRAY-1 (160 MFLOPs); 2002 Earth Simulator (40 TFLOPs); 2011 K computer (11 PFLOPs); 6250 25
  • 8. Top 500 List • – https://www.top500.org/ • LINPACK – • 2018.8.7
  • 9. Top 500 List (June 2018)
      Rank  System                            Country      Year  Rmax (PFLOPs)  Rpeak (PFLOPs)
      1     Summit                            USA          2018  122.300        187.659
      2     Sunway TaihuLight                 China        2016   93.015        125.436
      3     Sierra                            USA          2018   71.610        119.194
      4     Tianhe-2A                         China        2013   61.445        100.679
      5     AI Bridging Cloud Infrastructure  Japan        2018   19.880         32.577
      6     Piz Daint                         Switzerland  2016   19.590         25.326
      7     Titan                             USA          2012   17.590         27.112
      8     Sequoia                           USA          2013   17.173         20.133
      9     Trinity                           USA          2015   14.137         43.903
      10    Cori (NERSC)                      USA          2016   14.014         27.881
      11    Nurion (KISTI)                    Korea        2018   13.929         25.706
      12    Oakforest-PACS                    Japan        2016   13.555         24.914
      16    K computer                        Japan        2011   10.510         11.280
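  • 10. Summit: 100-PFLOPs-class CPU/GPU system (source: TOP500 website)
      Rankings: Top500 No. 1 (June 2018); HPCG No. 1 (June 2018)
      CPU: IBM POWER9 × 2 per node
      CPU peak performance: 1.08 TFLOPs per node (24.56 GF × 22 cores × 2)
      CPU memory: 256 GB per node
      GPU: NVIDIA Tesla V100 × 6 per node
      GPU peak performance: 46.8 TFLOPs per node (7.8 TF × 6)
      GPU memory: 96 GB per node
      Racks: 256
      Nodes: 4608
      Interconnect: Infiniband EDR
      LINPACK performance: 122.3 PFLOPs
      Total memory: 10 PB
      Storage: 250 PB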
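  • 11. K computer
      Rankings: Top500 No. 1 on two consecutive lists in 2011; Graph500 No. 1 in June 2014 and on six consecutive lists from July 2015; HPCG No. 2 on four consecutive lists from November 2014 and No. 1 on three consecutive lists from November 2016
      CPU: SPARC64™ VIIIfx, 2 GHz
      CPU peak performance: 128 GFLOPs (16 GF × 8 cores)
      Memory: 16 GB per node
      Racks: 864
      Nodes: 82,944
      Interconnect: Tofu (6D Mesh/Torus)
      Peak performance: 10.62 PFLOPs
      Total memory: 1.26 PB
      File system: Fujitsu Exabyte File System (FEFS), 30 PB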
  • 12. • R-CCS – http://www.r-ccs.riken.jp/jp/outreach/videogallery.html • – https://www.youtube.com/watch?v=_ze51XkKd_I
  • 15. • R-CCS – http://www.r-ccs.riken.jp/jp/outreach/videogallery.html • – https://www.youtube.com/watch?v=JqNj3YOfyMY&feature=youtu.be • – https://www.youtube.com/watch?v=2JU2LjPDrQY • – https://www.youtube.com/watch?v=Tx1RHU7Zw2c • UT-Heart – https://www.youtube.com/watch?v=tBdFv28EEq0 • – https://www.youtube.com/watch?v=w6G5TMTE-z4 • – https://www.youtube.com/watch?v=DqeEgG52AZY • – https://www.youtube.com/watch?v=6nr13DMgU3A • - – https://www.youtube.com/watch?v=1F6L8rgJT8w 2018.8.7
  • 16. 2018.8.7 core core core core CPU core core core core CPU core core core core CPU core core core core CPU • CPU • - : : 8 •
  • 18. 2018.8.7 1 1 1 1 1 2 3 4 : 1/4 • ( 1/4 ) • =
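  • 19. Serial part (1), parallel part (99): with 100-way parallelization the time is 1 + 0.99 = 1.99, a 50-fold speedup; with 1000-way parallelization it is 1 + 0.099 = 1.099, a 91-fold speedup.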
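  • 20. Serial part (0.1), parallel part (99.9): with 100-way parallelization the time is 0.1 + 0.999 = 1.099, a 91-fold speedup; with 1000-way parallelization it is 0.1 + 0.0999 = 0.1999, a 500-fold speedup. • •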
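The arithmetic on the two slides above is an instance of Amdahl's law. The following is a minimal Python sketch, assuming the slide numbers are the serial and perfectly parallelizable parts of a 100-unit job; the function name `amdahl_speedup` is illustrative and does not come from the slides.

```python
def amdahl_speedup(serial, parallel, workers):
    """Speedup of a job with a fixed serial part and a perfectly divisible parallel part."""
    time_one = serial + parallel              # run time on a single worker
    time_many = serial + parallel / workers   # only the parallel part shrinks
    return time_one / time_many


# The two cases from the slides: 1 % and 0.1 % serial work.
for serial, parallel in [(1.0, 99.0), (0.1, 99.9)]:
    for workers in (100, 1000):
        speedup = amdahl_speedup(serial, parallel, workers)
        print(f"serial={serial}, workers={workers}: speedup = {speedup:.1f}")
```

Shrinking the serial part from 1 to 0.1 barely matters at 100 workers but raises the speedup at 1000 workers from about 91 to about 500, which is why the serial fraction becomes the limiting factor on large machines.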
  • 21. • OpenMP (CPU ), OpenACC (GPU) CPU • MPI( ), CUDA (GPU) OpenMP, OpenACC • : BLAS, LAPACK, FFTW ( , GPU) x • ( ) 2018.8.7
  • 22. MPI/OpenMP 2018.8.7 MPI OpenMP MPI core core core core CPU core core core core CPU core core core core CPU core core core core CPU core core core core CPU core core core core CPU MPI : MPI/OpenMP :
  • 23. MPI/OpenMP • – • OpenMP • MPI – : OpenMP MPI • – 2018.8.7
  • 25. “ : quantum chemistry ” Wikipedia 2018.8.7
  • 26. • • Schrödinger Dirac • Schrödinger Dirac
      "The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble." - Paul Dirac, Proc. Roy. Soc. London A 123, 714 (1929)
      $\hat{H}\Psi = E\Psi$
      $\hat{H} = -\sum_{A}^{N} \frac{\nabla_A^2}{2M_A} - \sum_{i}^{n} \frac{\nabla_i^2}{2} + \sum_{A>B} \frac{Z_A Z_B}{R_{AB}} - \sum_{A,i} \frac{Z_A}{r_{Ai}} + \sum_{i>j} \frac{1}{r_{ij}}$
      $\Psi \equiv \Psi(r_1, r_2, \ldots, r_n; R_1, R_2, \ldots, R_N)$
      (Figure: Erwin Schrödinger; interactions among n electrons i, j and N nuclei A, B: $e^2/r_{ij}$, $Z_A e^2/r_{Ai}$, $Z_B e^2/r_{jB}$, $Z_A Z_B e^2/R_{AB}$)
  • 27. • Schrödinger Dirac – (Born-Oppenheimer ) – – 1 Slater – 1 – ( ) – 2018.8.7
  • 28. • Hartree-Fock HF – – – • (Density functional theory: DFT) – – 1 – HF • (Post-HF ) – HF – • Møller-Plesset (MP) • (Coupled-Cluster(CC) ) • (CI) 2018.8.7
  • 29. • • • N 3 : O(N^3) • … Hartree-Fock (HF): O(N^3); DFT: O(N^3); MP2: O(N^5); CCSD: O(N^6); CCSD(T): O(N^7)
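To make the scaling exponents on the slide above concrete, here is a small Python sketch of how the cost grows when the system size N is doubled. The exponents are the ones listed on the slide; the dictionary and function names are illustrative.

```python
# Formal cost exponents as listed on the slide (N = system size).
SCALING = {"HF": 3, "DFT": 3, "MP2": 5, "CCSD": 6, "CCSD(T)": 7}


def cost_factor(method, size_ratio):
    """Factor by which the cost grows when N is multiplied by size_ratio."""
    return size_ratio ** SCALING[method]


for method in SCALING:
    print(f"{method:8s} doubling N multiplies the cost by {cost_factor(method, 2)}")
```

Doubling N costs 8 times more for HF and DFT but 128 times more for CCSD(T), so the affordable system size shrinks quickly for the more accurate methods.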
  • 30. • • Gaussian, Q-Chem, Turbomole, MOLPRO, Molcas, ADF • – • ORCA, Firefly, NTChem – ( ) • GAMESS, DIRAC, ABINIT-MP – (GPL, Apache license, 3 BSD ) • NWChem, ACES III, Psi4, PySCF, SMASH, OpenFMO, ProteinDF, PAICS 2018.8.7 :
  • 31. SMASH (Scalable Molecular Analysis Solver for High-performance computing systems) • https://sourceforge.net/projects/smash-qc/ • (Apache) • ( ) • • MPI/OpenMP 2018.8.7
  • 33. HF & DFT : LDA, GGA, GGA, LC-GGA (1+2 ) TDDFT SCF : DIIS, 2 , SCF , (PDM, ) Order-N (GFC) Resolution of Identity(RI) Dual-level DFT RI-MP2 (ONIOM, QM/MM) (CC, MP2) (ECP, MCP) DKn, RESC, RA, – NMR, EPR, NTChem 2018.8.7
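  • 34. Hartree-Fock (Roothaan) • Schrödinger – • 1 (Hartree-Fock ) – 1 – ψ ( ) : Roothaan
      $\hat{H}\Psi(r_1, r_2, \ldots, r_n) = E\Psi(r_1, r_2, \ldots, r_n)$
      $\Psi(r_1, r_2, \ldots, r_n) \approx \psi(r_1)\psi(r_2)\cdots\psi(r_n)$
      $\hat{H}_{\mathrm{eff}}\,\psi_p(r_1) = \varepsilon_p\,\psi_p(r_1)$
      $\psi_p(r) = \sum_{\mu} C_{\mu p}\,\phi_\mu(r)$
      $\psi$: Molecular orbital (MO)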
  • 35. • ( ) • = • • – : • • – : • • – • • 1 (= ) • • , Wavelet , $\chi_S = N x^l y^m z^n \exp(-\zeta r_1)$, $\chi_G = N x^l y^m z^n \exp(-\sigma r_1^2)$
  • 36. Hartree-Fock : ( ), , , S 1 Hcore Fock F Fock C ε D $F_{\mu\nu} = H_{\mu\nu} + \sum_{\lambda\sigma} D_{\lambda\sigma}\left[(\mu\nu|\lambda\sigma) - \tfrac{1}{2}(\mu\sigma|\lambda\nu)\right]$ Hartree-Fock $(\mu\nu|\lambda\sigma)$ ( 90% ) Fock 5% D (Self-consistent field: SCF) D
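The Fock matrix expression on the slide above can be written as two tensor contractions. This is a minimal NumPy sketch, assuming all two-electron integrals are held in memory as a dense array `eri[m, n, l, s] = (mn|ls)`; production codes such as those discussed in the deck compute integrals on the fly instead, and the function name `build_fock` is illustrative.

```python
import numpy as np


def build_fock(hcore, density, eri):
    """F_mn = H_mn + sum_ls D_ls [ (mn|ls) - 1/2 (ms|ln) ], with eri[m, n, l, s] = (mn|ls)."""
    coulomb = np.einsum("mnls,ls->mn", eri, density)    # sum_ls D_ls (mn|ls)
    exchange = np.einsum("msln,ls->mn", eri, density)   # sum_ls D_ls (ms|ln)
    return hcore + coulomb - 0.5 * exchange


# One-basis-function toy input (made-up numbers, for illustration only).
hcore = np.array([[-1.0]])
density = np.array([[2.0]])
eri = np.full((1, 1, 1, 1), 0.5)
print(build_fock(hcore, density, eri))
```

The dense four-index array needs O(N^4) memory, which is why the integral-direct loop structure shown on the later slides is used for large molecules.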
  • 37. Hartree-Fock • Fock (90% ) • Fock $(\mu\nu|\lambda\sigma)$ • – –
  • 38. ACE-RR: accompanying coordinate expansion (ACE) combined with a recurrence relation (RR). (Figure: recurrence scheme for the intermediates $H^{pq}_{rs}(mn)(R_1 R_2)$ with $p = L_B + L_D$ and $q = L_A + L_C$, shown for (pp|pp): $L_A = L_B = L_C = L_D = 1$, m = 0, n = 1.) • ACE • ACE :
  • 39. ACE-RR (Pople-Hehre, Dupuis-Rys-King): Hartree-Fock for C47H51NO14, time [s] (ratio to ACE-b3k3-RR)
      Algorithm               STO-3G         STO-6G          3-21G          6-31G
      ACE-b3k3-RR (present)   134.4 (1.00)    914.6 (1.00)   409.4 (1.00)    697.2 (1.00)
      Pople and Hehre         154.6 (1.15)   1161.7 (1.27)   437.9 (1.07)    776.7 (1.11)
      Dupuis, Rys, and King   513.0 (3.82)   6447.9 (7.05)   617.1 (1.51)   1532.7 (2.20)
      MK, M. Kobayashi, H. Nakai and S. Nagase, J. Theor. Comput. Chem., 2005, 4, 139.
  • 40. 2018.8.7 1 1 1 1 1 2 3 4 : 1/4 • ( 1/4 ) • =
  • 41. Fock MPI
      $F_{\mu\nu} = H_{\mu\nu} + \sum_{\lambda\sigma} D_{\lambda\sigma}\left[(\mu\nu|\lambda\sigma) - \tfrac{1}{2}(\mu\sigma|\lambda\nu)\right]$
      icnt = 0
      ! The (m, n, r) index triples are distributed over the MPI processes
      do m = 1, nao
        do n = 1, m
          do r = 1, m
            icnt = icnt + 1
            if (mod(icnt, nproc) /= myrank) cycle   ! triple owned by another process
            do s = 1, smax
              ! Evaluation of AO integrals (mn|rs)
              ! Update Fock matrix blocks Fmn, Frs, Fmr, Fms, Fnr, Fns
              ! using (mn|rs) and D matrix blocks Drs, Dmn, Dns, Dnr, Dms, Dmr
            enddo
          enddo
        enddo
      enddo
      call mpi_allreduce(F)   ! Network communication: O(N^2)
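The counter-modulo rule in the loop above assigns each (m, n, r) index triple to exactly one process. The sketch below checks this in plain Python without MPI by replaying the loop once per rank; `tasks_for_rank` and the sizes `nao = 6`, `nproc = 4` are illustrative choices, not values from the slides.

```python
from collections import Counter


def tasks_for_rank(nao, nproc, myrank):
    """Index triples (m, n, r) that a given rank keeps under the icnt-modulo rule."""
    mine = []
    icnt = 0
    for m in range(1, nao + 1):
        for n in range(1, m + 1):
            for r in range(1, m + 1):
                icnt += 1
                if icnt % nproc != myrank:
                    continue  # another rank owns this triple
                mine.append((m, n, r))
    return mine


nao, nproc = 6, 4
owned = [tasks_for_rank(nao, nproc, rank) for rank in range(nproc)]
counts = Counter(task for tasks in owned for task in tasks)
print([len(tasks) for tasks in owned])
```

Every rank runs the same loop bookkeeping and skips the triples it does not own, so no communication is needed until the final reduction of the Fock matrix. The assignment is static, so the load is balanced only to the extent that the triples have similar cost.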
  • 42. MPI/OpenMP • – • OpenMP • MPI – : OpenMP MPI • – 2018.8.7
  • 43. MPI/OpenMP 2018.8.7 MPI OpenMP MPI core core core core CPU core core core core CPU core core core core CPU core core core core CPU core core core core CPU core core core core CPU MPI : MPI/OpenMP :
  • 44. Fock MPI/OpenMP
      $F_{\mu\nu} = H_{\mu\nu} + \sum_{\lambda\sigma} D_{\lambda\sigma}\left[(\mu\nu|\lambda\sigma) - \tfrac{1}{2}(\mu\sigma|\lambda\nu)\right]$
      !$OMP parallel do schedule(dynamic,1) reduction(+:F)
      do m = nao, 1, -1                              ! OpenMP: outermost loop shared among threads
        do n = 1, m
          rstart = mod(mn + mpi_rank, nproc) + 1     ! MPI: third loop strided over processes
          do r = rstart, m, nproc
            do s = 1, r
              ! Evaluation of AO integrals (mn|rs)
              ! Update Fock matrix blocks Fmn, Frs, Fmr, Fms, Fnr, Fns
              ! using (mn|rs) and D matrix blocks Drs, Dmn, Dns, Dnr, Dms, Dmr
            enddo
          enddo
        enddo
      enddo
      call mpi_allreduce(F)   ! Network communication: O(N^2)
      K. Ishimura et al., J. Chem. Theory Comput. 2010, 6, 1075.
  • 45. Hartree-Fock: C60@C60H28, RHF/cc-pVDZ (1820 ), MPI/OpenMP (16384 ); Fock : 16.1 [ ]; Fock : 10.8 [ ] (OpenMP ). (Figure: speedup versus number of CPU cores, 0 to 16384; curves: MPI/OpenMP Total, MPI/OpenMP SCF Fock, Flat MPI SCF Total, Flat MPI SCF Fock, Ideal.) T. Nakajima, MK, M. Kamiya and Y. Nakatsuka, Int. J. Quantum Chem., 2015, 115, 349–359.
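  • 47. Møller-Plesset (MP2) • Møller-Plesset : Hartree-Fock HF – 0 – • Møller-Plesset : 2
      $H = H_0 + \lambda V$
      $\Psi = \Psi^{(0)} + \Psi^{(1)} + \Psi^{(2)} + \Psi^{(3)} + \cdots$
      $E = E^{(0)} + E^{(1)} + E^{(2)} + E^{(3)} + \cdots$
      $E^{(0)} + E^{(1)} = \langle\Psi^{(0)}|H_0|\Psi^{(0)}\rangle + \langle\Psi^{(0)}|V|\Psi^{(0)}\rangle = E_{\mathrm{HF}}$
      $E^{(2)} = \langle\Psi^{(0)}|V|\Psi^{(1)}\rangle$, $E^{(3)} = \langle\Psi^{(0)}|V|\Psi^{(2)}\rangle$
      $E \approx E_{\mathrm{HF}} + E^{(2)}$
      $E^{(2)} = -\sum_{ij}^{\mathrm{occ}} \sum_{ab}^{\mathrm{vir}} \frac{(ia|jb)\left[2(ia|jb) - (ib|ja)\right]}{\varepsilon_a + \varepsilon_b - \varepsilon_i - \varepsilon_j}$
      $(ia|jb) = \sum_{\mu}\sum_{\nu}\sum_{\lambda}\sum_{\sigma} C_{\mu i} C_{\nu a} C_{\lambda j} C_{\sigma b}\,(\mu\nu|\lambda\sigma)$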
  • 48. Møller-Plesset (MP2) • • HF DFT ( ) • Size-consistency Size-extensivity • (HF ) – HF • • HF DFT – MP2 O(N5) VS HF DFT O(N3) • O(N3 -O(N4 2018.8.7
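  • 49. MP2
      Hartree-Fock : $C_{\mu i}$, $\varepsilon_i$
      (1) AO 4 : $(\mu\nu|\lambda\sigma) = \iint \phi_\mu(r_1)\phi_\nu(r_1)\,\frac{1}{|r_1 - r_2|}\,\phi_\lambda(r_2)\phi_\sigma(r_2)\,dr_1\,dr_2$
      (2) MO : $(ia|jb) = \sum_{\mu}\sum_{\nu}\sum_{\lambda}\sum_{\sigma} C_{\mu i} C_{\nu a} C_{\lambda j} C_{\sigma b}\,(\mu\nu|\lambda\sigma)$
      MP2 : $E^{(2)} = \sum_{ijab} \frac{(ia|jb)\left[2(ia|jb) - (ib|ja)\right]}{\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}$
      : O(N^5) O(N^4) : O(N^4) : O(N^4)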
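Once the MO integrals (ia|jb) are available, the closed-shell MP2 energy expression on the slide above is a single weighted sum. This is a minimal NumPy sketch, assuming the integrals fit in memory as `ovov[i, a, j, b] = (ia|jb)`; the function name and the one-orbital test numbers are illustrative.

```python
import numpy as np


def mp2_energy(ovov, e_occ, e_vir):
    """Closed-shell E(2) from MO integrals ovov[i, a, j, b] = (ia|jb) and orbital energies."""
    # Denominator e_i + e_j - e_a - e_b, broadcast to the shape of ovov.
    denom = (e_occ[:, None, None, None] - e_vir[None, :, None, None]
             + e_occ[None, None, :, None] - e_vir[None, None, None, :])
    exchange = ovov.transpose(0, 3, 2, 1)   # element [i, a, j, b] holds (ib|ja)
    return float(np.sum(ovov * (2.0 * ovov - exchange) / denom))


# One occupied and one virtual orbital (made-up numbers, for illustration only).
ovov = np.full((1, 1, 1, 1), 0.1)
e_occ = np.array([-0.5])
e_vir = np.array([0.5])
print(mp2_energy(ovov, e_occ, e_vir))
```

The energy evaluation itself is cheap; the expensive step is the O(N^5) transformation that produces `ovov` from the AO integrals.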
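  • 50. Resolution-of-identity MP2 (RI-MP2) • MP2 4 Resolution-of-identity (RI) • MP2 ( ) 5-10 ( O(N^5) ) • : O(N^3)
      O(N^5): $(ia|jb) = \sum_{\mu}\sum_{\nu}\sum_{\lambda}\sum_{\sigma} C_{\mu i} C_{\nu a} C_{\lambda j} C_{\sigma b}\,(\mu\nu|\lambda\sigma)$
      RI: $(ia|jb) = \sum_{n} B_n^{ia} B_n^{jb}$, $B_n^{ia} = \sum_{l} (l|n)^{-1/2} \sum_{\nu} C_{\nu a} \sum_{\mu} C_{\mu i}\,(\mu\nu|l)$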
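The RI factorization on the slide above turns the four-index integrals into a product of two three-index arrays. The NumPy sketch below uses random stand-in data (not real integrals) to show that forming B with the inverse square root of the metric (l|n) and then multiplying B by its transpose reproduces $(ia|l)(l|m)^{-1}(m|jb)$; all array names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pair, n_aux = 6, 4                      # number of (ia) pairs and auxiliary functions

three_center = rng.standard_normal((n_pair, n_aux))   # stands in for (ia|l)
root = rng.standard_normal((n_aux, n_aux))
metric = root @ root.T + n_aux * np.eye(n_aux)        # symmetric positive definite (l|n)

# (l|n)^(-1/2) from the eigendecomposition of the metric.
evals, evecs = np.linalg.eigh(metric)
metric_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T

B = three_center @ metric_inv_sqrt        # B[ia, n]
ri_integrals = B @ B.T                    # (ia|jb) = sum_n B[ia, n] B[jb, n]

reference = three_center @ np.linalg.inv(metric) @ three_center.T
print(np.allclose(ri_integrals, reference))
```

The final step is one dense matrix product, which is why the deck's algorithms delegate it to BLAS DGEMM.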
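  • 51. MP2
      Hartree-Fock : $C_{\mu i}$, $\varepsilon_i$
      (1) AO 3 : $(\mu\nu|l) = \iint \chi_\mu(r_1)\chi_\nu(r_1)\,r_{12}^{-1}\,\xi_l(r_2)\,dr_1\,dr_2$
      (2) MO : $B_n^{ia} = \sum_{l} (l|n)^{-1/2} \sum_{\nu} C_{\nu a} \sum_{\mu} C_{\mu i}\,(\mu\nu|l)$
      (3) MO : $(ia|jb) = \sum_{n} B_n^{ia} B_n^{jb}$
      MP2 : $E^{(2)} = \sum_{ijab} \frac{(ia|jb)\left[2(ia|jb) - (ib|ja)\right]}{\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}$
      : O(N^5) O(N^3) - - : O(N^4) : O(N^4) : O(N^3)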
  • 52. RI-MP2 MPI/OpenMP: $(ia|jb) = \sum_{n} B_n^{ia} B_n^{jb}$
      Parallelization over occupied orbitals : MPI
        IoccBgn = Myrank*Nchunk + 1        ! first orbital of this process (1-based)
        IoccEnd = (Myrank+1)*Nchunk        ! last orbital of this process
        Do Iocc = IoccBgn, IoccEnd         ! MPI parallelization
          <RI-MP2 calculations>
        End Do
      Parallelization over virtual orbitals : MPI
        IvirBgn = Myrank*Nchunk + 1
        IvirEnd = (Myrank+1)*Nchunk
        Do Ivir = IvirBgn, IvirEnd         ! MPI parallelization
          <RI-MP2 calculations>
        End Do
      MK and T. Nakajima, J. Chem. Theory Comput., 9, 5373 (2013).
  • 53. RI-MP2
      Evaluation of 3c-2e ERIs: $B_n^{ia} = \sum_{l} (l|n)^{-1/2} \sum_{\nu} C_{\nu a} \sum_{\mu} C_{\mu i}\,(\mu\nu|l)$
      Loop bProc
        Sending $B_n^{jb}$ to the (Myrank − bProc) process
        Receiving $B_n^{jb}$ from the (Myrank + bProc) process
        Loop a ∈ Myrank (MPI parallel)
          Evaluation of 4c-2e MO ERIs $(ia|jb) = \sum_{n} B_n^{ia} B_n^{jb}$ by BLAS's DGEMM (OpenMP parallel)
          Evaluation of MP2 correlation energy $E^{(2)}$ (OpenMP parallel)
        End Loop a
      End Loop bProc
      (Figure: MPI processes 0, 1, 2, 3, 4, ..., N−1 : MPI ( =1 ), MPI ( ))
      MK et al. J. Chem. Theory Comput., 2013, 9, 5373.
  • 54. Tsubame RI-MP2 2018.8.7 MPI Tsubame MPI Tsubame 0/0 1/0 2/0 3/0 4/0 N-1/0 : 0/1 1/1 2/1 3/1 4/1 N-1/1 : 0/2 1/2 2/2 3/2 4/2 N-1/2 : 0/3 1/3 2/3 3/3 4/3 N-1/3 : 0 1 2 3 4 N-1 : MPI MPI ( 1 ) MPI ( ) MK et al. J. Chem. Theory Comput., 2013, 9, 5373. MPI ( ) MK et al. J. Comput. Chem., 2016, 37, 2623. MPI ( 2 )
  • 56. RI-MP2 CUDA CPU-GPU 2018.8.7 Loop bProc (MPI parallel) Sending jb nB to Myrank- bProc + 1 process Sending jb nB from CPU to GPU Receiving jb nB from Myrank+ bProc + 1 process Sending jb nB from CPU to GPU Loop a Myrank (MPI parallel) Receiving 4c-2e integral (ia | jb)P from GPU to CPU Allreduce 4c-2e integral (ia | jb) = (ia | jb)P P ∑ Evaluation of MP2 correlation energy E(2) (OpenMP parallel) End Loop a End Loop bProc • GPU CUDA • - CuBLAS GPU Evaluation of 4c-2e integral (ia | jb)P = Bn ia Bn jb n∈Myrank ∑ (CuBLASDGEMM) : CPU : GPU • CPU-GPU PCI • pinned memory PCI
  • 58. 0 200 400 600 800 1,000 1,200 1,400 1,600 1,800 P2P ring [] Others Step 3: EMP2 corr. eval. Step 3: 4c MO 2-ERI eval. Step 3: 3/3k 2c 2-ERI comm. Step 2: 3/3 2c 2-ERI tran. Step 2: 2/3 3c 2-ERI comm. Step 2: 2c 2-ERI comm. Step 2: 2c 2-ERI eval. Step 1: 2/3 3c 2-ERI comm. Step 1: 2/3 3c 2-ERI tran. Step 1: 1/3 3c 2-ERI tran. Step 1: 3c 2-ERI 2018.8.7 2 (C96H24)2 RI-MP2/cc-pVTZ (6432 , 600 , 5832 ) 2048
  • 59. 2 MPI & MPI/OpenMP RI-MP2: 2 (C150H30)2, RI-MP2/cc-pVTZ (9840 , 930 , 8910 )
      Nodes   CPU cores   Time [s]   Speedup   Performance [PFLOPs]   Efficiency [%]
      8911     71288      2692        8911     0.7                    62
      17822   142576      1627       14742     1.2                    54
      35644   285152      1095       21906     2.0                    44
      44555   356440       955       25112     2.4                    42
      53466   427728       881       27209     2.6                    37
      71288   570304       783       30656     2.9                    32
      80199   641592       759       31612     3.1                    30
      80,199 : 3.1 PFLOPs ( 30%)
      (Figure: speedup versus number of nodes, 0 to 71288)
  • 60. 0 500 1000 1500 2000 2500 1xCPU 1xCPU and 1xGPU 1xCPU and 2xGPUs 2xCPUs and 4xGPUs Elapsedtime[sec] Others Step 3: EMP2 corr. comm. Step 3: 4c MO 2-ERI eval. Step 3: 3/3k 2c 2-ERI comm. Step 2: 3/3 2c 2-ERI tran. Step 2: 2/3 3c 2-ERI comm. Step 2: 2c 2-ERI Inv. Step 2: 2c 2-ERI eval. Step 1: 2/3 3c 2-ERI comm. Step 1: 2/3 3c 2-ERI tran. Step 1: 1/3 3c 2-ERI tran. Step 1: 3c 2-ERI eval. - GPU 2018.8.7 4 GPU GPU CPU: Intel Xeon E5-2690 v2 (10 core) x 2, GPU: NVIDIA Tesla K40 x 4 (C24H12)2 RI-MP2/cc-pVTZ (888 , 78 , 810 , 2304 ) x2.2 X2.5 X4.8
  • 61. TSUBAME2.5 RI-MP2 CPU VS. CPU/GPU 2018.8.7 0 128 256 384 512 0 128 256 384 512 CPU/GPU CPU Ideal TSUBAME2.5 64-512 , CPU: Intel Xeon 5670 (6 ) x 2, GPU: NVIDIA Tesla K20X (3GPU/ ) CPU: 1MPI & 12 / , GPU: 3 MPI & 4 / C96H24 RI-MP2/cc-pVTZ (3212 , 300 , 2916 , 8496 ) GPU 0 400 800 1200 64 128 256 512 [] CPU/GPU CPU x4.1 x6.4 x6.6 x5.7
  • 62. TSUBAME2.5 RI-MP2 CPU VS. CPU/GPU 2018.8.7 TSUBAME2.5 1349 , CPU: Intel Xeon 5670 (6 ) x 2, GPU: NVIDIA Tesla K20X (3GPU/ ) CPU: 1349 MPI & 12 , GPU: 4047 MPI (3 MPI / ) & 4 (C96H24)2 RI-MP2/cc-pVTZ (6432 , 600 , 5832 , 16992 ) 0 500 1000 1500 2000 2500 3000 CPU CPU/GPU [] Others EMP2 corr. 4c Ints comm. 4c Ints 3/3k 2cints comm 3/3 tran3c2 tran 2/3 tran3c2 comm RIInt2_Inv2c RIInt2c comm RIInt2c calc 2/3 tran3c1 comm 2/3 tran3c1 1/3 tran3c1 3c-RIInt comm 3c-RIInt x4.94c ints - GPU 2047 87.5 TFLOPs 419 514.7 TFLOPs
  • 63. RI-MP2 • RI-MP2 • MP2 • MP2 RI-MP2 • RI-MP2 2018.8.7
  • 64. RI-MP2 • 1997 Weigend * • MP2 4 MO Resolution-of-identity (RI) • MP2 • • Turbomole Q-Chem ORCA
      $\chi_\mu(r_1)\chi_\nu(r_1) \approx \sum_{l} c_{\mu\nu,l}\,\xi_l(r_1)$
      $(\mu\nu|\lambda\sigma) = \sum_{lm} (\mu\nu|l)\,(l|m)^{-1}\,(m|\lambda\sigma)$
      $(\mu\nu|\lambda\sigma)^{(x)} = \sum_{lm} (\mu\nu|l)^{(x)}\,(l|m)^{-1}\,(m|\lambda\sigma) + \sum_{lm} (\mu\nu|l)\,(l|m)^{-1}\,(m|\lambda\sigma)^{(x)} - \sum_{lmno} (\mu\nu|l)\,(l|m)^{-1}\,(m|n)^{(x)}\,(n|o)^{-1}\,(o|\lambda\sigma)$
      * F. Weigend, M. Häser, Theor. Chem. Acc., 97, 331 (1997).
  • 65. RI-MP2 RI-MP2 2 γ lm MP2-NS = Γia l Bia n m n( ) −1/2 nia ∑Γia l = 2Tij ab −Tji ab ( )Bjb n n l( ) −1/2 jbn ∑ Pab (2) = 2Tij ac −Tji ac ( )Tij bc ijc ∑ 1 Γµνλσ MP2-S = 1 2 Pµν RHF + Pµν (2) ( )Pλσ RHF − 1 2 1 2 Pµλ RHF + Pµλ (2) ( )Pνσ RHF Bia n = l n( ) −1/2 Cνa Cµi µν l( ) µ ∑ ν ∑ l ∑ ia jb( )= Bia n Bjb n n ∑ 3 Pij (2) = − 2Tik ab −Tki ab ( )kab ∑ Tjk ab Tij ab = (ia | jb) εi + ε j − εa − εb dEMP2 dx = Pµν MP2 Hµν (x) + Wµν MP2 Sµν (x) µν ∑ + 1 2 Γµνλσ MP2-S µν λσ( ) (x) µν ∑ + 2 γ lm MP2-NS l m( ) (x) lm ∑µν ∑ + 4 Γµν l,MP2-NS µν l( ) (x) µν ∑ 4 Lai = 2 Γib l ab l( )bl ∑ − 2 Γ ja l ij l( )+ Aaibc Pbc (2) bc ∑ + Aaijk Pjk (2) jk ∑ jl ∑MP2 MP2 CPHF occ.-occ. vir.-vir. Pai (2) = Zai occ.-vir. Pµν (2) = Ppq (2) Cµp Cνq pq ∑Pµν MP2 = Pµν HF + Pµν (2) Γµν l,MP2-NS = Γia l Cµi Cνa ia ∑ Aaipq Ppq (2) pq ∑ + εa − εi( )Zai = Lai Wij (2) = 2 Γia l ja l( ) al ∑ − 1 2 Pij (2) εi + εj( )+ AijpqPpq (2) pq ∑ occ.-ccc. vir.-vir. Wab (2) = −2 Γia n ib n( ) an ∑ − 1 2 Pab (2) εa + εb( ) Wai (2) = −2 Γ ja n ij n( ) ja ∑ − Pai (2) εi occ.-vir. Apqrs = 4 pq rs( )− ps rq( )− pr sq( ) RI 2018.8.7
  • 66. Evaluate 2p DM Yia n + = 2Tij ab !Tji ab ( )Bjb n jb " End Loop bProc Evaluate 1p DM Pab (2) + = 2Tij ac ! Tji ac ( )Tij bc ij " Evaluate 2p DM !ia l = Yia n n l( ) "1/2 n # Store !ia l to distributed memory Evaluate 2p DM Xln + = !ia l Bia n ia " End Loop a Allreduce E(2) , Pij (2) , Pab (2) , and Xln Evaluate 2p DM ! ln = Xlm m n( ) "1/2 n # Evaluate non-separable part of MP2 gradient dEMP2 dx + = 2 ! lm l m( ) (x) lm ! [Step 1: Evaluate 3c integrals] Evaluate Bia n = l n( ) !1/2 C!a Cµi µ! l( ) µ "! "l " Store Bia n to distributed memory [Step 2: Evaluate 1p & 2p MP2 density Matrix (DM) and part of gradient] Loop a Myrank (MPI parallel) Loop bProc = 1, NProc Sending jb nB to Myrank- bProc process Receiving jb nB from Myrank+ bProc process Evaluate 4c-2e integral (ia | jb) = Bia n Bjb n n ! Evaluate Tij ab = (ia | jb) !i + ! j ! !a ! !b Evaluate MP2 correlation energy E(2) Evaluate 1p MP2 DM Pij (2) + = ! 2Tik ab !Tki ab ( )k ! Tjk ab MPI/OpenMP RI-MP2 (1) • MPI : • 2 : MPI • OpenMP : MPI • - BLAS DGEMM : MPI : OpenMP MPI/OpenMP : 3 2018.8.7 O(N5) BLAS DGEMM
  • 67. [Step 3: Evaluate part of MP2 Lagrangian and gradient] Loop bProc = 1, NProc Sending !ia l to Myrank- bProc process Receiving !ia l from Myrank+ bProc process End Loop bProc Loop L Myrank (MPI Parallel) Evaluate MP2 Lagrangian Laq 3 + = !ia l C"q Cµi µ" l( )µ # " # i # Evaluate MP2 Lagrangian Liq 4 + = !ia l C!q Cµa µ! l( )µ "! "i " Evaluate 1p energy weighted DM Wij (2) [I]+ = 2 !ia l ja l( )i " Evaluate non-separable part of MP2 gradient dEMP2 dx + = 4 !µ! l µ! l( ) (x) µ! " End Loop L Allreaduce Laq 3 , Liq 4 , and , Wij (2) [I] [Step 4: Evaluate part of MP2 Lagrangian] Lai 1,2 = Cµi C! j 2 µ! "#( )$ µ" !#( )% & ' ( )P"# (2) "# * ! * µ * [Step 5: Iteratively Solve CPHF equation] Loop iter Evaluate Gai = Cµa C!i 4 µ! "#( )! µ! "#( )! µ! "#( )" # $ % &&P!" (2) !" '# 'µ ' Solve CPHF equation Gai + !a " !i( )Pai (2) = Lai End Loop iter [Step 6: Evaluate part of MP2 energy weighted DM] Wij (2) [III] = Cµi C! j 2 µ! "#( )! µ! "#( )" # $ % P!" (2) !" &# &µ & [Step 7: Evaluate separable part of MP2 gradient] dEMP2 dx + = Pµ! MP2 Hµ! (x) + Wµ! MP2 Sµ! (x) µ! ! + 1 2 "µ!"# MP2-S µ! "#( ) (x) µ! !µ! ! 2 Fock MPI/OpenMP AO MPI/OpenMP RI-MP2 : (2) 3 2 1 1 : MPI : OpenMP MPI/OpenMP : 2018.8.7
  • 68. RI-MP2 2018.8.7 (PDB ID: 1L2Y) RI-MP2/def2-SVP (304 , 2906 512-4,096 CPU [ ] [%] 512 4096 1024 8192 2048 16384 4096 32768 8192 12288 98304 0 4096 8192 12288 0 4096 8192 12288 Speedups Ideal 1 12,288 6 MK and T. Nakajima, J. Comput. Chem., 2017, 38, 489.
  • 69. RI-MP2 2018.8.7 RI-MP2/def2-SVP (304 , 2906 512-4,096 3,072 RI-MP2 37 27 ( 86 ) : Trp-cage 1L2Y PDB ( ) : MK and T. Nakajima, J. Comput. Chem., 2017, 38, 489.
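  • 71. (TDDFT) • • 1
      (DFT) Kohn-Sham: density $\rho(r)$, wavefunction $\Psi(r_1, r_2, \ldots, r_N)$
      $\left[-\tfrac{1}{2}\nabla^2 + \upsilon_{ne}(R, r) + \int \frac{\rho(r')}{|r - r'|}\,dr' + \upsilon_{xc}(r)\right]\psi_i(r) = \varepsilon_i\,\psi_i(r)$
      (TDDFT) Kohn-Sham: density $\rho(r, t)$, wavefunction $\Psi(r_1, r_2, \ldots, r_N, t)$
      $\left[-\tfrac{1}{2}\nabla^2 + \upsilon_{ne}(R, r) + \int \frac{\rho(r', t)}{|r - r'|}\,dr' + \upsilon_{xc}(r, t) + \upsilon(r, t)\right]\psi(r, t) = i\,\frac{\partial}{\partial t}\psi(r, t)$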
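  • 72. (LR-TDDFT) • TDDFT 1 LR-TDDFT • ( O(N5-) ) O(N^3) • 1000
      Casida ($\omega$: excitation energy): $\begin{pmatrix} A & B \\ B^* & A^* \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \omega \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix}$
      $A_{ia\sigma,jb\tau} = \delta_{\sigma\tau}\delta_{ij}\delta_{ab}(\varepsilon_{a\sigma} - \varepsilon_{i\sigma}) + (ia\sigma|bj\tau) - c_{\mathrm{HF}}\,\delta_{\sigma\tau}(ij\sigma|ab\tau) + c_{\mathrm{DFT}}\,(ia\sigma|w|bj\tau)$
      $B_{ia\sigma,jb\tau} = (ia\sigma|jb\tau) - c_{\mathrm{HF}}\,\delta_{\sigma\tau}(ib\sigma|ja\tau) + c_{\mathrm{DFT}}\,(ia\sigma|w|jb\tau)$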
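For real orbitals, the non-Hermitian Casida problem above can be folded into a Hermitian eigenproblem of half the dimension, $(A-B)^{1/2}(A+B)(A-B)^{1/2} Z = \omega^2 Z$, which is the form that appears in the algorithm on the next slide. The NumPy sketch below checks this equivalence on small random symmetric matrices; the matrices are stand-ins chosen so that A ± B are positive definite, not real TDDFT data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
gaps = np.diag(np.linspace(1.0, 2.0, n))   # stands in for orbital-energy differences
K = rng.standard_normal((n, n))
K = 0.02 * (K + K.T)                       # small symmetric coupling
A = gaps + K
B = K


def sqrtm_sym(M):
    """Square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(M)
    return v @ np.diag(np.sqrt(w)) @ v.T


# Hermitian form: eigenvalues are the squared excitation energies.
S = sqrtm_sym(A - B)
omega_hermitian = np.sort(np.sqrt(np.linalg.eigvalsh(S @ (A + B) @ S)))

# Full 2n x 2n form: [[A, B], [-B, -A]] has eigenvalues +omega and -omega.
full = np.block([[A, B], [-B, -A]])
omega_full = np.sort(np.linalg.eigvals(full).real)
omega_full = omega_full[omega_full > 0]

print(np.allclose(omega_hermitian, omega_full))
```

The Hermitian form halves the dimension and allows a symmetric iterative eigensolver such as the Davidson method to be used.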
  • 73. LR-TDDFT 2018.8.7 t AO g+=(A+B)t, g-=(A-B)t (AR-BR)1/2 (AR+BR) (AR-BR)1/2 ZR=w2ZR WL=(A+B)R-wL, WR=(A-B)L-wR t W W 90% : O(N4) SCF MPI/OpenMP MPI/OpenMP OpenMP 2, 3BLAS t AR+BR=t+g+, AR-BR=t+ g - AO Davidson ZR: L=|X-Y> & R =|X+Y> OpenMP Bµν q! " # $ = 2 µν λσ( )−cx µλ νσ( )+cx LR µλ νσ( ) LR + fµν ,λσ xc! "& # $'tλσ q! " # $ MK and T. Nakajima, J. Comput. Chem., 2017, 38, 489.
  • 78. AO MPI/OpenMP
      $K^{[q]}_{\mu\sigma,\nu\sigma'} = \sum_{\lambda\rho}\left[2\delta_{\sigma\sigma'}(\mu\nu|\lambda\rho) - c_x(\mu\lambda|\nu\rho) + c_x^{\mathrm{LR}}(\mu\lambda|\nu\rho)^{\mathrm{LR}}\right] t^{[q]}_{\lambda\sigma,\rho\sigma'}$ ($\sigma$: {α, β}; $\mu, \nu$: ; q: )
      !$OMP parallel do schedule(dynamic,1) reduction(+:B)
      do m = nao, 1, -1                              ! OpenMP parallel
        do n = 1, m
          rstart = mod(mn + mpi_rank, nproc) + 1     ! MPI parallel
          do r = rstart, m, nproc
            do s = 1, r
              ! Evaluation of AO integrals (mn|rs)
              ! Update Fock-like matrix blocks Bq_mn, Bq_rs, Bq_mr, Bq_ms, Bq_nr, Bq_ns
              ! using (mn|rs) and trial vector matrix blocks tq_rs, tq_mn, tq_ns, tq_nr, tq_ms, tq_mr
            enddo
          enddo
        enddo
      enddo
      call mpi_allreduce(B)   ! Network communication: O(N^2)
  • 79. Davidson - • - - • 2 3 BLAS • - DGEMM - DGEMM - DGEMV
      Vector-by-vector orthogonalization:
        Loop j = 1, n
          q_j = a_j
          Loop i = 1, j-1
            q_j = q_j − (q_j · q_i) q_i
          End Loop i
          q_j = q_j / |q_j|
        End Loop j
      Orthogonalization with level 2 and 3 BLAS:
        Q = (q_1 … q_jbgn)
        R = (a_jbgn+1 … a_n)
        S = Q^T R              (DGEMM)
        R = R − Q S            (DGEMM)
        Loop j = jbgn, n
          T = (r_jbgn … r_j−1)
          u = T^T r_j          (DGEMV)
          r_j = r_j − T u
          r_j = r_j / |r_j|
        End Loop j
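The BLAS-oriented scheme above can be sketched in NumPy as follows: the new vectors are first projected against the existing orthonormal block with matrix-matrix products, then orthonormalized among themselves with matrix-vector products. This is an illustrative classical Gram-Schmidt sketch under the assumption that the new vectors are well conditioned; `orthonormalize_new` is a hypothetical name and is not taken from the deck's software.

```python
import numpy as np


def orthonormalize_new(Q, R):
    """Orthonormalize the columns of R against orthonormal Q and against each other."""
    R = R - Q @ (Q.T @ R)                 # block projection: two matrix-matrix products
    cols = []
    for j in range(R.shape[1]):
        r = R[:, j].copy()
        if cols:
            T = np.column_stack(cols)
            r -= T @ (T.T @ r)            # projection onto earlier new vectors: matrix-vector products
        r /= np.linalg.norm(r)
        cols.append(r)
    return np.column_stack(cols)


rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))    # existing orthonormal subspace vectors
new_vectors = orthonormalize_new(Q, rng.standard_normal((8, 2)))
basis = np.hstack([Q, new_vectors])
print(np.allclose(basis.T @ basis, np.eye(5)))
```

Grouping the projections into matrix products lets the bulk of the work run in optimized, threaded BLAS routines rather than in scalar loops.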
  • 80. LR-TDDFT (PDB ID: 1CRN) TDDFT B3LYP/def-2SVP, =20 (642 , 6177 , 1260 , 4917 ) 7680 2353 218 2018.8.7 CPU [ ] 768 6144 1536 12288 2304 18432 3072 245760 1536 3072 4608 6144 7680 0 1536 3072 4608 6144 7680
  • 82. (Figure: computational cost versus system size N; O(N^3) for Hartree-Fock and DFT) • •
  • 83. (Figure: computational cost versus system size N; O(N^3) for Hartree-Fock and DFT compared with O(N))
  • 84. • – RSDFT ( ) – ProteinDF, NTChem, SMASH ( ) • – purification • – – – 2018.8.7
  • 85. • René Descartes (1596–1650) – – – • 1 – • Discours de la méthode (Discourse on the Method), Part 2 – "The second, to divide each of the difficulties I examined into as many parts as possible, and as might be required to resolve them better."
  • 87. • • • – GAMESS, ABINIT-MP, OpenFMO, PAICS • – GAMESS • – GAMESS • 2018.8.7
  • 88. (FMO) $E = \sum_{I} E_I + \sum_{I>J} \left(E_{IJ} - E_I - E_J\right)$; $E_I$ ; $E_{IJ}$
  • 89. : : 1 … n; $E_1$ … $E_n$; $E = \sum_{I} E_I + \sum_{I>J} \left(E_{IJ} - E_I - E_J\right)$
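The two-body fragment energy expression above is simple to evaluate once the fragment and fragment-pair energies are known. A minimal Python sketch follows; the function name `fmo2_energy` and the energies in the example are made-up illustration values, not results from the deck.

```python
def fmo2_energy(monomer, dimer):
    """E = sum_I E_I + sum_{I>J} (E_IJ - E_I - E_J).

    monomer: list of fragment energies E_I
    dimer:   dict mapping (I, J) with I > J to the fragment-pair energy E_IJ
    """
    total = sum(monomer)
    for (i, j), e_ij in dimer.items():
        total += e_ij - monomer[i] - monomer[j]   # pair interaction correction
    return total


# Three fragments with hypothetical energies.
monomer = [-1.0, -2.0, -3.0]
dimer = {(1, 0): -3.1, (2, 0): -4.0, (2, 1): -5.2}
print(fmo2_energy(monomer, dimer))
```

Each monomer and dimer calculation is independent of the others, so they can be distributed over groups of processes, which is the basis of the fragment-level parallelism discussed in the deck.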
  • 90. FMO-RI-MP2 1 1 † 1 ( | ) ( )ln nm n l m L L- - - = å 1 ( | ) ( | )( | ) ( | ) lm l l m mµn ls µn ls- » å 1 1 1 ,( ) ( ) ( ) m m m cµ n µnc c » år r r Resolution-of-identity (RI) FMO-MP2 ⇒ RI Resolution-of-identity (RI) FMO-RI-MP2 3 ⇒ o: v: n: N : 3 ⇒ ( ) i a j b n n n i a j b B B a a a a a a a a = å 1 ( | )nl i a n i a l B L C l C a a a a µ n µ n µn- = å å å1 ( | )nl i a n a l B L l C a a a n n µn- = å å 2018.8.7 MK Theor. Chem. Acc., 2011, 130, 449–453.
  • 91. FMO RI-MP2 : 968 15,719 2 /1 (484 ) FMO2 RI-MP2/6-31G* 12288 86016 0.82 GAMESS-FMO-MP2 MPI/OpenMP ( ) 2018.8.7 MK, T. Nakajima, and S. Nagase Proceedings of JSST 2012 338-343
  • 92. (Divide-and-Conquer: DC) ⇒ W. Yang, Phys. Rev. Lett., 66, 1438 (1991). 2018.8.7
  • 93. W. Yang • DC-DFT : W. Yang DFT O(N) • DC-semi-empirical MO : W. Yang ( ) • DC-Hartree Fock : W. Yang DC-DFT ( ), • DC-MP2 : , • DC-coupled cluster , • DC-TD-DFT : , • DC-DFTB (density functional tight binding) : ( ), , , 2018.8.7
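  • 94. DC-MP2: MP2 $\mu \in S(\alpha)$ MP2
      $E_{\mathrm{corr}}^{\alpha} = \sum_{i,j}^{\mathrm{occ}(\alpha)} \sum_{a,b}^{\mathrm{vir}(\alpha)} \sum_{\mu \in S(\alpha)} C_{\mu i}^{\alpha}\,(\mu a^{\alpha}|j^{\alpha} b^{\alpha})\left[2\tilde{t}_{i^{\alpha}a^{\alpha},j^{\alpha}b^{\alpha}} - \tilde{t}_{i^{\alpha}b^{\alpha},j^{\alpha}a^{\alpha}}\right]$
      $E_{\mathrm{corr}} \approx \sum_{\alpha}^{\mathrm{subsystem}} E_{\mathrm{corr}}^{\alpha}$
      $\tilde{t}_{i^{\alpha}a^{\alpha},j^{\alpha}b^{\alpha}} = \frac{(i^{\alpha}a^{\alpha}|j^{\alpha}b^{\alpha})}{\varepsilon_{i}^{\alpha} + \varepsilon_{j}^{\alpha} - \varepsilon_{a}^{\alpha} - \varepsilon_{b}^{\alpha}}$
      M. Kobayashi, Y. Imamura, and H. Nakai, J. Chem. Phys., 127, 074103 (2007).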
  • 95. DC-MP2 SCF MP2 nDC-HF SCF HF ⇒ nDC-MP2 SCF HF DC-MP2 n MP2 HF n MP2≫HF C60H62 DC-MP2/6-31G M. Kobayashi and H. Nakai, Int. J. Quantum. Chem., 109, 2227 (2009). 2018.8.7
  • 96. DC-MP2 1 corrEa = corr n Ea = subsystem corr corrE Ea a » å 0 n-10 n-1 2018.8.7 MK, M. Kobayashi, H. Nakai, and S. Nagase, J. Comput. Chem., 2011, 32, 2756.
  • 97. DC-MP2 1
      CALL GDDI_SCOPE(DDI_GROUP)
      CALL GDDICOUNT(-1, MYJOB)
      EMP2TOT ← 0
      Loop isub = 1, nsub                      ! GDDI
        CALL GDDICOUNT(0, MYJOB)               ! GDDI
        If (MYJOB = TRUE) Then
          Evaluate the subsystem MP2 energy EMP2SUB
          EMP2TOT ← EMP2TOT + EMP2SUB          ! accumulate into the MP2 energy EMP2TOT
        End If
      End Loop
      CALL GDDICOUNT(1, MYJOB)
      CALL GDDI_SCOPE(DDI_MASTERS)
      CALL DDI_GSUMF(EMP2TOT)                  ! sum of the MP2 energy EMP2TOT
      CALL GDDI_SCOPE(DDI_WORLD)
      MK, M. Kobayashi, H. Nakai, and S. Nagase, J. Comput. Chem., 2011, 32, 2756.
  • 98. DC-MP2 64 : T2K-Tsukuba, : 6-31G** b- 20 2018.8.7
  • 99. Density functional tight-binding (DFTB ) • DFT • • 4 • ( O(N3)) DC-DFTB : DC 2018.8.7
  • 100. 1 : • : – , , – : (2015 2 ) – ISBN: 978-4-13-063455-7 – • HPC 1 – Jaewoon Jung – : (2017 4 ) – ISBN: 978-4-87259-586-4 – HPC • HPC 2 – – : (2017 3 ) – ISBN: 978-4-87259-587-1 – HPC 2018.8.7
  • 101. 2 : 3 : • ― ( ) – – : (2002 7 ) – ISBN: 978-4062573757 – • 17 – , – : (2002 2 ) – ISBN: 978-4000110471 – • (KS ) – – : (2002 2 ) – ISBN: 978-4061543881 – Q&A 5 : • !: 100 – – : (2014 2 ) – ISBN: 978-4098251988 – 2018.8.7