7. Solving molecular puzzles
by computational docking
haddock.science.uu.nl
>10000 users worldwide
Used by major pharma companies
8. HADDOCK web portal
• > 10500 registered users
• > 188000 served runs
since June 2008
• > 35% on the GRID
Visit bonvinlab.org/software
De Vries et al. Nature Prot. 2010
Van Zundert et al. J.Mol.Biol. 2016
16. # Number of dimensions 2
# INAME 1 1H
# INAME 2 1H
12 2.137 2.387 1 T 0.000e+00 0.00e+00 - 0 2756 2760 0
14 2.387 4.140 1 T 0.000e+00 0.00e+00 - 0 2760 2752 0
32 1.849 4.432 1 T 0.000e+00 0.00e+00 - 0 2259 2257 0
36 1.849 3.143 1 T 0.000e+00 0.00e+00 - 0 2259 2587 0
39 1.760 4.432 1 T 0.000e+00 0.00e+00 - 0 2260 2257 0
40 1.760 1.849 1 T 0.000e+00 0.00e+00 - 0 2260 2259 0
43 1.760 3.143 1 T 0.000e+00 0.00e+00 - 0 2260 2587 0
46 1.649 4.432 1 T 1.035e+05 0.00e+00 r 0 2583 2257 0
47 1.649 1.849 1 T 0.000e+00 0.00e+00 - 0 2583 2259 0
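The peak list above can be read programmatically. The following is a minimal sketch, assuming an XEASY-style column order (peak number, one chemical shift per dimension, colour code, spectrum type, volume, volume error, integration method, an unused field, one atom number per dimension, a trailing unused field). The `Peak` class and the function names are illustrative and not part of any WeNMR tool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Peak:
    number: int
    shifts: Tuple[float, ...]     # chemical shifts (ppm), one per dimension
    volume: float                 # integrated peak volume
    assignments: Tuple[int, ...]  # atom numbers, one per dimension


def parse_peak_line(line: str, ndim: int = 2) -> Peak:
    """Parse one data line; the column order is an assumption (see lead-in)."""
    fields = line.split()
    return Peak(
        number=int(fields[0]),
        shifts=tuple(float(x) for x in fields[1:1 + ndim]),
        volume=float(fields[3 + ndim]),
        assignments=tuple(int(x) for x in fields[7 + ndim:7 + 2 * ndim]),
    )


def parse_peak_list(text: str) -> List[Peak]:
    """Parse a whole peak list, reading the dimensionality from the header."""
    ndim = 2
    peaks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("# Number of dimensions"):
                ndim = int(line.split()[-1])
            continue
        peaks.append(parse_peak_line(line, ndim))
    return peaks


# Two lines copied from the peak list above.
example = """# Number of dimensions 2
# INAME 1 1H
# INAME 2 1H
12 2.137 2.387 1 T 0.000e+00 0.00e+00 - 0 2756 2760 0
46 1.649 4.432 1 T 1.035e+05 0.00e+00 r 0 2583 2257 0
"""
peaks = parse_peak_list(example)
for peak in peaks:
    print(peak)
```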
assign ( resid 501 and name OO )
( resid 501 and name Z )
( resid 501 and name X )
( resid 501 and name Y )
( resid 2 and name CA ) -0.1400 0.15000
assign ( resid 501 and name OO )
( resid 501 and name Z )
( resid 501 and name X )
( resid 501 and name Y )
( resid 3 and name CA ) -0.0100 0.15000
Data
interpretation
Structure, dynamics & interactions
→ impact on research and health:
- origin of disease
- design of new experiments
- drug design…
Exploiting GRID resources in structural biology…
Computations
NMR data collection and processing SAXS data analysis
17. eScience hub for NMR and structural biology
Infrastructure
Science
Community
Knowledge
The WeNMR VRC
19. WeNMR VRC (February 2018)
• enmr.eu: one of the largest VOs (by number of users) in the life sciences
• >830 users have registered so far (36% outside EU)
• Support from >40 sites for >200’000 CPU cores via EGI infrastructure
• User-friendly access to Grid via web portals
• Supported by an SLA (2016, updated in 2017) with EGI and NGIs
www.wenmr.eu
NMR
SAXS
A worldwide e-Infrastructure for NMR and
structural biology
22. Challenges & e-Solutions
§ Attract users!
§ Offer them top of the line eScience solutions for
their research ... which means top of the line
software
23. The WeNMR VRC
Knowledge
Help Center
Tutorials, Wiki
Consultancy
Services
Portals
VRC
Third-party aggregation
Grid
Exposure
Marketplace
Blogs, news,
events..
User
SSO
Facebook
• 39 web portals (31 NMR, 7 SAXS)
• of which 29 by partners
• Uniform access through the new
Single Sign On functionality
• RPC access available for some
portals
25. Challenges & e-Solutions
§ Attract users!
§ Offer them top of the line eScience solutions for
their research ... which means top of the line
software
§ Provide them training, tutorials and support
26. The WeNMR VRC
• Help center
• Consultancy, remote or on
location
• Tutorials, wiki documents,
movies
• YouTube channel
• Many workshops …
27. Challenges & e-Solutions
§ Attract users!
§ Offer them top of the line eScience solutions for
their research ... which means top of the line
software
§ Provide them training, tutorials and support
§ Make their life easier
33. Job management
§ Need to handle millions of job submissions
§ Initially based on gLite
§ Mostly migrated to DIRAC4EGI
§ From a user perspective DIRAC is in principle
grid/cloud agnostic:
§ Can automatically launch VMs
§ Software distributed via CVMFS
35. European Open Science Cloud
CC
Under EGI-Engage
The eInfrastructure landscape over the years
36. The MoBrain CC under EGI Engage
§ With activities toward:
§ Integrating the communities
§ Making best use of cloud resources
§ Bringing data to the cloud (cryo-EM)
§ Exploiting GPGPU resources
§ While maintaining the quality of our
current services!
43. Exploring GPGPU resources: PowerFit
• Python package to
automatically fit high-
resolution biomolecular
structures into cryo-EM
densities
• Simple command-line
program, able to run using
single/multiple CPUs or GPU
van Zundert and Bonvin. AIMS Biophysics 2, 73-87 (2015)
www.github.com/haddocking/powerfit
44. Exploring GPGPU resources: DisVis
• Python package to visualize and
quantify the accessible
interaction space of
distance-restrained binary
biomolecular complexes
• Simple command-line
program, able to run using
single/multiple CPUs or GPU
van Zundert and Bonvin. Bioinformatics. 31, 3222-3224 (2015)
www.github.com/haddocking/disvis
46. Baremetal vs grid vs cloud
ID     Type           GPU        #Cores           CPU type                                   Mem (GB)
B-K20  Baremetal      Tesla K20  24 HT (12 real)  Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz  32
B-K40  Baremetal      Tesla K40  48 HT (24 real)  Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz  512
D-K20  Docker on K20  Tesla K20  24               Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz  32
K-K40  KVM on K40     Tesla K40  24               Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz  32

Case   Machine    Time GPU (sec)  Time CPU 1 core (sec)  CPU1/GPU
B-K40  Baremetal  674             7928                   11.8
K-K40  KVM        671             7996                   11.9
B-K20  Baremetal  830             11839                  14.3
D-K20  Docker     837             11926                  14.3

No loss of performance
Courtesy of Mario David
INDIGO
<= Grid
<= Cloud
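The "no loss of performance" reading of the benchmark can be checked with a few lines of arithmetic. This is a minimal sketch: the timings are copied from the benchmark table, while the dictionary layout and the function names are illustrative choices, not part of the original benchmark.

```python
# Timings in seconds, copied from the benchmark table (GPU run and 1-core CPU run).
TIMINGS = {
    "B-K40": {"machine": "Baremetal", "gpu": 674, "cpu1": 7928},
    "K-K40": {"machine": "KVM", "gpu": 671, "cpu1": 7996},
    "B-K20": {"machine": "Baremetal", "gpu": 830, "cpu1": 11839},
    "D-K20": {"machine": "Docker", "gpu": 837, "cpu1": 11926},
}


def gpu_speedup(case: str) -> float:
    """Ratio of the 1-core CPU time to the GPU time for one case."""
    t = TIMINGS[case]
    return t["cpu1"] / t["gpu"]


def relative_overhead(virtualized: str, baremetal: str, key: str = "gpu") -> float:
    """Fractional change in run time of a virtualized case versus bare metal."""
    base = TIMINGS[baremetal][key]
    return (TIMINGS[virtualized][key] - base) / base


if __name__ == "__main__":
    for case in TIMINGS:
        print(f"{case}: GPU speed-up over 1 CPU core = {gpu_speedup(case):.1f}x")
    print(f"KVM vs bare metal (K40):    {relative_overhead('K-K40', 'B-K40'):+.1%}")
    print(f"Docker vs bare metal (K20): {relative_overhead('D-K20', 'B-K20'):+.1%}")
```

With these numbers, both virtualized GPU runs stay within 1% of the corresponding bare-metal run.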
47. GPGPU, GRID-enabled web portals
http://milou.science.uu.nl/enmr/services/DISVIS/ http://milou.science.uu.nl/enmr/services/POWERFIT/
48. Architecture behind the portals
[Flow diagram across four tiers: WEB CLIENT, WEB SERVER, MASTER NODE, WORKING NODE. Stages shown: Validation (with User DB; error exits "User not found" and "Input error"); Pre-processing + Input files packaging; Submission to local nodes OR Submission to grid node; GPU-calculation or CPU-calculation; Post-processing + Results formatting; Output files packaging + submission of image generation; Chimera image generation.]
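The stages named in the architecture diagram can be strung together as a simple pipeline. The sketch below is only an illustration of that flow: the `Job` class, the `KNOWN_USERS` set standing in for the user DB, the `local_slots_free` switch and all function names are hypothetical, and the real portal code and its local-versus-grid decision rule are not shown in the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List

KNOWN_USERS = {"alice"}  # hypothetical stand-in for the portal's user DB


@dataclass
class Job:
    user: str
    input_files: Dict[str, bytes]
    use_gpu: bool = True
    log: List[str] = field(default_factory=list)


def validate(job: Job) -> None:
    """Web-server side checks, mirroring the two error exits in the diagram."""
    if job.user not in KNOWN_USERS:
        raise PermissionError("User not found")
    if not job.input_files:
        raise ValueError("Input error")
    job.log.append("validation")


def preprocess(job: Job) -> None:
    job.log.append("pre-processing + input files packaging")


def submit(job: Job, local_slots_free: bool) -> str:
    """Pick a target; the actual decision rule is an assumption."""
    target = "local nodes" if local_slots_free else "grid node"
    job.log.append(f"submission to {target}")
    return target


def calculate(job: Job) -> None:
    job.log.append("GPU-calculation" if job.use_gpu else "CPU-calculation")


def postprocess(job: Job) -> None:
    job.log.append("post-processing + results formatting")
    job.log.append("output files packaging + image generation")


def run_portal_job(job: Job, local_slots_free: bool = True) -> List[str]:
    """Run all stages in order and return the list of completed stages."""
    validate(job)
    preprocess(job)
    submit(job, local_slots_free)
    calculate(job)
    postprocess(job)
    return job.log


if __name__ == "__main__":
    stages = run_portal_job(Job("alice", {"complex.pdb": b""}), local_slots_free=False)
    print("\n".join(stages))
```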
50. Some usage stats
Operational since Aug. 2016
Published Dec. 2016
Top pulls in INDIGO applications docker hub
https://hub.docker.com/r/indigodatacloudapps/
54. Thematic services under EOSC-Hub
https://www.egi.eu/use-cases/scientific-applications-tools/
55. Thematic services under EOSC-Hub
§ Harvest both grid and cloud resources
§ DIRAC4EGI can handle both without the
additional burden of managing the cloud
VMs
§ We still have much more grid than cloud
resources
§ HADDOCK portal as use case in Helix Nebula
Science Cloud
56. The exascale challenge
Ø ~20’000 human proteins
Ø Hundreds of thousands of interactions
Ø Billions of CPU hours and exabytes of data
Ø Need to make our software ready for it!
bioexcel.eu