Lenstool-HPC
From scratch to supercomputers: building a large-scale
strong lensing computational software bottom-up
HPC Advisory Council, April 2018
Christoph Schäfer and Markus Rexroth (LASTRO)
Gilles Fourestey (SCITAS)
Gravitational lensing Einstein ring (credit: Nasa/Hubble)
Deflection of light by a distribution
of matter, as predicted by Albert Einstein's general
theory of relativity (1916)
- Article on lensing by a star in 1936
(A. Einstein, Science)
- Fritz Zwicky posited in 1937 that the
effect could allow galaxy clusters to act
like lenses
- First observed in 1979: the "Twin QSO" SBS 0957+561
Gravitational lensing
Twin QSO (center), Credit: ESA/Hubble & NASA
Gravitational lensing
Optical artefacts created by dense
mass distributions
- Galaxies
- Dark matter
- Black holes
Parametric Lens Model
- the ellipticity of the projected mass distribution
- ω the finite core radius
- the normalized surface mass density
- (x,y) the lens position...
“Reverse engineer” the lenses:
Recompose far-away objects by computing
the lenses’ mass
Typical search space dimension: > 10^10
https://briankoberlein.com/2014/08/01/bend-like-newton/
Using the “vanilla” version of lenstool requires months to find the optimal solution!
Lenstool-HPC Motivations
Lenstool-HPC was developed:
- based on Lenstool (Pr. Kneib et al., from 1996 onwards),
- in 6 person-months (FTE),
- by two field scientists and one application expert,
- bottom-up, from scratch.
No separation of concerns:
- Field scientists define algorithmic constraints at every step
- Computer scientists provide the most optimized implementation on
specific hardware
- eXtreme(-ish) programming
Performance is scaled bottom-up:
- Focus on algorithms/kernels and data structures
- Performance scaling from core to full machine
Formalism
Source’s position on the source plane: The lens equation:
2D lensing potential:
Example of gradient (SIS):
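In standard strong-lensing notation, the relations named above can be sketched as follows. The symbols are assumptions and may differ from those on the original slides: $\boldsymbol{\beta}$ is the source position on the source plane, $\boldsymbol{\theta}$ the image position, $\varphi$ the scaled 2D lensing potential, and $\theta_E$ the Einstein radius of a singular isothermal sphere (SIS).

```latex
% Lens equation: source position = image position minus the potential gradient
\boldsymbol{\beta} = \boldsymbol{\theta} - \nabla\varphi(\boldsymbol{\theta})

% SIS potential and its gradient
\varphi_{\mathrm{SIS}}(\boldsymbol{\theta}) = \theta_E \, |\boldsymbol{\theta}|
\quad\Longrightarrow\quad
\nabla\varphi_{\mathrm{SIS}}(\boldsymbol{\theta})
  = \theta_E \, \frac{\boldsymbol{\theta}}{|\boldsymbol{\theta}|}
```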
Strong Lensing Algorithm - Step 0
Ellipticity of the projected mass distribution
Finite core radius
Normalized surface mass density
Position
Given a parametric model for all the lens types:
Step 0: Compute all the gradients for each pixel of the image (~90% of the time to solution, TTS), using data-level parallelism (DLP)
Mapping algorithms to the hardware:
- High performance data structures (SOA)
- Implicit and Explicit (hand-coded) vectorization
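The difference between an array-of-structures (AOS) and a structure-of-arrays (SOA) layout can be sketched as below. This is only an illustration of the data layout, not Lenstool-HPC code: the field names `x`, `y`, `b0`, the SIS gradient used as the kernel, and the lens values are all assumptions. Python lists do not vectorize; in a compiled language the SOA form is what lets consecutive lens parameters be loaded into one vector register.

```python
import math

# Array of Structures (AOS): one record per lens, fields interleaved.
lenses_aos = [
    {"x": 0.0, "y": 0.0, "b0": 1.0},
    {"x": 2.0, "y": 1.0, "b0": 0.5},
]

# Structure of Arrays (SOA): one contiguous sequence per field.
lenses_soa = {
    "x": [0.0, 2.0],
    "y": [0.0, 1.0],
    "b0": [1.0, 0.5],
}


def gradient_aos(px, py, lenses):
    """Sum the SIS gradients of all lenses at pixel (px, py), AOS layout."""
    gx = gy = 0.0
    for lens in lenses:
        dx, dy = px - lens["x"], py - lens["y"]
        r = math.hypot(dx, dy)  # assumes the pixel is not exactly on a lens
        gx += lens["b0"] * dx / r
        gy += lens["b0"] * dy / r
    return gx, gy


def gradient_soa(px, py, lenses):
    """Same computation, SOA layout: each field is traversed as its own array."""
    gx = gy = 0.0
    for lx, ly, b0 in zip(lenses["x"], lenses["y"], lenses["b0"]):
        dx, dy = px - lx, py - ly
        r = math.hypot(dx, dy)  # assumes the pixel is not exactly on a lens
        gx += b0 * dx / r
        gy += b0 * dy / r
    return gx, gy


grad_aos = gradient_aos(3.0, 4.0, lenses_aos)
grad_soa = gradient_soa(3.0, 4.0, lenses_soa)
```

Both layouts give the same gradient; only the memory access pattern differs.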
Strong Lensing Algorithm - Step 1
unlensing
relensing
Given a parametric model for all the lens types
Step 0: Compute all the gradients
Step 1a: unlensing (linear transformation) - TLP
- lensing the green dots (images) to the Source plane (yellow dot)
- Compute the barycenter of the yellow dots
Step 1b: relensing (non-linear transformation) - TLP
- Decompose the Image plane into triangles
- Lens the triangles to the Source plane
- If the lensed triangle includes the barycenter, a predicted image
is found (red triangles in Image plane)
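Steps 1a and 1b can be sketched on a toy case. The following assumes a single SIS lens at the origin with strength `b0`, uses a sign-based point-in-triangle test, and picks image positions and triangles by hand; none of these choices come from the slides, and the function names are hypothetical.

```python
import math


def unlens(x, y, b0=1.0):
    """Map an image-plane point to the source plane for one SIS lens at the origin."""
    r = math.hypot(x, y)  # assumes (x, y) is not the lens centre
    return x - b0 * x / r, y - b0 * y / r


def barycenter(points):
    """Average position of a list of (x, y) points."""
    n = len(points)
    return sum(p[0] for p in points) / n, sum(p[1] for p in points) / n


def _cross(o, a, b):
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def triangle_contains(tri, p):
    """True if p lies inside or on the triangle (all edge tests share a sign)."""
    d = [_cross(tri[i], tri[(i + 1) % 3], p) for i in range(3)]
    return not (any(v < 0 for v in d) and any(v > 0 for v in d))


# Step 1a: unlens the observed images and take the barycenter.
images = [(1.5, 0.0), (-0.5, 0.0)]
sources = [unlens(x, y) for x, y in images]
source = barycenter(sources)

# Step 1b: lens image-plane triangles to the source plane and test inclusion.
near = [(1.4, -0.1), (1.6, -0.1), (1.5, 0.1)]  # surrounds the first image
far = [(3.0, 3.0), (3.2, 3.0), (3.1, 3.2)]     # far from any image
found_near = triangle_contains([unlens(x, y) for x, y in near], source)
found_far = triangle_contains([unlens(x, y) for x, y in far], source)
```

Here the triangle around the first image maps onto the barycenter and yields a predicted image, while the distant triangle does not.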
Strong Lensing Algorithm - Step 2 & 3
Given a parametric model for all the lens types
Step 0: Compute all the gradients
Step 1a: unlensing (linear transformation)
- lensing the green dots (images) to the Source plane (yellow dot)
- Compute the barycenter of the yellow dots
Step 1b: relensing (non-linear transformation)
- Decompose the Image plane into triangles
- Lens the triangles to the Source plane
- If the lensed triangle includes the barycenter, a predicted image
is found (red triangles in Image plane)
Step 2: (MPI)
- Compute the Chi2
Step 3: Pass the Chi2 to a Bayesian MCMC code (MPI)
- Restart with new set of parameters until “close” to reality
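The loop in Step 3 can be illustrated with a generic Metropolis sampler. This is a swapped-in textbook technique, not the Bayesian MCMC code the slides refer to; the 1D SIS toy model, the observed positions, `BETA`, `SIGMA`, and the step size are all assumptions made for the sketch.

```python
import math
import random

OBSERVED = (1.5, -0.5)  # toy 1D image positions (hypothetical data)
BETA = 0.5              # source position, held fixed in this toy
SIGMA = 0.05            # assumed positional uncertainty


def chi2(b0):
    """Chi2 between observed images and the 1D SIS prediction BETA +/- b0."""
    predicted = (BETA + b0, BETA - b0)
    return sum(((o - p) / SIGMA) ** 2 for o, p in zip(OBSERVED, predicted))


def metropolis(start, n_steps, step, seed=0):
    """Random-walk Metropolis on b0; returns the best sample and its Chi2."""
    rng = random.Random(seed)
    current, current_chi2 = start, chi2(start)
    best, best_chi2 = current, current_chi2
    for _ in range(n_steps):
        candidate = current + rng.gauss(0.0, step)
        candidate_chi2 = chi2(candidate)
        delta = candidate_chi2 - current_chi2
        # Accept with probability min(1, exp(-delta / 2)).
        if delta <= 0.0 or rng.random() < math.exp(-0.5 * delta):
            current, current_chi2 = candidate, candidate_chi2
            if current_chi2 < best_chi2:
                best, best_chi2 = current, current_chi2
    return best, best_chi2


best_b0, best_chi2 = metropolis(start=0.5, n_steps=2000, step=0.05)
```

Each iteration corresponds to one "restart with a new set of parameters": a new Chi2 is evaluated and the sampler decides whether to move.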
Strong Lensing Algorithm
Performance scaling
Gradient Benchmark Results (Step 0)
*AVX2: Broadwell Intel Xeon CPU E5-2630 v4 @ 2.20GHz, intel compilers 17
*AVX512F: Intel Xeon Phi CPU 7210 @ 1.30GHz, intel compilers 17
Gradient benchmark computation: 5000x5000 pixels image, 69 sources, 203 constraints
Code                     AVX2* TTS   AVX2* Factor   AVX512F* TTS   AVX512F* Factor
Lenstool 6.8.1           1.0 s       1X             4.8 s          1X
LenstoolHPC AOS          0.8 s       1.3X           5.6 s          0.9X
LenstoolHPC SOA          0.5 s       2.0X           3.3 s          1.4X
LenstoolHPC SOA + DLP    0.2 s       4.5X           0.4 s          11.4X
Performance on Broadwell:
- IACA: ~ 6 Flops/cycle
- Intel Advisor: ~25% of peak
Distributed Grid Gradient
Grid Gradient computation distribution
(step 1):
- Images split into regular
subdomains with MPI
- Subdomains are handled using
OpenMP/CUDA
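A regular decomposition of the image can be sketched as below. The split into horizontal strips of rows, one per MPI rank, is an assumption (the slides only say "regular subdomains"), and `split_rows` is a hypothetical helper that uses no MPI calls.

```python
def split_rows(n_rows, n_ranks):
    """Return one (start, stop) row range per rank, as even as possible."""
    base, extra = divmod(n_rows, n_ranks)
    bounds = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional row.
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


strips = split_rows(6000, 4)
uneven = split_rows(10, 3)
```

Each rank would then process its own strip with OpenMP threads or a CUDA kernel.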
Grid Gradient Benchmark (Step 1)
Single node Grid Gradient benchmark computation: 6000x6000 pixels image, 69 sources, 203 constraints.
- TLP (thread-level parallelism) gives the best bang for the buck
- SOA alone gives a nice boost (and is mandatory for efficient DLP)
- DLP improves with wider vectors (AVX-512 is ~2x AVX2)
- V100 is much faster than P100
Grid Gradient benchmark (TTS, in s)
                                       AVX2      AVX2                AVX512              AVX512         SIMT            SIMT
Code                                   2630v4    2695v4 (PizDaint)   SKL Plat. 8170 HT   KNL (greina)   P100 (greina)   V100
lenstool 6.8.1 (TLP)                   11.5      9.3                 8.6                 10.7           NA              NA
lenstool-HPC (SOA + TLP)               5.6       2.1                 1.8                 5.8            NA              NA
lenstool-HPC (SOA + TLP + DLP/SIMT)    3.0       1.67                0.72                0.84           0.68            0.24
Chi2 computation
The blue dots correspond to the same image
in the source plane
- The distances for the same source (in blue) are reduced to rank 0 using MPI_Pack
- The Chi2 is computed on rank 0
The Chi2 is computed from the distances between the original images and their computed unlensed/relensed projections
from steps 1a and 1b
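A minimal sketch of this computation, assuming the usual sum of squared image-plane distances divided by a positional variance. The slides do not give the exact formula, so `sigma`, the sample positions, and the per-rank split are assumptions; the final `sum` only stands in for the reduction to rank 0.

```python
def chi2(observed, predicted, sigma):
    """Sum of squared distances between matching points, in units of sigma^2."""
    total = 0.0
    for (ox, oy), (px, py) in zip(observed, predicted):
        total += ((ox - px) ** 2 + (oy - py) ** 2) / sigma ** 2
    return total


observed = [(1.5, 0.0), (-0.5, 0.0)]     # hypothetical observed images
predicted = [(1.45, 0.0), (-0.5, 0.05)]  # hypothetical relensed predictions
sigma = 0.05

# Each "rank" computes a partial Chi2 on its own images.
partials = [
    chi2(observed[:1], predicted[:1], sigma),
    chi2(observed[1:], predicted[1:], sigma),
]
# The partial values are then combined in one place.
total_chi2 = sum(partials)
```

Each predicted image is offset by exactly one sigma, so each contributes about 1 to the total.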
Daint-GPU: Chi2 (Step 2) Strong Scaling
Num. nodes   Grid Gradient Comp.   Quadrant unlensing   MPI reduction   TTS
1            1.39                  24.8                 0               26.2
2            0.83                  12.4                 0.00005         13.3
4            0.54                  6.21                 0.00006         6.79
8            0.41                  3.12                 0.00011         3.57
16           0.34                  1.57                 0.00034         1.96
32           0.3                   0.81                 0.00065         1.14
64           0.28                  0.4                  0.00133         0.77
128          0.27                  0.33                 0.00275         0.66
256          0.28                  0.17                 0.00567         0.56
512          0.3                   0.12                 0.01251         0.61
Scalability of the Chi2 benchmark using an 8k x 8k image, 69 sources, 203 constraints on Piz Daint multicore, 1 MPI process and 18 threads per socket, in seconds
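Strong-scaling speedup and parallel efficiency follow directly from the TTS column of the Daint-GPU table above. The sketch below uses the standard definitions (speedup = T1 / Tn, efficiency = speedup / n); the slides do not state these, so they are an assumption of this example.

```python
# Node counts and TTS (seconds) copied from the Daint-GPU table.
nodes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
tts = [26.2, 13.3, 6.79, 3.57, 1.96, 1.14, 0.77, 0.66, 0.56, 0.61]


def strong_scaling(nodes, tts):
    """Return (nodes, speedup, efficiency) relative to the first entry."""
    t1 = tts[0]
    return [(n, t1 / t, t1 / (t * n)) for n, t in zip(nodes, tts)]


results = strong_scaling(nodes, tts)

# Node count with the lowest time to solution.
best_nodes = min(zip(tts, nodes))[1]

for n, speedup, efficiency in results:
    print(f"{n:4d} nodes: speedup {speedup:6.2f}, efficiency {efficiency:6.1%}")
```

With these numbers the time to solution is lowest at 256 nodes and rises again at 512, where the fixed costs outweigh the shrinking per-node work.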
Daint-MC: Chi2 (Step 2) Strong Scaling
Num. nodes   Grid Gradient Comp.   Quadrant unlensing   MPI reduction   TTS
1            10.51                 19.25                0.00            29.83
2            5.24                  10.11                0.06            15.45
4            2.74                  4.87                 0.11            7.75
8            1.41                  2.51                 0.01            3.95
16           0.75                  1.41                 0.03            2.20
32           0.43                  0.72                 0.01            1.17
64           0.24                  0.37                 0.01            0.63
128          0.14                  0.20                 0.02            0.37
256          0.14                  0.12                 0.04            0.31
512          0.45                  0.09                 0.14            0.69
This represents a 50X speedup compared to Lenstool 6.8.1, achieved in 6 months FTE
Scalability of the Chi2 benchmark using an 8k x 8k image, 69 sources, 203 constraints on Piz Daint multicore, 1 MPI process and 18 threads per socket, in seconds
Current Status and Next Steps
Development:
- Code on c4science, with unit tests for each kernel (lensing, unlensing, Chi2…)
- Large development project on CSCS’ Piz Daint
- Aries network tuning
- GPU tuning: lensing, unlensing and Chi2 computation are (very) regular
- Development of a parallel MCMC framework, which could lead to a 500X speedup, e.g.
- Pi4u: http://www.cse-lab.ethz.ch/research/projects/pi4u/ (P. E. Hadjidoukas et al., ETHZ)
Papers:
High Performance Computing for gravitational lens modeling: single vs double precision on GPUs and CPUs
Markus Rexroth, Christoph Schäfer, Gilles Fourestey, Jean-Paul Kneib
To be submitted
High Performance Strong Lensing Map Generation for Lenstool
Christoph Schäfer, Gilles Fourestey, Jean-Paul Kneib
In preparation
Lensing Map Generation
Maps based on second derivatives of the lensing potentials
(Mass, Amplification, Shear)
● Used for calculation of statistical errors of the MCMC
method
○ Sampling of parameter space
○ Compute average and standard deviation for every pixel
○ Added to best prediction, gives asymmetric error bars
● Fast map generation is crucial
○ The current process takes months
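The per-pixel statistics described above can be sketched as follows. Maps are flattened lists of pixel values, one list per sampled parameter set; the sample values are made up for illustration, and the use of the sample (n - 1) standard deviation is an assumption, since the slides do not say which estimator is used.

```python
import statistics


def pixel_stats(maps):
    """Return per-pixel (means, standard deviations) over a list of maps."""
    n_pixels = len(maps[0])
    means, stds = [], []
    for i in range(n_pixels):
        values = [m[i] for m in maps]  # same pixel across all sampled maps
        means.append(statistics.mean(values))
        stds.append(statistics.stdev(values))  # sample standard deviation
    return means, stds


# Three hypothetical 2-pixel maps drawn from the MCMC samples.
sampled_maps = [
    [1.0, 2.0],
    [3.0, 2.0],
    [5.0, 2.0],
]
means, stds = pixel_stats(sampled_maps)
```

Since one full map is needed per sample, the cost of this step scales with the number of samples, which is why fast map generation matters.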
Grid Gradient 2 benchmark   TTS, in s   Speedup
lenstool 6.8.1              765
lenstool-HPC                1.3         x567
Single node Grid Gradient benchmark computation: 4200x4200 pixels image, 201 individual lenses.
- Lenstool: Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz
- Lenstool HPC: P100
- Thanks to Pr. Jean-Paul Kneib (LASTRO, EPFL), Pr. Jan Hesthaven and Dr. Vittoria
Rezzonico (SCITAS, EPFL)
- Thanks to Colin McMurtrie and Hussein El-Harake from CSCS for their support using the
CSCS’ test cluster
Brownie points
Questions?
gilles.fourestey@epfl.ch https://scitas.epfl.ch/
christophernstrerne.schaefer@epfl.ch https://lastro.epfl.ch/
markus.rexroth@epfl.ch

Más contenido relacionado

La actualidad más candente

Convolutional Neural Network for pixel-wise skyline detection
Convolutional Neural Network for pixel-wise skyline detectionConvolutional Neural Network for pixel-wise skyline detection
Convolutional Neural Network for pixel-wise skyline detectionDarian Frajberg
 
30th コンピュータビジョン勉強会@関東 DynamicFusion
30th コンピュータビジョン勉強会@関東 DynamicFusion30th コンピュータビジョン勉強会@関東 DynamicFusion
30th コンピュータビジョン勉強会@関東 DynamicFusionHiroki Mizuno
 
Introductory Level of SLAM Seminar
Introductory Level of SLAM SeminarIntroductory Level of SLAM Seminar
Introductory Level of SLAM SeminarDong-Won Shin
 
Survey on optical flow estimation with DL
Survey on optical flow estimation with DLSurvey on optical flow estimation with DL
Survey on optical flow estimation with DLLeapMind Inc
 
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)Matthew O'Toole
 
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...Edge AI and Vision Alliance
 
Tracking Robustness and Green View Index Estimation of Augmented and Diminish...
Tracking Robustness and Green View Index Estimation of Augmented and Diminish...Tracking Robustness and Green View Index Estimation of Augmented and Diminish...
Tracking Robustness and Green View Index Estimation of Augmented and Diminish...Tomohiro Fukuda
 
論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...
論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...
論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...Ken Sakurada
 
Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013
Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013
Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013Sunando Sengupta
 
Analysis of KinectFusion
Analysis of KinectFusionAnalysis of KinectFusion
Analysis of KinectFusionDong-Won Shin
 
DTAM: Dense Tracking and Mapping in Real-Time, Robot vision Group
DTAM: Dense Tracking and Mapping in Real-Time, Robot vision GroupDTAM: Dense Tracking and Mapping in Real-Time, Robot vision Group
DTAM: Dense Tracking and Mapping in Real-Time, Robot vision GroupLihang Li
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)Yu Huang
 
Build Your Own 3D Scanner: Surface Reconstruction
Build Your Own 3D Scanner: Surface ReconstructionBuild Your Own 3D Scanner: Surface Reconstruction
Build Your Own 3D Scanner: Surface ReconstructionDouglas Lanman
 
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...Kitsukawa Yuki
 
Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...
Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...
Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...Tomohiro Fukuda
 
Large scale 3 d point cloud compression using adaptive radial distance predic...
Large scale 3 d point cloud compression using adaptive radial distance predic...Large scale 3 d point cloud compression using adaptive radial distance predic...
Large scale 3 d point cloud compression using adaptive radial distance predic...ieeepondy
 

La actualidad más candente (18)

Convolutional Neural Network for pixel-wise skyline detection
Convolutional Neural Network for pixel-wise skyline detectionConvolutional Neural Network for pixel-wise skyline detection
Convolutional Neural Network for pixel-wise skyline detection
 
30th コンピュータビジョン勉強会@関東 DynamicFusion
30th コンピュータビジョン勉強会@関東 DynamicFusion30th コンピュータビジョン勉強会@関東 DynamicFusion
30th コンピュータビジョン勉強会@関東 DynamicFusion
 
Introductory Level of SLAM Seminar
Introductory Level of SLAM SeminarIntroductory Level of SLAM Seminar
Introductory Level of SLAM Seminar
 
Survey on optical flow estimation with DL
Survey on optical flow estimation with DLSurvey on optical flow estimation with DL
Survey on optical flow estimation with DL
 
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)
 
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
 
Tracking Robustness and Green View Index Estimation of Augmented and Diminish...
Tracking Robustness and Green View Index Estimation of Augmented and Diminish...Tracking Robustness and Green View Index Estimation of Augmented and Diminish...
Tracking Robustness and Green View Index Estimation of Augmented and Diminish...
 
論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...
論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...
論文紹介"DynamicFusion: Reconstruction and Tracking of Non-­‐rigid Scenes in Real...
 
Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013
Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013
Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013
 
Analysis of KinectFusion
Analysis of KinectFusionAnalysis of KinectFusion
Analysis of KinectFusion
 
998-isvc16
998-isvc16998-isvc16
998-isvc16
 
DTAM: Dense Tracking and Mapping in Real-Time, Robot vision Group
DTAM: Dense Tracking and Mapping in Real-Time, Robot vision GroupDTAM: Dense Tracking and Mapping in Real-Time, Robot vision Group
DTAM: Dense Tracking and Mapping in Real-Time, Robot vision Group
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
 
Build Your Own 3D Scanner: Surface Reconstruction
Build Your Own 3D Scanner: Surface ReconstructionBuild Your Own 3D Scanner: Surface Reconstruction
Build Your Own 3D Scanner: Surface Reconstruction
 
Kintinuous review
Kintinuous reviewKintinuous review
Kintinuous review
 
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
 
Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...
Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...
Point Cloud Stream on Spatial Mixed Reality: Toward Telepresence in Architect...
 
Large scale 3 d point cloud compression using adaptive radial distance predic...
Large scale 3 d point cloud compression using adaptive radial distance predic...Large scale 3 d point cloud compression using adaptive radial distance predic...
Large scale 3 d point cloud compression using adaptive radial distance predic...
 

Similar a Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lensing Software

“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...Edge AI and Vision Alliance
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsPetteriTeikariPhD
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Slide_N
 
R-FCN : object detection via region-based fully convolutional networks
R-FCN :  object detection via region-based fully convolutional networksR-FCN :  object detection via region-based fully convolutional networks
R-FCN : object detection via region-based fully convolutional networksEntrepreneur / Startup
 
Udacity-Didi Challenge Finalists
Udacity-Didi Challenge FinalistsUdacity-Didi Challenge Finalists
Udacity-Didi Challenge FinalistsDavid Silver
 
CUDA by Example : Constant Memory and Events : Notes
CUDA by Example : Constant Memory and Events : NotesCUDA by Example : Constant Memory and Events : Notes
CUDA by Example : Constant Memory and Events : NotesSubhajit Sahu
 
U_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in DepthU_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in DepthManuel Nieves Sáez
 
Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...
Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...
Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...PetteriTeikariPhD
 
Image Texture Analysis
Image Texture AnalysisImage Texture Analysis
Image Texture Analysislalitxp
 
Objects as points (CenterNet) review [CDM]
Objects as points (CenterNet) review [CDM]Objects as points (CenterNet) review [CDM]
Objects as points (CenterNet) review [CDM]Dongmin Choi
 
Ml srhwt-machine-learning-based-superlative-rapid-haar-wavelet-transformation...
Ml srhwt-machine-learning-based-superlative-rapid-haar-wavelet-transformation...Ml srhwt-machine-learning-based-superlative-rapid-haar-wavelet-transformation...
Ml srhwt-machine-learning-based-superlative-rapid-haar-wavelet-transformation...Jumlesha Shaik
 
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...KAIST
 
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...KAIST
 
74 real time-image-processing-applied-to-traffic-queue-d
74 real time-image-processing-applied-to-traffic-queue-d74 real time-image-processing-applied-to-traffic-queue-d
74 real time-image-processing-applied-to-traffic-queue-dravi247272
 
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...Fisnik Kraja
 
Towards Accurate Multi-person Pose Estimation in the Wild (My summery)
Towards Accurate Multi-person Pose Estimation in the Wild (My summery)Towards Accurate Multi-person Pose Estimation in the Wild (My summery)
Towards Accurate Multi-person Pose Estimation in the Wild (My summery)Abdulrahman Kerim
 

Similar a Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lensing Software (20)

“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
 
HS Demo
HS DemoHS Demo
HS Demo
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problems
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3
 
R-FCN : object detection via region-based fully convolutional networks
R-FCN :  object detection via region-based fully convolutional networksR-FCN :  object detection via region-based fully convolutional networks
R-FCN : object detection via region-based fully convolutional networks
 
Udacity-Didi Challenge Finalists
Udacity-Didi Challenge FinalistsUdacity-Didi Challenge Finalists
Udacity-Didi Challenge Finalists
 
CUDA by Example : Constant Memory and Events : Notes
CUDA by Example : Constant Memory and Events : NotesCUDA by Example : Constant Memory and Events : Notes
CUDA by Example : Constant Memory and Events : Notes
 
U_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in DepthU_N.o.1T: A U-Net exploration, in Depth
U_N.o.1T: A U-Net exploration, in Depth
 
Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...
Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...
Next Gen Computational Ophthalmic Imaging for Neurodegenerative Diseases and ...
 
Image Texture Analysis
Image Texture AnalysisImage Texture Analysis
Image Texture Analysis
 
Digital.cc
Digital.ccDigital.cc
Digital.cc
 
Objects as points (CenterNet) review [CDM]
Objects as points (CenterNet) review [CDM]Objects as points (CenterNet) review [CDM]
Objects as points (CenterNet) review [CDM]
 
Ml srhwt-machine-learning-based-superlative-rapid-haar-wavelet-transformation...
Ml srhwt-machine-learning-based-superlative-rapid-haar-wavelet-transformation...Ml srhwt-machine-learning-based-superlative-rapid-haar-wavelet-transformation...
Ml srhwt-machine-learning-based-superlative-rapid-haar-wavelet-transformation...
 
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
 
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for ...
 
74 real time-image-processing-applied-to-traffic-queue-d
74 real time-image-processing-applied-to-traffic-queue-d74 real time-image-processing-applied-to-traffic-queue-d
74 real time-image-processing-applied-to-traffic-queue-d
 
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola...
 
BDL_project_report
BDL_project_reportBDL_project_report
BDL_project_report
 
Towards Accurate Multi-person Pose Estimation in the Wild (My summery)
Towards Accurate Multi-person Pose Estimation in the Wild (My summery)Towards Accurate Multi-person Pose Estimation in the Wild (My summery)
Towards Accurate Multi-person Pose Estimation in the Wild (My summery)
 
Cgm Lab Manual
Cgm Lab ManualCgm Lab Manual
Cgm Lab Manual
 

Más de inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

Más de inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lensing Software

  • 1. Lenstool-HPC
      From scratch to supercomputers: building large-scale strong lensing computational software bottom-up
      HPC Advisory Council, April 2018
      Christoph Schäfer and Markus Rexroth (LASTRO), Gilles Fourestey (SCITAS)
  • 2. Gravitational lensing
      Einstein ring (credit: NASA/Hubble)
  • 3. Gravitational lensing
      Einstein ring (credit: NASA/Hubble)
  • 4. Gravitational lensing
      Light deflection caused by a distribution of matter, as described by Albert Einstein's general theory of relativity (1916)
      - 1936: Einstein publishes an article on lensing by a star (A. Einstein, Science)
      - 1937: Fritz Zwicky posits that the effect could allow galaxy clusters to act as lenses
      - 1979: first observation, the "Twin QSO" SBS 0957+561
      Image: Twin QSO (center). Credit: ESA/Hubble & NASA
  • 5. Gravitational lensing
      Optical artefacts created by dense mass distributions:
      - Galaxies
      - Dark matter
      - Black holes
      Parametric lens model:
      - the ellipticity of the projected mass distribution
      - the finite core radius (ω)
      - the normalized surface mass density
      - the lens position (x, y)
      - ...
      “Reverse engineering” the lenses: reconstruct far-away objects by computing the lenses’ mass
      Typical search space dimension: >10^10
      Figure source: https://briankoberlein.com/2014/08/01/bend-like-newton/
      With the “vanilla” version of Lenstool, finding the optimal solution takes months!
  • 6. Lenstool-HPC Motivations
      Lenstool-HPC was developed:
      - based on Lenstool (Prof. Kneib et al., from 1996 onwards),
      - in 6 man-months FTE,
      - by two field scientists and one application expert,
      - bottom-up from scratch.
      No separation of concerns:
      - Field scientists define algorithmic constraints at every step
      - Computer scientists provide the most optimized implementation on specific hardware
      - eXtreme(-ish) programming
      Performance is scaled bottom-up:
      - Focus on algorithms/kernels and data structures
      - Performance scaling from core to full machine
  • 7. Formalism
      - Source’s position on the source plane
      - The lens equation
      - 2D lensing potential
      - Example of gradient (SIS)
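The formulas named on this slide are not part of the transcript. For reference, the standard textbook forms are sketched below. The notation is an assumption (image-plane position θ, source-plane position β, lensing potential ψ, convergence κ, Einstein radius θ_E, velocity dispersion σ_v) and may differ from the symbols used on the slide; SIS stands for singular isothermal sphere.

```latex
% Lens equation: source position \beta as a function of image position \theta
\boldsymbol{\beta} = \boldsymbol{\theta} - \nabla\psi(\boldsymbol{\theta})

% 2D lensing potential generated by the convergence \kappa
\psi(\boldsymbol{\theta}) = \frac{1}{\pi} \int \kappa(\boldsymbol{\theta}')\,
    \ln\lvert \boldsymbol{\theta} - \boldsymbol{\theta}' \rvert \,\mathrm{d}^2\theta'

% SIS potential and its gradient (the deflection has constant magnitude \theta_E)
\psi_{\mathrm{SIS}}(\boldsymbol{\theta}) = \theta_E \,\lvert \boldsymbol{\theta} \rvert,
\qquad
\nabla\psi_{\mathrm{SIS}}(\boldsymbol{\theta}) = \theta_E \,
    \frac{\boldsymbol{\theta}}{\lvert \boldsymbol{\theta} \rvert},
\qquad
\theta_E = 4\pi \left(\frac{\sigma_v}{c}\right)^{2} \frac{D_{LS}}{D_S}
```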
  • 8. Strong Lensing Algorithm - Step 0
      Given a parametric model for all the lens types:
      - Ellipticity of the projected mass distribution
      - Finite core radius
      - Normalized surface mass density
      - Position
      Step 0: Compute all the gradients (~90% of the time to solution, TTS)
      - DLP (data-level parallelism) for each pixel of the image
      Mapping algorithms to the hardware:
      - High-performance data structures (SOA, structure of arrays)
      - Implicit and explicit (hand-coded) vectorization
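As a rough illustration of Step 0, the sketch below stores lens parameters in a structure-of-arrays layout and sums the gradients of all lenses at each pixel. It is not the Lenstool-HPC kernel: the `LensesSoA` class, the function names and the restriction to SIS lenses (parameterized by an assumed Einstein radius `theta_e`) are all hypothetical choices made to keep the example short.

```python
import math

class LensesSoA:
    """Hypothetical SoA container: one list per parameter, not one object per lens."""
    def __init__(self, x, y, theta_e):
        self.x = list(x)              # lens positions, x coordinate
        self.y = list(y)              # lens positions, y coordinate
        self.theta_e = list(theta_e)  # assumed SIS Einstein radii

def sis_gradient(lenses, px, py):
    """Sum the SIS potential gradients of all lenses at image position (px, py)."""
    gx = gy = 0.0
    for lx, ly, te in zip(lenses.x, lenses.y, lenses.theta_e):
        dx, dy = px - lx, py - ly
        r = math.hypot(dx, dy)
        if r > 0.0:                   # the SIS gradient is undefined at the lens center
            gx += te * dx / r
            gy += te * dy / r
    return gx, gy

def gradient_grid(lenses, n, lo, hi):
    """Gradients on an n x n pixel grid over [lo, hi]^2.

    Every pixel is independent of the others, which is what makes the
    loop a candidate for data-level parallelism in a compiled kernel.
    """
    step = (hi - lo) / (n - 1)
    return [[sis_gradient(lenses, lo + i * step, lo + j * step)
             for i in range(n)]
            for j in range(n)]
```

In a compiled implementation the contiguous per-parameter arrays are what allow the inner loop to be loaded into vector registers directly; pure Python only shows the layout, not the speedup.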
  • 9. Strong Lensing Algorithm - Step 1
      (Figure panels: unlensing, relensing)
      Given a parametric model for all the lens types
      Step 0: Compute all the gradients
      Step 1a: unlensing (linear transformation) - TLP (thread-level parallelism)
      - Lens the green dots (images) to the Source plane (yellow dots)
      - Compute the barycenter of the yellow dots
      Step 1b: relensing (non-linear transformation) - TLP
      - Decompose the Image plane into triangles
      - Lens the triangles to the Source plane
      - If the lensed triangle includes the barycenter, a predicted image is found (red triangles in Image plane)
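Steps 1a and 1b can be sketched in a few lines, assuming the gradient is available as a function of image-plane position. All names below are hypothetical, and the sign-based point-in-triangle test is one common choice, not necessarily the one used in Lenstool-HPC.

```python
def unlens(theta, grad):
    """Lens equation: source-plane position = image position - gradient."""
    return (theta[0] - grad[0], theta[1] - grad[1])

def barycenter(points):
    """Mean position of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def _cross(o, a, b):
    """z component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def triangle_contains(tri, p):
    """True if p lies inside or on the edge of triangle tri (any orientation)."""
    a, b, c = tri
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def find_predicted_images(image_triangles, gradient, source):
    """Step 1b: keep the image-plane triangles whose lensed version contains source."""
    hits = []
    for tri in image_triangles:
        lensed = tuple(unlens(v, gradient(v)) for v in tri)
        if triangle_contains(lensed, source):
            hits.append(tri)
    return hits
```

A typical call would unlens every observed image, pass the resulting points to `barycenter`, and then hand that barycenter to `find_predicted_images` together with the triangulated image plane.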
  • 10. Strong Lensing Algorithm - Steps 2 & 3
      Given a parametric model for all the lens types
      Step 0: Compute all the gradients
      Step 1a: unlensing (linear transformation)
      - Lens the green dots (images) to the Source plane (yellow dots)
      - Compute the barycenter of the yellow dots
      Step 1b: relensing (non-linear transformation)
      - Decompose the Image plane into triangles
      - Lens the triangles to the Source plane
      - If the lensed triangle includes the barycenter, a predicted image is found (red triangles in Image plane)
      Step 2: Compute the Chi2 (MPI)
      Step 3: Pass the Chi2 to a Bayesian MCMC code (MPI)
      - Restart with a new set of parameters until “close” to reality
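The outer loop of Step 3 can be illustrated with a plain Metropolis sampler. This is a stand-in chosen for brevity, not the Bayesian MCMC code the slides refer to; the `metropolis` function, its parameters and the Gaussian proposal are assumptions, and `chi2` represents the whole of Steps 0 to 2 for one parameter set.

```python
import math
import random

def metropolis(chi2, start, n_steps, step_size, seed=0):
    """Plain Metropolis sampling of a likelihood proportional to exp(-chi2 / 2).

    Returns the best parameter set seen, its chi2, and the number of
    accepted proposals.
    """
    rng = random.Random(seed)
    current = list(start)
    current_chi2 = chi2(current)
    best, best_chi2 = list(current), current_chi2
    accepted = 0
    for _ in range(n_steps):
        # Propose a new parameter set close to the current one.
        proposal = [p + rng.gauss(0.0, step_size) for p in current]
        proposal_chi2 = chi2(proposal)
        delta = proposal_chi2 - current_chi2
        # Always accept improvements; accept worse fits with
        # probability exp(-delta / 2).
        if delta <= 0 or rng.random() < math.exp(-0.5 * delta):
            current, current_chi2 = proposal, proposal_chi2
            accepted += 1
            if current_chi2 < best_chi2:
                best, best_chi2 = list(current), current_chi2
    return best, best_chi2, accepted
```

Each call to `chi2` is expensive in the real problem, which is why the slides concentrate the optimization effort on the steps that feed it.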
  • 12. Gradient Benchmark Results (Step 0)
      Gradient benchmark computation: 5000x5000 pixels image, 69 sources, 203 constraints
      Code | AVX2* TTS | AVX2* Factor | AVX512F* TTS | AVX512F* Factor
      Lenstool 6.8.1 | 1.0s | 1X | 4.8s | 1X
      LenstoolHPC AOS | 0.8s | 1.3X | 5.6s | 0.9X
      LenstoolHPC SOA | 0.5s | 2.0X | 3.3s | 1.4X
      LenstoolHPC SOA + DLP | 0.2s | 4.5X | 0.4s | 11.4X
      *AVX2: Broadwell Intel Xeon CPU E5-2630 v4 @ 2.20GHz, Intel compilers 17
      *AVX512F: Intel Xeon Phi CPU 7210 @ 1.30GHz, Intel compilers 17
      Performance on Broadwell:
      - IACA: ~6 Flops/cycle
      - Intel Advisor: ~25% of peak
  • 13. Distributed Grid Gradient
      Grid Gradient computation distribution (step 1):
      - Images split into regular subdomains with MPI
      - Subdomains are handled using OpenMP/CUDA
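A minimal sketch of a regular decomposition, assuming the image is split by rows into one contiguous block per MPI rank. The slides do not say whether the subdomains are row blocks or 2D tiles, so `split_rows` and the 1D layout are hypothetical.

```python
def split_rows(n_rows, n_ranks):
    """Return one (start, stop) row range per rank, sizes differing by at most 1."""
    base, extra = divmod(n_rows, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional row each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

Each rank would then compute the gradients for its own row range only, with the threads or GPU of that node working through the pixels inside it.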
  • 14. Grid Gradient Benchmark (Step 1)
      Single node Grid Gradient benchmark computation: 6000x6000 pixels image, 69 sources, 203 constraints.
      - TLP gives the best bang for the buck
      - SOA alone gives a nice boost (and is mandatory for efficient DLP)
      - DLP improves with wider vector sizes (AVX512 is ~2x AVX2)
      - V100 is much faster than P100
      Grid Gradient benchmark (TTS, in s)
      Hardware | lenstool 6.8.1 (TLP) | lenstool-HPC (SOA + TLP) | lenstool-HPC (SOA + TLP + DLP/SIMT)
      AVX2, 2630v4 | 11.5 | 5.6 | 3.0
      AVX2, 2695v4 (PizDaint) | 9.3 | 2.1 | 1.67
      AVX512, SKL Plat. 8170 HT | 8.6 | 1.8 | 0.72
      AVX512, KNL (greina) | 10.7 | 5.8 | 0.84
      SIMT, P100 - (greina) | NA | NA | 0.68
      SIMT, V100 | NA | NA | 0.24
  • 15. Chi2 computation
      The Chi2 is computed from the distance between the original images and their computed unlensed/relensed projections from steps 1a and 1b
      - The blue dots correspond to the same image in the source plane
      - The distances for the same source (in blue) are reduced to rank 0 using MPI_Pack
      - The Chi2 is computed on rank 0
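The description above can be sketched as follows, assuming observed and predicted images are already matched pair by pair and share one positional uncertainty `sigma`. The function names are hypothetical, and the reduction to rank 0 is emulated by summing per-rank partial results rather than by calling MPI.

```python
def chi2_images(observed, predicted, sigma):
    """Sum of squared image-plane distances, scaled by the positional error sigma."""
    total = 0.0
    for (ox, oy), (px, py) in zip(observed, predicted):
        total += ((ox - px) ** 2 + (oy - py) ** 2) / sigma ** 2
    return total

def reduce_to_root(partial_chi2):
    """Stand-in for the reduction to rank 0: add up the per-rank contributions."""
    return sum(partial_chi2)
```

With MPI, each rank would evaluate `chi2_images` on its own sources and rank 0 would receive the partial values before handing the total to the sampler.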
  • 16. Daint-GPU: Chi2 (Step 2) Strong Scaling
      Num. nodes | Grid Gradient Comp. | Quadrant unlensing | MPI reduction | TTS
      1 | 1.39 | 24.8 | 0 | 26.2
      2 | 0.83 | 12.4 | 0.00005 | 13.3
      4 | 0.54 | 6.21 | 0.00006 | 6.79
      8 | 0.41 | 3.12 | 0.00011 | 3.57
      16 | 0.34 | 1.57 | 0.00034 | 1.96
      32 | 0.3 | 0.81 | 0.00065 | 1.14
      64 | 0.28 | 0.4 | 0.00133 | 0.77
      128 | 0.27 | 0.33 | 0.00275 | 0.66
      256 | 0.28 | 0.17 | 0.00567 | 0.56
      512 | 0.3 | 0.12 | 0.01251 | 0.61
      Scalability of the Chi2 benchmark using an 8k x 8k image, 69 sources, 203 constraints on Piz Daint multicore, 1 MPI process and 18 threads per socket, in seconds
  • 17. Daint-MC: Chi2 (Step 2) Strong Scaling
      Num. nodes | Grid Gradient Comp. | Quadrant unlensing | MPI reduction | TTS
      1 | 10.51 | 19.25 | 0.00 | 29.83
      2 | 5.24 | 10.11 | 0.06 | 15.45
      4 | 2.74 | 4.87 | 0.11 | 7.75
      8 | 1.41 | 2.51 | 0.01 | 3.95
      16 | 0.75 | 1.41 | 0.03 | 2.20
      32 | 0.43 | 0.72 | 0.01 | 1.17
      64 | 0.24 | 0.37 | 0.01 | 0.63
      128 | 0.14 | 0.20 | 0.02 | 0.37
      256 | 0.14 | 0.12 | 0.04 | 0.31
      512 | 0.45 | 0.09 | 0.14 | 0.69
      This represents a 50X speedup compared to Lenstool 6.8.1, achieved in 6 months FTE
      Scalability of the Chi2 benchmark using an 8k x 8k image, 69 sources, 203 constraints on Piz Daint multicore, 1 MPI process and 18 threads per socket, in seconds
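To read the Daint-MC scaling numbers more easily, the sketch below derives speedup and parallel efficiency relative to the single-node run. The TTS values are copied from the slide; the helper name `strong_scaling` is an assumption.

```python
# Node counts and time to solution (s) from the Daint-MC slide.
NODES = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
TTS_MC = [29.83, 15.45, 7.75, 3.95, 2.20, 1.17, 0.63, 0.37, 0.31, 0.69]

def strong_scaling(nodes, tts):
    """Return (nodes, speedup, efficiency) relative to the first entry."""
    rows = []
    for n, t in zip(nodes, tts):
        speedup = tts[0] / t
        efficiency = speedup * nodes[0] / n
        rows.append((n, speedup, efficiency))
    return rows

if __name__ == "__main__":
    for n, s, e in strong_scaling(NODES, TTS_MC):
        print(f"{n:4d} nodes: speedup {s:6.1f}, efficiency {e:5.1%}")
```

On these figures the time to solution is lowest at 256 nodes and rises again at 512, where both the grid gradient time and the MPI reduction time increase.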
  • 18. Current Status and Next Steps
      Development:
      - Code on c4science, with unit tests for each kernel (lensing, unlensing, Chi2…)
      - Large development project on CSCS’ Piz Daint
      - Aries network tuning
      - GPU tuning: lensing, unlensing and Chi2 computation are (very) regular
      - Development of a parallel MCMC framework, which could lead to a 500X speedup, e.g. Pi4u: http://www.cse-lab.ethz.ch/research/projects/pi4u/ (P. E. Hadjidoukas et al., ETHZ)
      Papers:
      - “High Performance Computing for gravitational lens modeling: single vs double precision on GPUs and CPUs”, Markus Rexroth, Christoph Schäfer, Gilles Fourestey, Jean-Paul Kneib. To be submitted
      - “High Performance Strong Lensing Map Generation for Lenstool”, Christoph Schäfer, Gilles Fourestey, Jean-Paul Kneib. In preparation
  • 19. Lensing Map Generation
      Maps based on the second derivatives of the lensing potentials (Mass, Amplification, Shear)
      ● Used for calculating the statistical errors of the MCMC method
        ○ Sampling of parameter space
        ○ Compute average and standard deviation for every pixel
        ○ Added to best prediction, gives asymmetric error bars
      ● Fast map generation is crucial
        ○ The current process takes months
      Grid Gradient 2 benchmark
      Code | TTS, in s | Speedup
      lenstool 6.8.1 | 765 |
      lenstool-HPC | 1.3 | x567
      Single node Grid Gradient benchmark computation: 4200x4200 pixels image, 201 individual lenses.
      - Lenstool: Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz
      - Lenstool-HPC: P100
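For reference, the standard relations between the second derivatives of the lensing potential and the three map types are sketched below. The notation is an assumption (ψ_ij for second partial derivatives, κ for the convergence behind the mass map, γ for the shear, μ for the amplification) and may not match the symbols used in Lenstool.

```latex
% Convergence (mass map)
\kappa = \tfrac{1}{2}\left(\psi_{11} + \psi_{22}\right)

% Shear components and magnitude
\gamma_1 = \tfrac{1}{2}\left(\psi_{11} - \psi_{22}\right), \qquad
\gamma_2 = \psi_{12}, \qquad
\lvert\gamma\rvert = \sqrt{\gamma_1^2 + \gamma_2^2}

% Amplification (magnification)
\mu = \frac{1}{(1 - \kappa)^2 - \lvert\gamma\rvert^2}
```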
  • 20. Brownie points
      - Thanks to Prof. Jean-Paul Kneib (LASTRO, EPFL), Prof. Jan Hesthaven and Dr. Vittoria Rezzonico (SCITAS, EPFL)
      - Thanks to Colin McMurtrie and Hussein El-Harake from CSCS for their support using the CSCS test cluster