SlideShare una empresa de Scribd logo
1 de 14
Descargar para leer sin conexión
Nautilus,
IceCube and LIGO
by Igor Sfiligoi, UCSD/SDSC
isfiligoi@sdsc.edu
The Global Research Platform Workshop, Sept 2019 1
Nautilus
• This being GRP, I am assuming you all know what Nautilus is!
• Just for offline reading, I mostly refer to the distributed Kubernetes cluster
• And all the other great supporting services around it
https://nautilus.optiputer.net
The Global Research Platform Workshop, Sept 2019 2
IceCube
• IceCube is a Neutrino experiment
at the South Pole
• Using natural ice as a detector media
• Premier compute application is simulation workload
running photon propagation on GPUs
• Essential for proper calibration of the detector
Direct Photon Propagation
https://icecube.wisc.edu
The Global Research Platform Workshop, Sept 2019 3
LIGO
• The Laser Interferometer Gravitational-Wave Observatory (LIGO)
is mainly designed to detect gravitational-waves
• Through the use of laser interferometry
• Several detectors built to aid with detection confirmation
• LIGO has a high noise to signal ratio
• Requiring significant compute power to filter it out
• Some parameter fitting workloads (e.g. RIFT)
great match for GPUs
https://www.ligo.caltech.edu
The Global Research Platform Workshop, Sept 2019 4
Using OSG portal into Nautilus
• IceCube and LIGO both see Nautilus just as another Open Science Grid (OSG) resource
• Nothing Nautilus/PRP/TNRP specific in their setup
• Delegated trust
• Nautilus trusts “the OSG personnel” and gives resources to “OSG”
• OSG then deals with authenticating, authorizing and allocating resources to IceCube and LIGO
• OSG deployed three layers as Kubernetes pods to make this work
• A HTCondor batch system, with all execute nodes being purely opportunistic
• A CVMFS (CSI) driver to allow for efficient and transparent on-demand software distribution
• A OSG portal to the HTCondor batch system, known as a HTCondor-CE
• The rest of the OSG infrastructure then treats Nautilus/PRP as a “normal Grid site”
The Global Research Platform Workshop, Sept 2019 5
Opportunistic use
• IceCube and LIGO run purely opportunistically
• i.e. only if no regular users are requesting those resources
• They demand nor expect no guaranteed time from Nautilus
• Worker nodes (i.e. pods) can, and will be evicted without any advance warning
• OSG infrastructure helps them seamlessly recover
• They thus can recover most of the compute cycles
that would else go to waste
• Minus a small amount wasted due to preemption
• Including unused interactive-focused resources, like the SunCave
The Global Research Platform Workshop, Sept 2019 6
Opportunistic GPU use
• IceCube and LIGO
the largest users
of Nautilus GPUs
• Being flexible
pays off
https://grafana.nautilus.optiputer.net/d/dRG9q0Ymz/
k8s-compute-resources-namespace-gpus
?orgId=1&var-namespace=osggpus
The Global Research Platform Workshop, Sept 2019 7
Opportunistic CPU use
• Opened access to CPUs
more recently
• IceCube in particular
made good use of them
https://grafana.nautilus.optiputer.net/d/KMsJWWPiz/cluster-usage
The Global Research Platform Workshop, Sept 2019 8
Splitting between the two
• Most of the GPUs used by IceCube
• But LIGO is catching up
• For IceCube, PRP accounts
for about 10% of capacity
• On par with DESY, Comet and Bridges
• For LIGO, it is the dominant resource
LIGO
IceCube
PRP
PRP
https://gracc.opensciencegrid.org/dashboard/db/gpu-payload-jobs-summary?
orgId=1&var-Facility=SDSC-PRP
CPUs mostly for IceCube
• Almost all of the CPU cycles went to IceCube
• Only a minor fraction went to LIGO
• PRP accounted for about 10%
of total IceCube CPU capacity
• The largest in the US,
but still small compared to EU contributions
• Note that it was larger than NERSC
IceCube
PRP
https://gracc.opensciencegrid.org/dashboard/db/payload-jobs-summary?orgId=1&var-Facility=SDSC-PRP
The Global Research Platform Workshop, Sept 2019 10
Multi-level containerization
• As most OSG users, IceCube and LIGO run their workloads inside containers
• But OSG is running “generic HTCondor worker node pods” in Nautilus
• How do we launch another container from inside a running container?
• Can we do it without requiring any privileges?
• Turns out, it is quite trivial with recent Linux Kernels (4.15+)
• Singularity can run from inside an (unprivileged) Docker container – like in Nautilus k8s
• Most Nautilus nodes have the updated Kernel
• Note: Singularity recently fixed a bug that would prevent the use of GPU containers
Kubernetes
HTCondor
Docker
User jobs
Singularity
The Global Research Platform Workshop, Sept 2019 11
Data and Networking
• Open Science Grid (OSG) operates a
Content Delivery Network (CDN)
inside Nautilus
• Called StashCache
• Based on xrootd technology
• Data Caches strategically positioned close to
major compute centers
• Clients use GeoIP to pick the closest one
• Greatly reduces latency and origin throughput for
applications making repeat use of large inputs
• IceCube and LIGO jobs inside Nautilus
do not use the OSG StashCache CDN
• Each job has unique input data
• Many LIGO CPU jobs in both OSG and other Grid
infrastructures do make extensive use of the CDN
• But current CVMFS k8s setup does not play well
with LIGO’s authenticated use-case
• One of the reasons for LIGO’s modest use of Nautilus CPUs
Please come see my poster this afternoon, too.
The Global Research Platform Workshop, Sept 2019 12
IceCube and CEPH
• OSG is currently operating a gridFTP server into CEPH for IceCube
• Used for handling input and output data for jobs running on Nautilus/PRP
• Currently intended only as a temporary solution
• Need arisen due to IceCube’s internal storage procurement delays
• Currently using about 65TB – one of the largest users
• Has been performing to our satisfaction
• Even though it is fronted by a single gridFTP server pod at UCSD
The Global Research Platform Workshop, Sept 2019 13
Acknowledgements
• Most of the work going in this presentation has been funded through
NSF grants OAC-1826967 and OAC-1841487.
• I also kindly thank
• The NSF and other funding agencies for their support for IceCube and LIGO.
• The support from Internet2 in hosting and operating several of the nodes comprising the OSG CDN.
• The NSF for all the other grants funding the infrastructure and services needed for Nautilus and the PRP.
• The various other funding agencies that are contributing hardware and effort to Nautilus.
The Global Research Platform Workshop, Sept 2019 14

Más contenido relacionado

La actualidad más candente

Openstack Summit HK - Ceph defacto - eNovance
Openstack Summit HK - Ceph defacto - eNovanceOpenstack Summit HK - Ceph defacto - eNovance
Openstack Summit HK - Ceph defacto - eNovanceeNovance
 
OpenShift on OpenStack: Deploying With Heat
OpenShift on OpenStack: Deploying With HeatOpenShift on OpenStack: Deploying With Heat
OpenShift on OpenStack: Deploying With HeatAlex Baretto
 
The Salmon Algorithm Spawning with Kubernetes
The Salmon Algorithm Spawning with KubernetesThe Salmon Algorithm Spawning with Kubernetes
The Salmon Algorithm Spawning with KubernetesCloudOps2005
 
CWL on kubernetes
CWL on kubernetesCWL on kubernetes
CWL on kubernetesDanLeehr
 
High Availability from the DevOps side - OpenStack Summit Portland
High Availability from the DevOps side - OpenStack Summit PortlandHigh Availability from the DevOps side - OpenStack Summit Portland
High Availability from the DevOps side - OpenStack Summit PortlandeNovance
 
Ceph Performance and Optimization - Ceph Day Frankfurt
Ceph Performance and Optimization - Ceph Day Frankfurt Ceph Performance and Optimization - Ceph Day Frankfurt
Ceph Performance and Optimization - Ceph Day Frankfurt Ceph Community
 
Kubernetes and OpenStack at Scale
Kubernetes and OpenStack at ScaleKubernetes and OpenStack at Scale
Kubernetes and OpenStack at ScaleStephen Gordon
 
OSDC 2018 | Introduction to SaltStack in the Modern Data Center by Mike Place
OSDC 2018 | Introduction to SaltStack in the Modern Data Center by Mike PlaceOSDC 2018 | Introduction to SaltStack in the Modern Data Center by Mike Place
OSDC 2018 | Introduction to SaltStack in the Modern Data Center by Mike PlaceNETWAYS
 
Apache Mesos Overview and Integration
Apache Mesos Overview and IntegrationApache Mesos Overview and Integration
Apache Mesos Overview and IntegrationAlex Baretto
 
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...eNovance
 
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...NETWAYS
 
OpenShift, Docker, Kubernetes: The next generation of PaaS
OpenShift, Docker, Kubernetes: The next generation of PaaSOpenShift, Docker, Kubernetes: The next generation of PaaS
OpenShift, Docker, Kubernetes: The next generation of PaaSGraham Dumpleton
 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1Tim Bell
 
(Open)Stacking Containers
(Open)Stacking Containers(Open)Stacking Containers
(Open)Stacking ContainersKen Thompson
 
[WSO2Con Asia 2018] Deploying Applications in K8S and Docker
[WSO2Con Asia 2018] Deploying Applications in K8S and Docker[WSO2Con Asia 2018] Deploying Applications in K8S and Docker
[WSO2Con Asia 2018] Deploying Applications in K8S and DockerWSO2
 
OpenNebula Conf | Lightning talk: Managing a Scientific Computing Facility wi...
OpenNebula Conf | Lightning talk: Managing a Scientific Computing Facility wi...OpenNebula Conf | Lightning talk: Managing a Scientific Computing Facility wi...
OpenNebula Conf | Lightning talk: Managing a Scientific Computing Facility wi...NETWAYS
 
PuppetConf 2017: Zero to Kubernetes -Scott Coulton, Puppet
PuppetConf 2017: Zero to Kubernetes -Scott Coulton, PuppetPuppetConf 2017: Zero to Kubernetes -Scott Coulton, Puppet
PuppetConf 2017: Zero to Kubernetes -Scott Coulton, PuppetPuppet
 
Kubernetes meetup bangalore december 2017 - v02
Kubernetes meetup bangalore   december 2017 - v02Kubernetes meetup bangalore   december 2017 - v02
Kubernetes meetup bangalore december 2017 - v02Kumar Gaurav
 

La actualidad más candente (20)

Openstack Summit HK - Ceph defacto - eNovance
Openstack Summit HK - Ceph defacto - eNovanceOpenstack Summit HK - Ceph defacto - eNovance
Openstack Summit HK - Ceph defacto - eNovance
 
OpenShift on OpenStack: Deploying With Heat
OpenShift on OpenStack: Deploying With HeatOpenShift on OpenStack: Deploying With Heat
OpenShift on OpenStack: Deploying With Heat
 
The Salmon Algorithm Spawning with Kubernetes
The Salmon Algorithm Spawning with KubernetesThe Salmon Algorithm Spawning with Kubernetes
The Salmon Algorithm Spawning with Kubernetes
 
CWL on kubernetes
CWL on kubernetesCWL on kubernetes
CWL on kubernetes
 
High Availability from the DevOps side - OpenStack Summit Portland
High Availability from the DevOps side - OpenStack Summit PortlandHigh Availability from the DevOps side - OpenStack Summit Portland
High Availability from the DevOps side - OpenStack Summit Portland
 
From Java Monolith to k8s with CI/CD
From Java Monolith to k8s with CI/CD From Java Monolith to k8s with CI/CD
From Java Monolith to k8s with CI/CD
 
Ceph Performance and Optimization - Ceph Day Frankfurt
Ceph Performance and Optimization - Ceph Day Frankfurt Ceph Performance and Optimization - Ceph Day Frankfurt
Ceph Performance and Optimization - Ceph Day Frankfurt
 
Kubernetes and OpenStack at Scale
Kubernetes and OpenStack at ScaleKubernetes and OpenStack at Scale
Kubernetes and OpenStack at Scale
 
OSDC 2018 | Introduction to SaltStack in the Modern Data Center by Mike Place
OSDC 2018 | Introduction to SaltStack in the Modern Data Center by Mike PlaceOSDC 2018 | Introduction to SaltStack in the Modern Data Center by Mike Place
OSDC 2018 | Introduction to SaltStack in the Modern Data Center by Mike Place
 
Apache Mesos Overview and Integration
Apache Mesos Overview and IntegrationApache Mesos Overview and Integration
Apache Mesos Overview and Integration
 
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
OpenStack in Action 4! Doug hellman - Intersection of OpenStack and python co...
 
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
 
OpenShift Evolution
OpenShift EvolutionOpenShift Evolution
OpenShift Evolution
 
OpenShift, Docker, Kubernetes: The next generation of PaaS
OpenShift, Docker, Kubernetes: The next generation of PaaSOpenShift, Docker, Kubernetes: The next generation of PaaS
OpenShift, Docker, Kubernetes: The next generation of PaaS
 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1
 
(Open)Stacking Containers
(Open)Stacking Containers(Open)Stacking Containers
(Open)Stacking Containers
 
[WSO2Con Asia 2018] Deploying Applications in K8S and Docker
[WSO2Con Asia 2018] Deploying Applications in K8S and Docker[WSO2Con Asia 2018] Deploying Applications in K8S and Docker
[WSO2Con Asia 2018] Deploying Applications in K8S and Docker
 
OpenNebula Conf | Lightning talk: Managing a Scientific Computing Facility wi...
OpenNebula Conf | Lightning talk: Managing a Scientific Computing Facility wi...OpenNebula Conf | Lightning talk: Managing a Scientific Computing Facility wi...
OpenNebula Conf | Lightning talk: Managing a Scientific Computing Facility wi...
 
PuppetConf 2017: Zero to Kubernetes -Scott Coulton, Puppet
PuppetConf 2017: Zero to Kubernetes -Scott Coulton, PuppetPuppetConf 2017: Zero to Kubernetes -Scott Coulton, Puppet
PuppetConf 2017: Zero to Kubernetes -Scott Coulton, Puppet
 
Kubernetes meetup bangalore december 2017 - v02
Kubernetes meetup bangalore   december 2017 - v02Kubernetes meetup bangalore   december 2017 - v02
Kubernetes meetup bangalore december 2017 - v02
 

Similar a GRP 19 - Nautilus, IceCube and LIGO

Bringing OSG users to the PRP Kubernetes Cluster
Bringing OSG users to the PRP Kubernetes ClusterBringing OSG users to the PRP Kubernetes Cluster
Bringing OSG users to the PRP Kubernetes ClusterIgor Sfiligoi
 
Introducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerIntroducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerAkihiro Nomura
 
20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3Tim Bell
 
An overview of the Kubernetes architecture
An overview of the Kubernetes architectureAn overview of the Kubernetes architecture
An overview of the Kubernetes architectureIgor Sfiligoi
 
OGF standards for cloud computing
OGF standards for cloud computingOGF standards for cloud computing
OGF standards for cloud computingAlan Sill
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Larry Smarr
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Larry Smarr
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Larry Smarr
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications OpenEBS
 
CloudLab Overview
CloudLab OverviewCloudLab Overview
CloudLab OverviewEd Dodds
 
Kubernetes - Hosted OSG Services
Kubernetes - Hosted OSG ServicesKubernetes - Hosted OSG Services
Kubernetes - Hosted OSG ServicesIgor Sfiligoi
 
2014 11-05 hpcac-kniep_christian_dockermpi
2014 11-05 hpcac-kniep_christian_dockermpi2014 11-05 hpcac-kniep_christian_dockermpi
2014 11-05 hpcac-kniep_christian_dockermpiQNIB Solutions
 
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...chiportal
 
OGF Introductory Overview - FAS* 2014
OGF Introductory Overview -  FAS* 2014OGF Introductory Overview -  FAS* 2014
OGF Introductory Overview - FAS* 2014Alan Sill
 
Adventures in Research
Adventures in ResearchAdventures in Research
Adventures in ResearchNETWAYS
 
OpenNebulaConf 2013 -Adventures in Research by Joel Merrick
OpenNebulaConf 2013 -Adventures in Research by Joel Merrick OpenNebulaConf 2013 -Adventures in Research by Joel Merrick
OpenNebulaConf 2013 -Adventures in Research by Joel Merrick OpenNebula Project
 
What's New in Grizzly & Deploying OpenStack with Puppet
What's New in Grizzly & Deploying OpenStack with PuppetWhat's New in Grizzly & Deploying OpenStack with Puppet
What's New in Grizzly & Deploying OpenStack with PuppetMark Voelker
 
Demonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the CloudsDemonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the CloudsIgor Sfiligoi
 
OGF Introductory Overview - OGF 44 at EGI Conference 2015
OGF Introductory Overview - OGF 44 at EGI Conference 2015OGF Introductory Overview - OGF 44 at EGI Conference 2015
OGF Introductory Overview - OGF 44 at EGI Conference 2015Alan Sill
 
2018년 3월 정기 세미나 - March 2018 Ops Meetup 후기
2018년 3월 정기 세미나 - March 2018 Ops Meetup 후기2018년 3월 정기 세미나 - March 2018 Ops Meetup 후기
2018년 3월 정기 세미나 - March 2018 Ops Meetup 후기OpenStack Korea Community
 

Similar a GRP 19 - Nautilus, IceCube and LIGO (20)

Bringing OSG users to the PRP Kubernetes Cluster
Bringing OSG users to the PRP Kubernetes ClusterBringing OSG users to the PRP Kubernetes Cluster
Bringing OSG users to the PRP Kubernetes Cluster
 
Introducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerIntroducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 Supercomputer
 
20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3
 
An overview of the Kubernetes architecture
An overview of the Kubernetes architectureAn overview of the Kubernetes architecture
An overview of the Kubernetes architecture
 
OGF standards for cloud computing
OGF standards for cloud computingOGF standards for cloud computing
OGF standards for cloud computing
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
 
CloudLab Overview
CloudLab OverviewCloudLab Overview
CloudLab Overview
 
Kubernetes - Hosted OSG Services
Kubernetes - Hosted OSG ServicesKubernetes - Hosted OSG Services
Kubernetes - Hosted OSG Services
 
2014 11-05 hpcac-kniep_christian_dockermpi
2014 11-05 hpcac-kniep_christian_dockermpi2014 11-05 hpcac-kniep_christian_dockermpi
2014 11-05 hpcac-kniep_christian_dockermpi
 
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
 
OGF Introductory Overview - FAS* 2014
OGF Introductory Overview -  FAS* 2014OGF Introductory Overview -  FAS* 2014
OGF Introductory Overview - FAS* 2014
 
Adventures in Research
Adventures in ResearchAdventures in Research
Adventures in Research
 
OpenNebulaConf 2013 -Adventures in Research by Joel Merrick
OpenNebulaConf 2013 -Adventures in Research by Joel Merrick OpenNebulaConf 2013 -Adventures in Research by Joel Merrick
OpenNebulaConf 2013 -Adventures in Research by Joel Merrick
 
What's New in Grizzly & Deploying OpenStack with Puppet
What's New in Grizzly & Deploying OpenStack with PuppetWhat's New in Grizzly & Deploying OpenStack with Puppet
What's New in Grizzly & Deploying OpenStack with Puppet
 
Demonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the CloudsDemonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the Clouds
 
OGF Introductory Overview - OGF 44 at EGI Conference 2015
OGF Introductory Overview - OGF 44 at EGI Conference 2015OGF Introductory Overview - OGF 44 at EGI Conference 2015
OGF Introductory Overview - OGF 44 at EGI Conference 2015
 
2018년 3월 정기 세미나 - March 2018 Ops Meetup 후기
2018년 3월 정기 세미나 - March 2018 Ops Meetup 후기2018년 3월 정기 세미나 - March 2018 Ops Meetup 후기
2018년 3월 정기 세미나 - March 2018 Ops Meetup 후기
 

Más de Igor Sfiligoi

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROIgor Sfiligoi
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...Igor Sfiligoi
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Igor Sfiligoi
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingIgor Sfiligoi
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesIgor Sfiligoi
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateIgor Sfiligoi
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsIgor Sfiligoi
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeIgor Sfiligoi
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Igor Sfiligoi
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessIgor Sfiligoi
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputIgor Sfiligoi
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsIgor Sfiligoi
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROIgor Sfiligoi
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstIgor Sfiligoi
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyIgor Sfiligoi
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCIgor Sfiligoi
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Igor Sfiligoi
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsIgor Sfiligoi
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsIgor Sfiligoi
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksIgor Sfiligoi
 

Más de Igor Sfiligoi (20)

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYRO
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resources
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rate
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance compute
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific Output
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYRO
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with Admiralty
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUs
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public Clouds
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud links
 

Último

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessUXDXConf
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfFIDO Alliance
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfFIDO Alliance
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101vincent683379
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfFIDO Alliance
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimaginedpanagenda
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfSrushith Repakula
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?Mark Billinghurst
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastUXDXConf
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandIES VE
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxJennifer Lim
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...marcuskenyatta275
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe中 央社
 

Último (20)

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 

GRP 19 - Nautilus, IceCube and LIGO

  • 1. Nautilus, IceCube and LIGO by Igor Sfiligoi, UCSD/SDSC isfiligoi@sdsc.edu The Global Research Platform Workshop, Sept 2019 1
  • 2. Nautilus • This being GRP, I am assuming you all know what Nautilus is! • Just for offline reading, I mostly refer to the distributed Kubernetes cluster • And all the other great supporting services around it https://nautilus.optiputer.net The Global Research Platform Workshop, Sept 2019 2
  • 3. IceCube • IceCube is a Neutrino experiment at the South Pole • Using natural ice as a detector media • Premier compute application is simulation workload running photon propagation on GPUs • Essential for proper calibration of the detector Direct Photon Propagation https://icecube.wisc.edu The Global Research Platform Workshop, Sept 2019 3
  • 4. LIGO • The Laser Interferometer Gravitational-Wave Observatory (LIGO) is mainly designed to detect gravitational-waves • Through the use of laser interferometry • Several detectors built to aid with detection confirmation • LIGO has a high noise to signal ratio • Requiring significant compute power to filter it out • Some parameter fitting workloads (e.g. RIFT) great match for GPUs https://www.ligo.caltech.edu The Global Research Platform Workshop, Sept 2019 4
  • 5. Using OSG portal into Nautilus • IceCube and LIGO both see Nautilus just as another Open Science Grid (OSG) resource • Nothing Nautilus/PRP/TNRP specific in their setup • Delegated trust • Nautilus trusts “the OSG personnel” and gives resources to “OSG” • OSG then deals with authenticating, authorizing and allocating resources to IceCube and LIGO • OSG deployed three layers as Kubernetes pods to make this work • A HTCondor batch system, with all execute nodes being purely opportunistic • A CVMFS (CSI) driver to allow for efficient and transparent on-demand software distribution • A OSG portal to the HTCondor batch system, known as a HTCondor-CE • The rest of the OSG infrastructure then treats Nautilus/PRP as a “normal Grid site” The Global Research Platform Workshop, Sept 2019 5
  • 6. Opportunistic use • IceCube and LIGO run purely opportunistically • i.e. only if no regular users are requesting those resources • They demand nor expect no guaranteed time from Nautilus • Worker nodes (i.e. pods) can, and will be evicted without any advance warning • OSG infrastructure helps them seamlessly recover • They thus can recover most of the compute cycles that would else go to waste • Minus a small amount wasted due to preemption • Including unused interactive-focused resources, like the SunCave The Global Research Platform Workshop, Sept 2019 6
  • 7. Opportunistic GPU use • IceCube and LIGO the largest users of Nautilus GPUs • Being flexible pays off https://grafana.nautilus.optiputer.net/d/dRG9q0Ymz/ k8s-compute-resources-namespace-gpus ?orgId=1&var-namespace=osggpus The Global Research Platform Workshop, Sept 2019 7
  • 8. Opportunistic CPU use • Opened access to CPUs more recently • IceCube in particular made good use of them https://grafana.nautilus.optiputer.net/d/KMsJWWPiz/cluster-usage The Global Research Platform Workshop, Sept 2019 8
  • 9. Splitting between the two • Most of the GPUs used by IceCube • But LIGO is catching up • For IceCube, PRP accounts for about 10% of capacity • On par with DESY, Comet and Bridges • For LIGO, it is the dominant resource LIGO IceCube PRP PRP https://gracc.opensciencegrid.org/dashboard/db/gpu-payload-jobs-summary? orgId=1&var-Facility=SDSC-PRP
  • 10. CPUs mostly for IceCube • Almost all of the CPU cycles went to IceCube • Only a minor fraction went to LIGO • PRP accounted for about 10% of total IceCube CPU capacity • The largest in the US, but still small compared to EU contributions • Note that it was larger than NERSC IceCube PRP https://gracc.opensciencegrid.org/dashboard/db/payload-jobs-summary?orgId=1&var-Facility=SDSC-PRP The Global Research Platform Workshop, Sept 2019 10
  • 11. Multi-level containerization • As most OSG users, IceCube and LIGO run their workloads inside containers • But OSG is running “generic HTCondor worker node pods” in Nautilus • How do we launch another container from inside a running container? • Can we do it without requiring any privileges? • Turns out, it is quite trivial with recent Linux Kernels (4.15+) • Singularity can run from inside an (unprivileged) Docker container – like in Nautilus k8s • Most Nautilus nodes have the updated Kernel • Note: Singularity recently fixed a bug that would prevent the use of GPU containers Kubernetes HTCondor Docker User jobs Singularity The Global Research Platform Workshop, Sept 2019 11
  • 12. Data and Networking • Open Science Grid (OSG) operates a Content Delivery Network (CDN) inside Nautilus • Called StashCache • Based on xrootd technology • Data Caches strategically positioned close to major compute centers • Clients use GeoIP to pick the closest one • Greatly reduces latency and origin throughput for applications making repeat use of large inputs • IceCube and LIGO jobs inside Nautilus do not use the OSG StashCache CDN • Each job has unique input data • Many LIGO CPU jobs in both OSG and other Grid infrastructures do make extensive use of the CDN • But current CVMFS k8s setup does not play well with LIGO’s authenticated use-case • One of the reasons for LIGO’s modest use of Nautilus CPUs Please come see my poster this afternoon, too. The Global Research Platform Workshop, Sept 2019 12
  • 13. IceCube and CEPH • OSG is currently operating a gridFTP server into CEPH for IceCube • Used for handling input and output data for jobs running on Nautilus/PRP • Currently intended only as a temporary solution • Need arisen due to IceCube’s internal storage procurement delays • Currently using about 65TB – one of the largest users • Has been performing to our satisfaction • Even though it is fronted by a single gridFTP server pod at UCSD The Global Research Platform Workshop, Sept 2019 13
  • 14. Acknowledgements • Most of the work going in this presentation has been funded through NSF grants OAC-1826967 and OAC-1841487. • I also kindly thank • The NSF and other funding agencies for their support for IceCube and LIGO. • The support from Internet2 in hosting and operating several of the nodes comprising the OSG CDN. • The NSF for all the other grants funding the infrastructure and services needed for Nautilus and the PRP. • The various other funding agencies that are contributing hardware and effort to Nautilus. The Global Research Platform Workshop, Sept 2019 14