Stanford/SLAC Cryo-EM
Computing and Storage
Pacific Research Platform Workshop, UCSC SV
Yee-Ting Li, September 2018
● Stanford/SLAC Cryo-EM
○ Joint initiative between Stanford School of Medicine and SLAC Lab.
○ Led by Wah Chiu (formerly of BCM)
○ Data taking started ~Jan 2018
○ 3 Krios (2 with GIFs) and 1 Arctica
■ All with Gatan K2 cameras; upgrades to K3 start next month!
■ Users: NIH U24, NIH P41, Stanford and SLAC
● S2C2: Stanford-SLAC Cryo-EM Center
○ New NIH U24 Award
○ Will add 4+ microscopes within the next 3 years; 2 already on order
● Me:
○ Plan, manage and operate all data management, computation, and software infrastructure for all Cryo-EM Facilities
○ Day Job: Work with all SLAC science to support their computational and storage requirements (LSST, Fermi, LCLS I/II, CDMS, ATLAS)
2
Introduction
Stanford/SLAC will provide World Class expertise and training for Cryo-EM
3
Architectural Overview
Similar architecture to LCLS and other Data-centric Scientific Experiments
[Diagram: data-flow architecture]
● Onsite: Detector (TEM); Data Reduction Pipeline; Online Monitoring; Fast feedback storage
● Onsite SLAC - Petascale: Offline storage; Petascale HPC
● Offsite - User Institutions (Universities, other labs etc.): Offline storage; Terascale HPC
● Data rates shown: up to 0.267 GB/s; ~2 Gb/s
● Stages shown: Fast Feedback; Pre-processing; Reconstruction (2D + 3D + Refinement)
● Timescales shown: ~ seconds; ~ 2 min; ~ days; N/A
● Other labels: x4
● Just on TEM2
○ 35 active proposals
○ >100 experiments
● Mostly limited by
○ Scope downtime
○ Screening time
○ Managerial efficiencies
● Typical Experiment
○ Single Particle: ~6,000 images, total of 4-8TB
○ Tomography: 10-20 tomograms, total of 1-2TB
4
Activity Thus Far...
Ever-increasing data rates
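A rough back-of-envelope sketch for the single-particle figures above, to give a feel for per-image size and transfer time. It uses only numbers quoted in the slides (~6,000 images, 4-8 TB, up to 0.267 GB/s at the detector); decimal units (1 TB = 1000 GB) and the function names are assumptions made for illustration.

```python
# Back-of-envelope numbers for a typical single-particle experiment.
# Inputs are taken from the slides; decimal units (1 TB = 1000 GB) are assumed.

IMAGES = 6_000               # "~6,000 images"
TOTAL_TB = (4.0, 8.0)        # "total of 4-8TB"
DETECTOR_GB_PER_S = 0.267    # "Up to 0.267 GB/s" on the architecture slide


def gb_per_image(total_tb: float, images: int = IMAGES) -> float:
    """Average size of one image in GB."""
    return total_tb * 1000.0 / images


def hours_at_peak(total_tb: float, rate_gb_s: float = DETECTOR_GB_PER_S) -> float:
    """Hours needed to move the whole dataset at the peak detector rate."""
    return total_tb * 1000.0 / rate_gb_s / 3600.0


for tb in TOTAL_TB:
    print(f"{tb:.0f} TB: {gb_per_image(tb):.2f} GB/image, "
          f"{hours_at_peak(tb):.1f} h at {DETECTOR_GB_PER_S} GB/s")
```

Under these assumptions a dataset averages roughly 0.7 to 1.3 GB per image and would take somewhere around 4 to 8 hours to move at the peak detector rate.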
5
Technological Tenets of Data Management
Provide users quick feedback on sample and data quality
Data Pipelines &
Data Provenance
eLogBook &
Monitoring/Reporting
Near Real Time Feedback &
Data Analysis
6
User Focused
“Is my sample viable?”, “Can I get answers from my time on the ‘scope?”
● Provide rough gauge of sample, image and data quality in (near) real-time
● Provide automated processing and feedback on
○ sample previews (remote access)
○ Initial pre-processing:
■ Movie alignment (e.g. MotionCor)
■ CTF calculations (e.g. ctffind/gctf)
■ Initial automated particle picking (e.g. dogpicker/gautomatch)
○ Soon:
■ Initial 2D class averages
■ Initial 3D density map
● Provide data management, computational resources and software support for users
○ GPU resources for RELION, cryoSPARC etc.
○ Storage of metadata (logbook), raw data and initial data products
● Open-source framework for ETL
● Rich Python-based pluggable architecture
● Integrated into our LSF, TSDB and GPFS environment
● Full accounting and reporting of pipeline
● Horizontally scalable; deployed as containers
● Easy-to-use GUI; flexible CLI
● Each pipeline defined as a directed acyclic graph (DAG)
7
Pipelines: Managed with Apache Airflow
Don’t reinvent the wheel; add mud-flaps
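To illustrate what "each pipeline defined as a DAG" means, here is a minimal sketch using only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow code and not the facility's pipeline: the task names and their dependencies are hypothetical, loosely following the pre-processing steps named earlier in the deck.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a task; its value is the set of tasks that must finish first.
# Task names and dependencies are hypothetical, for illustration only.
PIPELINE = {
    "motion_correction": set(),
    "ctf_estimation": {"motion_correction"},
    "particle_picking": {"motion_correction"},
    "preview_report": {"ctf_estimation", "particle_picking"},
}


def execution_order(dag):
    """Return one valid run order; raises graphlib.CycleError if not acyclic."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(PIPELINE)
print(" -> ".join(order))
```

A workflow manager such as Airflow applies the same idea at scale: tasks with no unmet dependencies can be scheduled (possibly in parallel), and a task only runs once everything upstream of it has succeeded.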
● Live pre-processing preview sent to dedicated Slack
● eLogbook
○ Provides a centralised platform for users to access, view, annotate and process their data
○ Provides management reports etc.
8
eLogBook & Monitoring/Reporting
Reporting for both Users and Management
9
Infrastructure/Resource Predictions
Significant ramp up of (GPU) Compute and Storage
                 Currently      0-5 years *    10 years *
CPU Compute      0.0 PFLOPS     0.0 PFLOPS     0.0 PFLOPS
GPU              1.0 PFLOPS     10.0 PFLOPS    50.0 PFLOPS
Disk Storage     1 PB           5 PB           10 PB
Tape Storage     0              5 PB $         10 PB $
Racks            1              6              10
Numbers assume single-particle analysis only. Tomography requirements TBD (assumed similar)
* Preliminary
$ Heavily dependent upon agreed NIH data retention policies
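As a rough sanity check on the disk figures above, the sketch below estimates how many single-particle datasets would fit at each horizon, using the 4-8 TB per experiment quoted earlier in the deck. It assumes 1 PB = 1000 TB and ignores replication, filesystem overhead and derived data products, so treat the counts as upper bounds; the names are illustrative.

```python
# Disk storage per horizon (PB), from the predictions table.
DISK_PB = {"currently": 1, "0-5 years": 5, "10 years": 10}
# Typical single-particle experiment size (TB), from the activity slide.
EXPERIMENT_TB = (4, 8)


def experiments_that_fit(disk_pb: int, experiment_tb: int) -> int:
    """Whole datasets that fit on disk, assuming 1 PB = 1000 TB and no overhead."""
    return disk_pb * 1000 // experiment_tb


for horizon, pb in DISK_PB.items():
    most = experiments_that_fit(pb, EXPERIMENT_TB[0])   # small (4 TB) datasets
    least = experiments_that_fit(pb, EXPERIMENT_TB[1])  # large (8 TB) datasets
    print(f"{horizon}: {least} to {most} datasets on {pb} PB")
```

Even with these optimistic assumptions, 1 PB holds only a few hundred raw datasets, which is one way to read the emphasis on retention policy and tape.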
● Funding agencies want to know how much time, effort, storage and compute something takes, as well as how many papers can be published
● Experimentalists do not want to become experts in hardware and software
● Horizontally scalable hardware and software solutions required
○ No SPOF; burstability; off-site/cloud
● Hot/Cold/Warm data
○ How big is the active data set? How long do they need the data around for? Do duplicates exist?
○ Use of HSM and/or cloud storage
● Data/Experiment portability
○ Not just rsync of data
○ Provenance of workflow and data: reproducibility of results - containers!
● Experimental metadata
○ microscope parameters + sample prep details
○ Metadata and data catalogue
● Down in the weeds:
○ Inodes! Best move to object stores… software support?
10
Remarks
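One possible shape for the "experimental metadata" remark, sketched as a small record that can be serialised into a catalogue entry. Every field name and value here is a hypothetical placeholder chosen for illustration; none of them is taken from the facility's actual schema or instrument settings.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ExperimentRecord:
    """Hypothetical metadata record: microscope parameters plus sample prep."""
    experiment_id: str
    microscope: str
    camera: str
    acceleration_voltage_kv: float
    pixel_size_angstrom: float
    sample_name: str
    grid_prep_notes: str


def to_catalogue_entry(record: ExperimentRecord) -> str:
    """Serialise a record to JSON so it can travel with the data."""
    return json.dumps(asdict(record), sort_keys=True)


def from_catalogue_entry(text: str) -> ExperimentRecord:
    """Rebuild a record from its JSON catalogue entry."""
    return ExperimentRecord(**json.loads(text))


# Placeholder values only, not real facility settings.
example = ExperimentRecord(
    experiment_id="example-0001",
    microscope="TEM2",
    camera="K2",
    acceleration_voltage_kv=300.0,
    pixel_size_angstrom=1.0,
    sample_name="placeholder sample",
    grid_prep_notes="placeholder notes",
)
print(to_catalogue_entry(example))
```

Keeping a record like this alongside the raw data is one way to make an experiment portable beyond a plain rsync, since the acquisition context moves with the files.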

Más contenido relacionado

La actualidad más candente

Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Robert Grossman
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataRobert Grossman
 
Integrating scientific laboratories into the cloud
Integrating scientific laboratories into the cloudIntegrating scientific laboratories into the cloud
Integrating scientific laboratories into the cloudData Finder
 
Optique presentation
Optique presentationOptique presentation
Optique presentationDBOnto
 
Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Robert Grossman
 
Using parallel hierarchical clustering to
Using parallel hierarchical clustering toUsing parallel hierarchical clustering to
Using parallel hierarchical clustering toBiniam Behailu
 
A Recommender Story: Improving Backend Data Quality While Reducing Costs
A Recommender Story: Improving Backend Data Quality While Reducing CostsA Recommender Story: Improving Backend Data Quality While Reducing Costs
A Recommender Story: Improving Backend Data Quality While Reducing CostsDatabricks
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryIan Foster
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceRobert Grossman
 
Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...
Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...
Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...EarthCube
 
ArrayUDF: User-Defined Scientific Data Analysis on Arrays
ArrayUDF: User-Defined Scientific Data Analysis on ArraysArrayUDF: User-Defined Scientific Data Analysis on Arrays
ArrayUDF: User-Defined Scientific Data Analysis on ArraysGoon83
 
Research Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories MetadataResearch Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories MetadataRicard de la Vega
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchRobert Grossman
 
NASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & EngineeringNASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & Engineeringinside-BigData.com
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataRobert Grossman
 
Deep galaxy classification of galaxies based on deep convolutional neural ne...
Deep galaxy  classification of galaxies based on deep convolutional neural ne...Deep galaxy  classification of galaxies based on deep convolutional neural ne...
Deep galaxy classification of galaxies based on deep convolutional neural ne...Aboul Ella Hassanien
 

La actualidad más candente (20)

Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big Data
 
Integrating scientific laboratories into the cloud
Integrating scientific laboratories into the cloudIntegrating scientific laboratories into the cloud
Integrating scientific laboratories into the cloud
 
Optique presentation
Optique presentationOptique presentation
Optique presentation
 
STDCS
STDCSSTDCS
STDCS
 
Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)
 
Using parallel hierarchical clustering to
Using parallel hierarchical clustering toUsing parallel hierarchical clustering to
Using parallel hierarchical clustering to
 
Deep Learning in Deep Space
Deep Learning in Deep SpaceDeep Learning in Deep Space
Deep Learning in Deep Space
 
A Recommender Story: Improving Backend Data Quality While Reducing Costs
A Recommender Story: Improving Backend Data Quality While Reducing CostsA Recommender Story: Improving Backend Data Quality While Reducing Costs
A Recommender Story: Improving Backend Data Quality While Reducing Costs
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and Chemistry
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of Science
 
Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...
Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...
Novel Techniques & Connections Between High-Pressure Mineral Physics, Microto...
 
ArrayUDF: User-Defined Scientific Data Analysis on Arrays
ArrayUDF: User-Defined Scientific Data Analysis on ArraysArrayUDF: User-Defined Scientific Data Analysis on Arrays
ArrayUDF: User-Defined Scientific Data Analysis on Arrays
 
Research Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories MetadataResearch Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories Metadata
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science Research
 
Big data
Big dataBig data
Big data
 
NASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & EngineeringNASA Advanced Computing Environment for Science & Engineering
NASA Advanced Computing Environment for Science & Engineering
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
 
Deep galaxy classification of galaxies based on deep convolutional neural ne...
Deep galaxy  classification of galaxies based on deep convolutional neural ne...Deep galaxy  classification of galaxies based on deep convolutional neural ne...
Deep galaxy classification of galaxies based on deep convolutional neural ne...
 
2019 swan-cs3
2019 swan-cs32019 swan-cs3
2019 swan-cs3
 

Similar a Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li

Kubernetes The New Research Platform
Kubernetes The New Research PlatformKubernetes The New Research Platform
Kubernetes The New Research PlatformBob Killen
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...confluent
 
OpenStack Toronto Q3 MeetUp - September 28th 2017
OpenStack Toronto Q3 MeetUp - September 28th 2017OpenStack Toronto Q3 MeetUp - September 28th 2017
OpenStack Toronto Q3 MeetUp - September 28th 2017Stacy Véronneau
 
The Pacific Research Platform 18 Months In
The Pacific Research Platform 18 Months In The Pacific Research Platform 18 Months In
The Pacific Research Platform 18 Months In Larry Smarr
 
The Search for Gravitational Waves
The Search for Gravitational WavesThe Search for Gravitational Waves
The Search for Gravitational Wavesinside-BigData.com
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionIan Foster
 
Managing research data at Bristol
Managing research data at BristolManaging research data at Bristol
Managing research data at BristolSimon Price
 
PRP, CHASE-CI, TNRP and OSG
PRP, CHASE-CI, TNRP and OSGPRP, CHASE-CI, TNRP and OSG
PRP, CHASE-CI, TNRP and OSGLarry Smarr
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioAlluxio, Inc.
 
Monitoring Exascale Supercomputers With Tim Osborne | Current 2022
Monitoring Exascale Supercomputers With Tim Osborne | Current 2022Monitoring Exascale Supercomputers With Tim Osborne | Current 2022
Monitoring Exascale Supercomputers With Tim Osborne | Current 2022HostedbyConfluent
 
Splunk, SIEMs, and Big Data - The Undercroft - November 2019
Splunk, SIEMs, and Big Data - The Undercroft - November 2019Splunk, SIEMs, and Big Data - The Undercroft - November 2019
Splunk, SIEMs, and Big Data - The Undercroft - November 2019Jonathan Singer
 
Frank Würthwein - NRP and the Path forward
Frank Würthwein - NRP and the Path forwardFrank Würthwein - NRP and the Path forward
Frank Würthwein - NRP and the Path forwardLarry Smarr
 
Big Data for Big Discoveries
Big Data for Big DiscoveriesBig Data for Big Discoveries
Big Data for Big DiscoveriesGovnet Events
 
MIT/CSAIL OpenStack Use Cases - Hong Kong 2014
MIT/CSAIL OpenStack Use Cases - Hong Kong 2014MIT/CSAIL OpenStack Use Cases - Hong Kong 2014
MIT/CSAIL OpenStack Use Cases - Hong Kong 2014Jonathan Proulx
 
Larry Smarr - NRP Application Drivers
Larry Smarr - NRP Application DriversLarry Smarr - NRP Application Drivers
Larry Smarr - NRP Application DriversLarry Smarr
 
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIinside-BigData.com
 
Scientific Application Development and Early results on Summit
Scientific Application Development and Early results on SummitScientific Application Development and Early results on Summit
Scientific Application Development and Early results on SummitGanesan Narayanasamy
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light SourcesIan Foster
 
Making DMPs actionable and public
Making DMPs actionable and publicMaking DMPs actionable and public
Making DMPs actionable and publicStephanie Simms
 

Similar a Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li (20)

Kubernetes The New Research Platform
Kubernetes The New Research PlatformKubernetes The New Research Platform
Kubernetes The New Research Platform
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
 
OpenStack Toronto Q3 MeetUp - September 28th 2017
OpenStack Toronto Q3 MeetUp - September 28th 2017OpenStack Toronto Q3 MeetUp - September 28th 2017
OpenStack Toronto Q3 MeetUp - September 28th 2017
 
The Pacific Research Platform 18 Months In
The Pacific Research Platform 18 Months In The Pacific Research Platform 18 Months In
The Pacific Research Platform 18 Months In
 
The Search for Gravitational Waves
The Search for Gravitational WavesThe Search for Gravitational Waves
The Search for Gravitational Waves
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
 
Managing research data at Bristol
Managing research data at BristolManaging research data at Bristol
Managing research data at Bristol
 
PRP, CHASE-CI, TNRP and OSG
PRP, CHASE-CI, TNRP and OSGPRP, CHASE-CI, TNRP and OSG
PRP, CHASE-CI, TNRP and OSG
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
 
Monitoring Exascale Supercomputers With Tim Osborne | Current 2022
Monitoring Exascale Supercomputers With Tim Osborne | Current 2022Monitoring Exascale Supercomputers With Tim Osborne | Current 2022
Monitoring Exascale Supercomputers With Tim Osborne | Current 2022
 
Splunk, SIEMs, and Big Data - The Undercroft - November 2019
Splunk, SIEMs, and Big Data - The Undercroft - November 2019Splunk, SIEMs, and Big Data - The Undercroft - November 2019
Splunk, SIEMs, and Big Data - The Undercroft - November 2019
 
Frank Würthwein - NRP and the Path forward
Frank Würthwein - NRP and the Path forwardFrank Würthwein - NRP and the Path forward
Frank Würthwein - NRP and the Path forward
 
Big Data for Big Discoveries
Big Data for Big DiscoveriesBig Data for Big Discoveries
Big Data for Big Discoveries
 
MIT/CSAIL OpenStack Use Cases - Hong Kong 2014
MIT/CSAIL OpenStack Use Cases - Hong Kong 2014MIT/CSAIL OpenStack Use Cases - Hong Kong 2014
MIT/CSAIL OpenStack Use Cases - Hong Kong 2014
 
Larry Smarr - NRP Application Drivers
Larry Smarr - NRP Application DriversLarry Smarr - NRP Application Drivers
Larry Smarr - NRP Application Drivers
 
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
 
AI Super computer update
AI Super computer update AI Super computer update
AI Super computer update
 
Scientific Application Development and Early results on Summit
Scientific Application Development and Early results on SummitScientific Application Development and Early results on Summit
Scientific Application Development and Early results on Summit
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
 
Making DMPs actionable and public
Making DMPs actionable and publicMaking DMPs actionable and public
Making DMPs actionable and public
 

Más de PacificResearchPlatform

Hybridizing Kubernetes and HPC securely - Pavan Gupta
Hybridizing Kubernetes and HPC securely - Pavan GuptaHybridizing Kubernetes and HPC securely - Pavan Gupta
Hybridizing Kubernetes and HPC securely - Pavan GuptaPacificResearchPlatform
 
Panel 3: Security and Privacy in Practice
Panel 3: Security and Privacy in PracticePanel 3: Security and Privacy in Practice
Panel 3: Security and Privacy in PracticePacificResearchPlatform
 
Managing Complexity in a World of Surprise David L. Alderson, PhD
Managing Complexity in a World of Surprise David L. Alderson, PhDManaging Complexity in a World of Surprise David L. Alderson, PhD
Managing Complexity in a World of Surprise David L. Alderson, PhDPacificResearchPlatform
 
TIPPSS for Enabling & Securing our Increasingly Connected World – Trust, Iden...
TIPPSS for Enabling & Securing our Increasingly Connected World – Trust, Iden...TIPPSS for Enabling & Securing our Increasingly Connected World – Trust, Iden...
TIPPSS for Enabling & Securing our Increasingly Connected World – Trust, Iden...PacificResearchPlatform
 
Securing Research Data: A Workshop on Emerging Practices in Computation and S...
Securing Research Data: A Workshop on Emerging Practices in Computation and S...Securing Research Data: A Workshop on Emerging Practices in Computation and S...
Securing Research Data: A Workshop on Emerging Practices in Computation and S...PacificResearchPlatform
 
Autoencoding RNN for Inference on Unevenly Sampled Time-series data - Josh Bloom
Autoencoding RNN for Inference on Unevenly Sampled Time-series data - Josh BloomAutoencoding RNN for Inference on Unevenly Sampled Time-series data - Josh Bloom
Autoencoding RNN for Inference on Unevenly Sampled Time-series data - Josh BloomPacificResearchPlatform
 
NERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie BardNERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie BardPacificResearchPlatform
 
RESEARCH CYBERINFRASTRUCTURE Jeff Weekly
RESEARCH CYBERINFRASTRUCTURE Jeff WeeklyRESEARCH CYBERINFRASTRUCTURE Jeff Weekly
RESEARCH CYBERINFRASTRUCTURE Jeff WeeklyPacificResearchPlatform
 
Deep Learning Applied to Galaxy Evolution: Identifying and Characterizing Sta...
Deep Learning Applied to Galaxy Evolution: Identifying and Characterizing Sta...Deep Learning Applied to Galaxy Evolution: Identifying and Characterizing Sta...
Deep Learning Applied to Galaxy Evolution: Identifying and Characterizing Sta...PacificResearchPlatform
 
Fast and Automated Analysis of Interferometric Images of Strong Gravitational...
Fast and Automated Analysis of Interferometric Images of Strong Gravitational...Fast and Automated Analysis of Interferometric Images of Strong Gravitational...
Fast and Automated Analysis of Interferometric Images of Strong Gravitational...PacificResearchPlatform
 
Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska
Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska
Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska PacificResearchPlatform
 

Más de PacificResearchPlatform (14)

Hybridizing Kubernetes and HPC securely - Pavan Gupta
Hybridizing Kubernetes and HPC securely - Pavan GuptaHybridizing Kubernetes and HPC securely - Pavan Gupta
Hybridizing Kubernetes and HPC securely - Pavan Gupta
 
Panel 3: Security and Privacy in Practice
Panel 3: Security and Privacy in PracticePanel 3: Security and Privacy in Practice
Panel 3: Security and Privacy in Practice
 
Managing Complexity in a World of Surprise David L. Alderson, PhD
Managing Complexity in a World of Surprise David L. Alderson, PhDManaging Complexity in a World of Surprise David L. Alderson, PhD
Managing Complexity in a World of Surprise David L. Alderson, PhD
 
TIPPSS for Enabling & Securing our Increasingly Connected World – Trust, Iden...
TIPPSS for Enabling & Securing our Increasingly Connected World – Trust, Iden...TIPPSS for Enabling & Securing our Increasingly Connected World – Trust, Iden...
TIPPSS for Enabling & Securing our Increasingly Connected World – Trust, Iden...
 
Securing Research Data - David Rusting
Securing Research Data - David RustingSecuring Research Data - David Rusting
Securing Research Data - David Rusting
 
Securing Research Data: A Workshop on Emerging Practices in Computation and S...
Securing Research Data: A Workshop on Emerging Practices in Computation and S...Securing Research Data: A Workshop on Emerging Practices in Computation and S...
Securing Research Data: A Workshop on Emerging Practices in Computation and S...
 
PRP Distributed Kubernetes Cluster
PRP Distributed Kubernetes ClusterPRP Distributed Kubernetes Cluster
PRP Distributed Kubernetes Cluster
 
AstroGANS - David Reiman
AstroGANS - David ReimanAstroGANS - David Reiman
AstroGANS - David Reiman
 
Autoencoding RNN for Inference on Unevenly Sampled Time-series data - Josh Bloom
Autoencoding RNN for Inference on Unevenly Sampled Time-series data - Josh BloomAutoencoding RNN for Inference on Unevenly Sampled Time-series data - Josh Bloom
Autoencoding RNN for Inference on Unevenly Sampled Time-series data - Josh Bloom
 
NERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie BardNERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie Bard
 
RESEARCH CYBERINFRASTRUCTURE Jeff Weekly
RESEARCH CYBERINFRASTRUCTURE Jeff WeeklyRESEARCH CYBERINFRASTRUCTURE Jeff Weekly
RESEARCH CYBERINFRASTRUCTURE Jeff Weekly
 
Deep Learning Applied to Galaxy Evolution: Identifying and Characterizing Sta...
Deep Learning Applied to Galaxy Evolution: Identifying and Characterizing Sta...Deep Learning Applied to Galaxy Evolution: Identifying and Characterizing Sta...
Deep Learning Applied to Galaxy Evolution: Identifying and Characterizing Sta...
 
Fast and Automated Analysis of Interferometric Images of Strong Gravitational...
Fast and Automated Analysis of Interferometric Images of Strong Gravitational...Fast and Automated Analysis of Interferometric Images of Strong Gravitational...
Fast and Automated Analysis of Interferometric Images of Strong Gravitational...
 
Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska
Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska
Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska
 

Último

Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuinethapagita
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxThermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxuniversity
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 

Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li

  • 1. Stanford/SLAC Cryo-EM Computing and Storage Pacific Research Platform Workshop, UCSC SV Yee-Ting Li, September 2018
  • 2. Introduction
       Stanford/SLAC will provide world-class expertise and training for Cryo-EM
       ● Stanford/SLAC Cryo-EM
         ○ Joint initiative between Stanford School of Medicine and SLAC Lab.
         ○ Led by Wah Chiu (formerly of BCM)
         ○ Data taking started ~Jan 2018
         ○ 3 Krios (2 with GIFs) and 1 Arctica
           ■ All with Gatan K2 cameras; upgrades to K3 start next month!
           ■ Users: NIH U24, NIH P41, Stanford and SLAC
       ● S2C2: Stanford-SLAC Cryo-EM Center
         ○ New NIH U24 Award
         ○ Will add 4+ microscopes within the next 3 years; 2 already on order
       ● Me:
         ○ Plan, manage and operate all data management, computation, and software infrastructure for all Cryo-EM Facilities
         ○ Day job: work with all SLAC science to support their computational and storage requirements (LSST, Fermi, LCLS I/II, CDMS, ATLAS)
  • 3. Architectural Overview
       Similar architecture to LCLS and other data-centric scientific experiments
       [Architecture diagram. Recoverable labels:]
       ● Locations: Onsite; Onsite SLAC - Petascale; Offsite - User Institutions (universities, other labs etc.)
       ● Components: Detector (TEM) x4; Data Reduction Pipeline; Online Monitoring; Fast feedback storage; Offline storage + Petascale HPC; Offline storage + Terascale HPC
       ● Data rates: ~2 Gb/s; up to 0.267 GB/s
       ● Stages: Fast Feedback; Pre-processing; Reconstruction (2D + 3D + Refinement)
       ● Timescales: ~ seconds; ~ 2 min; N/A; ~ days
  • 4. Activity Thus Far...
       Ever-increasing data rates
       ● Just on TEM2
         ○ 35 active proposals
         ○ >100 experiments
       ● Mostly limited by
         ○ Scope downtime
         ○ Screening time
         ○ Managerial efficiencies
       ● Typical experiment
         ○ Single particle: ~6,000 images, total of 4-8 TB
         ○ Tomography: 10-20 tomograms, total of 1-2 TB
  • 5. Technological Tenets of Data Management
       Provide users quick feedback on sample and data quality
       ● Data Pipelines & Data Provenance
       ● eLogBook & Monitoring/Reporting
       ● Near Real Time Feedback & Data Analysis
  • 6. User Focused
       “Is my sample viable?”, “Can I get answers from my time on the ’scope?”
       ● Provide rough gauge of sample, image and data quality in (near) real-time
       ● Provide automated processing and feedback on
         ○ Sample previews (remote access)
         ○ Initial pre-processing:
           ■ Movie alignment (e.g. MotionCor)
           ■ CTF calculations (e.g. ctffind/gctf)
           ■ Initial automated particle picking (e.g. dogpicker/gautomatch)
         ○ Soon:
           ■ Initial 2D class averages
           ■ Initial 3D density map
       ● Provide data management, computational resources and software support for users
         ○ GPU resources for relion, cryosparc etc.
         ○ Storage of metadata (logbook), raw data and initial data products
  • 7. Pipelines: Managed with Apache Airflow
       Don’t reinvent the wheel; add mud-flaps
       ● Open-source framework for ETL
       ● Rich Python-based pluggable architecture
       ● Integrated into our LSF, TSDB and GPFS environment
       ● Full accounting and reporting of pipeline
       ● Horizontally scalable; deployed as containers
       ● Easy-to-use GUI; flexible CLI
       ● Each pipeline defined as a graph (DAG)
  • 8. eLogBook & Monitoring/Reporting
       Reporting for both users and management
       ● Live pre-processing preview sent to dedicated Slack
       ● eLogbook
         ○ Provides a centralised platform for users to access, view, annotate and process their data
         ○ Provides management reports etc.
  • 9. Infrastructure/Resource Predictions
       Significant ramp up of (GPU) compute and storage

                        Currently     0-5 years *    10 years *
       CPU Compute      0.0 PFLOPS    0.0 PFLOPS     0.0 PFLOPS
       GPU              1.0 PFLOPS    10.0 PFLOPS    50.0 PFLOPS
       Disk Storage     1 PB          5 PB           10 PB
       Tape Storage     0             5 PB $         10 PB $
       Racks            1             6              10

       Numbers assume single-particle analysis only. Tomography requirements TBD (assumed similar).
       * Preliminary
       $ Heavily dependent upon agreed NIH data retention policies
  • 10. Remarks
       ● Funding agencies want to know how much time/effort/storage/compute time something takes, as well as how many papers can be published
       ● Experimentalists do not want to become experts in hardware and software
       ● Horizontally scalable hardware and software solutions required
         ○ No SPOF; burstability; off-site/cloud
       ● Hot/Cold/Warm data
         ○ How big is the active data set? How long do they need the data around for? Do duplicates exist?
         ○ Use of HSM and/or cloud storage
       ● Data/Experiment portability
         ○ Not just rsync of data
         ○ Provenance of workflow and data: reproducibility of results - containers!
       ● Experimental metadata
         ○ Microscope parameters + sample prep details
         ○ Metadata and data catalogue
       ● Down in the weeds:
         ○ Inodes! Best move to object stores… software support?
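
The "each pipeline defined as a graph (DAG)" point on the Pipelines slide can be illustrated with a small sketch. This is not Airflow code and not the facility's actual pipeline: it uses only Python's standard-library `graphlib` (Python 3.9+), and the task names and their dependencies are assumptions loosely based on the pre-processing steps listed on the User Focused slide.

```python
from graphlib import TopologicalSorter

# Hypothetical pre-processing pipeline: each task maps to the set of tasks
# it depends on. The names and edges are illustrative assumptions only.
PREPROCESSING_DAG = {
    "motion_correction": set(),                    # movie alignment, e.g. MotionCor
    "ctf_estimation": {"motion_correction"},       # e.g. ctffind/gctf
    "particle_picking": {"motion_correction"},     # e.g. dogpicker/gautomatch
    "publish_preview": {"ctf_estimation", "particle_picking"},
}


def execution_order(dag):
    """Return one valid order in which the tasks could be run.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    i.e. if the graph is not actually a DAG.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step, task in enumerate(execution_order(PREPROCESSING_DAG), start=1):
        print(step, task)
```

A workflow manager such as Airflow adds scheduling, retries, accounting and a GUI on top of this basic idea; the sketch only shows how declaring dependencies is enough to derive a valid execution order.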

Editor's notes

  1. Data rate is 2 GB per minute per microscope, i.e. about 0.27 Gb/s or 0.033 GB/s. What is shown is for the 8 microscopes (4 existing + 4 on the new U24 grant). Data Reduction Pipeline: not applicable currently; there is potential to use ML/AI to veto ‘bad’ images, but that is likely to happen at the Fast Feedback layer. Fast Feedback and Offline are currently the same tier; Fast Feedback will be implemented as a cache layer rather than a physically separate layer.
  2. Provide experimenters a sense of how the experiment is going
  3. Data Retention assumed low (maybe 4 months only); most data will be Wah Chiu Lab Group data only. Data derived from https://docs.google.com/spreadsheets/d/1QCM9x7Q6u_haqeIl4vUc9g2AvP80Kr0yVEnjm1gibfE/edit#gid=0
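
The data-rate figures in note 1 and on the architecture slide follow from simple unit conversions, sketched below. The inputs (2 GB per minute per microscope, 8 microscopes, 4-8 TB per single-particle experiment) come from the slides and notes; the function names are hypothetical, decimal units (1 TB = 1000 GB) are assumed, and the acquisition-time estimate additionally assumes continuous collection at the quoted rate, which the slides do not state.

```python
GB_PER_MINUTE_PER_SCOPE = 2.0   # from editor's note 1
MICROSCOPES = 8                 # 4 existing + 4 on the new U24 grant


def per_scope_rates(gb_per_minute):
    """Convert GB/min into (GB/s, Gb/s) for a single microscope."""
    gbytes_per_s = gb_per_minute / 60.0
    gbits_per_s = gbytes_per_s * 8.0  # 8 bits per byte
    return gbytes_per_s, gbits_per_s


def acquisition_hours(dataset_tb, gb_per_minute=GB_PER_MINUTE_PER_SCOPE):
    """Hours needed to record a dataset at a sustained rate.

    Assumes decimal units (1 TB = 1000 GB) and uninterrupted collection.
    """
    minutes = dataset_tb * 1000.0 / gb_per_minute
    return minutes / 60.0


if __name__ == "__main__":
    gbytes, gbits = per_scope_rates(GB_PER_MINUTE_PER_SCOPE)
    print(f"per microscope: {gbytes:.3f} GB/s = {gbits:.3f} Gb/s")
    print(f"all {MICROSCOPES} microscopes: {gbytes * MICROSCOPES:.3f} GB/s "
          f"= {gbits * MICROSCOPES:.2f} Gb/s")
    for size_tb in (4.0, 8.0):
        print(f"{size_tb:.0f} TB dataset: {acquisition_hours(size_tb):.1f} h")
```

Under these assumptions the aggregate for 8 microscopes comes to about 0.267 GB/s, or roughly 2 Gb/s, which matches the figures on the architecture slide, and a 4-8 TB single-particle dataset corresponds to roughly 33-67 hours of sustained collection.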