PETRAIII/EuXFEL data archiving
Sergey Yakubov, Martin Gasthuber (@desy.de) / DESY-IT
Geneva, June 5, 2019
Page 2| PETRAIII/EuXFEL data archiving | Martin Gasthuber / Sergey Yakubov, June 2019
(National)
Page 3
DESY Campus Hamburg – much more communities
Synchrotron radiation source (highest brilliance)
VUV & soft-x-ray free-electron laser
MPI-SD
FLASH I+II
PETRA III
+
X-Ray Free-Electron Laser
atomic structure & fs dynamics
of complex matter
CHyN
HARBOR
CXNS
NanoLab
CWS
Page 4
sources of data
• 3 active accelerators on-site (all photon science) – PETRA III, FLASH and EuXFEL
• currently 30 active experimental areas (called beamlines) - operated in parallel
• more in preparation
• PETRA IV (future) – expect 10^4 to 10^5 times more (raw) data - not all to be stored
• FLASH21+
• majority of generated data is analyzed within a few months (cooling afterwards)
• keep two independent copies as soon as possible (raw & calibration data, e.g. for EuXFEL)
Page 5
DESY datacenter - resources interacting with ARCHIVER
data processing resources before archiving
• HPC cluster – 500 nodes, 30,000 cores, large InfiniBand fabric (growing)
• GPFS – 30 building blocks, 30PB, all InfiniBand connected (growing)
• BeeGFS - 3PB, InfiniBand connected
• LHC computing - Analysis Facility + Tier-2, 1000 nodes, 30,000 cores (growing)
• ~40% more resources outside the datacenter (mostly at experimental areas)
current archiving capabilities
• dCache - 6 large instances, 35PB capacity, >120 building blocks, Tape gateway
• Tape – 2 x SL8500 (15000 Slots), 25 x LTO8, 8 x LTO6, >80PB capacity
Page 6
data life cycle as of today - from the cradle to the grave
• new archive service connected to ‘Core-FS’ and/or after dCache to fit seamlessly into existing workflow
• this scenario will most likely use the fully automated (API/CLI) archive system interface
Page 7
PETRAIII/EuXFEL data archiving
• end user workflows (3)
• scientific data and user
• admin workflow
• service integration & planning
• configuration based on site+community data policy and contracts between
● SIP == DIP (AIP should allow sequential media efficiently)
● Archival Storage - here is the ‘hybrid’ in
○ replication (horizontal)
○ multi tiering (vertical) - similar to HSMs
○ instances should run on distributed sites
● Archival Storage == instances of bit-stream-preservation
● Data Management + Ingest + Access == core of archive instance
Page 8
end user workflow 1
• individual scientist archiving important work (e.g. publication, partial analysis results, …) – DOI required
• key metrics
• Single archive size: average 10-100 GB.
• Files in archive: average 10,000
• Total archive size per user: 5 TB
• Duration: 5-10 years
• Ingest rates: 10-100 MB/s (more is better)
• encryption: not required, nice to have
• browser based interaction (authentication, data transfers, metadata query/ingest)
• CLI tools usable for data ingest
• metadata query
• starting from a single string input (like Google search) - interactive/immediate selection response
• change QoS - e.g. number of replicas after re-evaluating the ‘value’ of that data
• DOI generated (as e.g. Zenodo does) for durable external references
• mobile devices (tablet, phone, …) (tools + protocols) should not be excluded
individual scientist – managing private scientific data (self-generated and self-managed)
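The "single string input" metadata query above could look roughly like the following. This is only a minimal sketch: the record layout, the field names and the `search` helper are assumptions made for illustration, not part of any archive product or of the requirements themselves.

```python
from datetime import date

# Hypothetical metadata records; ids, field names and values are made up.
RECORDS = [
    {"id": "a1", "title": "Diffraction run", "station": "bl01",
     "created": date(2019, 3, 14), "files": 10000},
    {"id": "a2", "title": "Partial analysis results", "station": "bl02",
     "created": date(2019, 5, 2), "files": 250},
]


def search(records, query):
    """Return records in which every whitespace-separated term matches some field."""
    terms = query.lower().split()
    hits = []
    for rec in records:
        # Flatten all values (strings, integers, dates) into one lowercase text
        # so that a single input string can hit any field.
        haystack = " ".join(str(value).lower() for value in rec.values())
        if all(term in haystack for term in terms):
            hits.append(rec)
    return hits


print([rec["id"] for rec in search(RECORDS, "bl01 2019")])
```

A real service would use an index rather than a linear scan; the point here is only that one free-text string is matched against all metadata fields at once.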
Page 9
end user workflow 2
• beamline (experimental station) specific + experiment specific, medium size and rate
• key size parameters
• Single archive size: average 5 TB
• Files in archive: average 150,000
• Total archive size per beamline: 400 TB, doubles every year
• Duration: 10 years
• Ingest rates: 1-2GB/s
• encryption: not required
• 3rd-party copy - ‘gather’ all data from various primary storage systems - controlled from a single point
• local (to site) data transport should be RDMA based and operate (efficiently) on networks faster than 10 Gb/s
• data encryption in transit not required
• API + CLI for seamless automation - e.g. API exposed as a REST API
• CLI on Linux, API should support the platforms in use (focus on Linux but incl. Windows ;-)
• MetaData
• other methods (e.g. referencing/finding through experiment managing services) used in addition
beamline manager – mix of automated and experiment specific/manual archive interaction
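To make the "doubles every year" figure concrete, the following sketch projects the per-beamline volume. It assumes that the total of 400 TB doubles once per year and that this growth holds for the whole 10-year retention period; both are readings of the slide, not stated facts, and `projected_size_tb` is a name introduced here.

```python
def projected_size_tb(initial_tb: float, years: int, growth_factor: float = 2.0) -> float:
    """Total volume after `years` of compound growth (factor 2.0 = doubling per year)."""
    return initial_tb * growth_factor ** years


# 400 TB per beamline today, assumed to double each year.
for year in (0, 1, 5, 10):
    size_tb = projected_size_tb(400, year)
    print(f"year {year:2d}: {size_tb:>10,.0f} TB ({size_tb / 1000:,.1f} PB)")
```

Under these assumptions a single beamline passes 400 PB after ten years, which is why the capacity and cost limits of the archive need to be planned per beamline.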
Page 10
end user workflow 3
• large collaboration or site managing and controlling archive operations on behalf of all experiments - fully automated and at large scale
• all inherited from previous workflow - except the manual part - all interactions automated
• key size parameters
• Single archive size: average 400 TB.
• Files in archive: average 25,000
• Total archive size per beamline: tens of PB, doubles every year
• Duration: 10 years
• Ingest rates: 3-10GB/s - averaged over 1-3 hours
• encryption: not required
• bulk recall - planned re-analysis requires bulk restore operations at decent rates (50% of ingest rate) to feed the compute engine
• async notification from the archive on reaching certain states (e.g. data accepted and stored), to be propagated to external DBs
Integrated data archiving for large standardized beamline/facility experiments
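The bulk recall requirement (50% of the ingest rate) can be turned into a rough restore time for one average archive. The sketch below assumes decimal units (1 TB = 10^12 bytes), a sustained rate and no tape mount or positioning overhead; `recall_hours` is a helper introduced here.

```python
TB = 10**12  # decimal units are an assumption; the slides do not distinguish TB and TiB
GB = 10**9


def recall_hours(archive_bytes: float, ingest_rate_bytes_per_s: float,
                 recall_fraction: float = 0.5) -> float:
    """Hours needed to restore an archive at a given fraction of the ingest rate."""
    recall_rate = ingest_rate_bytes_per_s * recall_fraction
    return archive_bytes / recall_rate / 3600


# Average single archive of 400 TB, ingest rates of 3-10 GB/s.
slowest = recall_hours(400 * TB, 3 * GB)
fastest = recall_hours(400 * TB, 10 * GB)
print(f"bulk recall of 400 TB: {fastest:.1f} h to {slowest:.1f} h")
```

With these inputs a full restore of one average archive takes between roughly one and three days, so planned re-analysis has to schedule recalls well ahead of the compute jobs.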
Page 11
site manager & administrative workflows
• create and config core archive and related bit-stream-preservation instances
• based on site and community data policies + contracts with community
• create ‘archive profiles’ determining operation modes and limits (everything that could generate costs ;-)
• this includes tradeoffs between costs and data resiliency (probability of data loss)
• select appropriate ‘bit-stream-preservation’ instances and the hierarchy among them (e.g. replication)
• setup further admin and end user accounts and their roles (authorizations)
• delegation of limited admin tasks by group admins of community/groups
• configure/set up AAI - e.g. local IdP
• wide range of authentication methods usable (besides local site ones) – x509, OpenID, eduGAIN, … (more is better); used to authenticate and usable in ACL-like authorization settings (the identity or DN)
• multiple authentication methods mapped to a single ‘identity’
• set up role-based model (identity selects roles, roles select archive profile)
integration, setup and control - workflow derived requirements
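The chain "several authentication methods map to one identity, the identity selects roles, roles select an archive profile" can be sketched as plain lookups. Everything concrete below is hypothetical: the credential strings, the role names and the `replicas` values are invented, and only the 5 TB and 400 TB limits echo figures from the workflow slides.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    name: str
    roles: set = field(default_factory=set)


# Several credentials (x509 DN, OpenID subject, ...) resolve to one identity.
alice = Identity("alice", roles={"beamline-manager"})
CREDENTIAL_TO_IDENTITY = {
    ("x509", "/C=DE/O=Example/CN=Alice Example"): alice,
    ("openid", "alice@idp.example.org"): alice,
}

# Each role selects an archive profile (operation modes and limits).
ROLE_TO_PROFILE = {
    "scientist": {"replicas": 1, "quota_tb": 5},
    "beamline-manager": {"replicas": 2, "quota_tb": 400},
}


def resolve_profiles(method: str, subject: str) -> dict:
    """Map a credential to its identity, then to the profiles of that identity's roles."""
    identity = CREDENTIAL_TO_IDENTITY.get((method, subject))
    if identity is None:
        raise PermissionError(f"unknown credential: {method}:{subject}")
    return {role: ROLE_TO_PROFILE[role] for role in identity.roles}


print(resolve_profiles("openid", "alice@idp.example.org"))
```

Keeping authentication and authorization in separate tables means a new login method only adds a row to the credential map and leaves roles and profiles untouched.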
Page 12
site manager & administrative workflows
• deployment scenarios (instance architectures)
• deploy main services and esp. metadata store/query (Data Management+Ingest+Access in OAIS terms)
• locally
• in cloud (using remote service and storage/handling hardware for MD operations)
• create/attach bit stream preservation layer (Archival Storage in OAIS terms)
• local only
• remote only
• tiered - local and remote (e.g. remote tape) - remote could be ‘cooperating lab’, public cloud, …
• (streaming) protocol to transfer data between tiers should support efficient and secure ‘wide area’ transfers
• Deployment based on open standards / open source version preferable
• avoid vendor lock-in, assure long-term viability, benefit from wide community support
• subscribing to paid support not excluded
• commercial version not excluded either (depending on the licensing model, exit strategy, etc.)
Deployment models/business models
Page 13
left over…
• life cycle of archive objects (not bound to a single access session) - create, fill (meta)data, close (data becomes immutable), query
• archive objects could be related to existing ones - e.g. containing new versions of derived data
• all data access should be ‘stream’ based
• no random access (within a file) is required
• recalls of pre-selected files out of single archive object
• network protocol should be ‘firewall friendly’ - e.g. http* based
• Billing
• any ‘non-local’ deployment requires billing services and methods (obvious), separated into service and storage costs (at least)
• external storage resource - long term predictable costs/contracts preferred (less ‘pay as you go’)
• per user and group billing (user may be member of several groups and groups might be nested)
• encryption - in all cases is ‘nice to have’ - expecting issues with local ‘key management’ services
• pre and post en/decryption of data in motion and/or at rest is a valid alternative
• (Meta)Data formats
• no special (known to the archive service) data formats required, thus no format conversions (without user interaction) required
• Metadata needs to be exportable/importable to new/updated instances
• Metadata - query engine should handle binary, string, integer and date/time values
other thoughts, requirements and options
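The archive object life cycle listed above (create, fill, close, then immutable) can be illustrated with a small state machine. This is a sketch under assumptions: the class, its method names and the in-memory storage are invented for illustration and do not describe any existing archive API.

```python
from enum import Enum


class State(Enum):
    OPEN = "open"      # created, may still receive data and metadata
    CLOSED = "closed"  # immutable from here on


class ArchiveObject:
    def __init__(self, object_id, derived_from=None):
        self.object_id = object_id
        self.derived_from = derived_from  # optional link to an existing object
        self.state = State.OPEN
        self.files = {}
        self.metadata = {}

    def _require_open(self):
        if self.state is not State.OPEN:
            raise RuntimeError(f"{self.object_id} is closed and immutable")

    def add_file(self, name, payload: bytes):
        self._require_open()
        self.files[name] = payload

    def set_metadata(self, key, value):
        self._require_open()
        self.metadata[key] = value

    def close(self):
        self.state = State.CLOSED

    def recall(self, names):
        """Yield (name, payload) for pre-selected files, whole files only."""
        for name in names:
            yield name, self.files[name]


v1 = ArchiveObject("run-42-v1")
v1.add_file("raw.dat", b"\x00\x01")
v1.set_metadata("sample", "test")
v1.close()

# A new version of derived data goes into a new object that points at the old one.
v2 = ArchiveObject("run-42-v2", derived_from=v1.object_id)
print(v1.state.value, v2.state.value, v2.derived_from)
```

Because the object keeps its state between calls, filling it can span several access sessions, and corrections after closing are expressed as a new related object rather than as edits.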

Más contenido relacionado

La actualidad más candente

HNSciCloud update @ the World LHC Computing Grid deployment board
HNSciCloud update @ the World LHC Computing Grid deployment board  HNSciCloud update @ the World LHC Computing Grid deployment board
HNSciCloud update @ the World LHC Computing Grid deployment board Helix Nebula The Science Cloud
 
Archiver 3rd omc_project_overview
Archiver 3rd omc_project_overviewArchiver 3rd omc_project_overview
Archiver 3rd omc_project_overviewArchiver
 
3 archiver omc deployment_scenarios
3 archiver omc deployment_scenarios3 archiver omc deployment_scenarios
3 archiver omc deployment_scenariosArchiver
 
The GoGeo Vision for Repositories (Pecha Kucha) - Tony Mathys
The GoGeo Vision for Repositories (Pecha Kucha) - Tony MathysThe GoGeo Vision for Repositories (Pecha Kucha) - Tony Mathys
The GoGeo Vision for Repositories (Pecha Kucha) - Tony MathysRepository Fringe
 
Exposing EO Linked (meta-)Data from OpenSearch Catalogue
Exposing EO Linked (meta-)Data from OpenSearch CatalogueExposing EO Linked (meta-)Data from OpenSearch Catalogue
Exposing EO Linked (meta-)Data from OpenSearch CatalogueRaul Palma
 
Bde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoring
Bde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoringBde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoring
Bde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoringBigData_Europe
 

La actualidad más candente (20)

The Archiver project
The Archiver projectThe Archiver project
The Archiver project
 
The Science Cloud Users: Challenges and Needs
The Science Cloud Users: Challenges and NeedsThe Science Cloud Users: Challenges and Needs
The Science Cloud Users: Challenges and Needs
 
Sharing Big Data - Bob Jones
Sharing Big Data - Bob JonesSharing Big Data - Bob Jones
Sharing Big Data - Bob Jones
 
HNSciCloud update @ the World LHC Computing Grid deployment board
HNSciCloud update @ the World LHC Computing Grid deployment board  HNSciCloud update @ the World LHC Computing Grid deployment board
HNSciCloud update @ the World LHC Computing Grid deployment board
 
Who is doing what, and how do we know? [PEPRS]
Who is doing what, and how do we know? [PEPRS]Who is doing what, and how do we know? [PEPRS]
Who is doing what, and how do we know? [PEPRS]
 
Archiver 3rd omc_project_overview
Archiver 3rd omc_project_overviewArchiver 3rd omc_project_overview
Archiver 3rd omc_project_overview
 
Geoservices Activities at EDINA
Geoservices Activities at EDINAGeoservices Activities at EDINA
Geoservices Activities at EDINA
 
3 archiver omc deployment_scenarios
3 archiver omc deployment_scenarios3 archiver omc deployment_scenarios
3 archiver omc deployment_scenarios
 
Open Access Repository Junction
Open Access Repository JunctionOpen Access Repository Junction
Open Access Repository Junction
 
Fedora Oxford Dec09
Fedora Oxford Dec09Fedora Oxford Dec09
Fedora Oxford Dec09
 
UK RepositoryNet+ Mimas Workshop
UK RepositoryNet+ Mimas WorkshopUK RepositoryNet+ Mimas Workshop
UK RepositoryNet+ Mimas Workshop
 
Open @ EDINA
Open @ EDINAOpen @ EDINA
Open @ EDINA
 
COBWEB: Brief Introduction, GBIF Secretariat
COBWEB: Brief Introduction, GBIF SecretariatCOBWEB: Brief Introduction, GBIF Secretariat
COBWEB: Brief Introduction, GBIF Secretariat
 
Crowdsourcing the Past with AddressingHistory
Crowdsourcing the Past with AddressingHistory Crowdsourcing the Past with AddressingHistory
Crowdsourcing the Past with AddressingHistory
 
SafeNet: Progress and Data Gathering
SafeNet: Progress and Data GatheringSafeNet: Progress and Data Gathering
SafeNet: Progress and Data Gathering
 
The GoGeo Vision for Repositories (Pecha Kucha) - Tony Mathys
The GoGeo Vision for Repositories (Pecha Kucha) - Tony MathysThe GoGeo Vision for Repositories (Pecha Kucha) - Tony Mathys
The GoGeo Vision for Repositories (Pecha Kucha) - Tony Mathys
 
Exposing EO Linked (meta-)Data from OpenSearch Catalogue
Exposing EO Linked (meta-)Data from OpenSearch CatalogueExposing EO Linked (meta-)Data from OpenSearch Catalogue
Exposing EO Linked (meta-)Data from OpenSearch Catalogue
 
The European Open Science Cloud
The European Open Science CloudThe European Open Science Cloud
The European Open Science Cloud
 
Metadata Working Group - Status update
Metadata Working Group -Status updateMetadata Working Group -Status update
Metadata Working Group - Status update
 
Bde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoring
Bde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoringBde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoring
Bde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoring
 

Similar a DESY / XFEL Deployment Scenarios

Managing research data at Bristol
Managing research data at BristolManaging research data at Bristol
Managing research data at BristolSimon Price
 
Simplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformSimplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformGlobus
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data StackZubair Nabi
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...t_ivanov
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202Timothy Spann
 
Spectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSpectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSandeep Patil
 
Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...Trishali Nayar
 
Geospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning DataGeospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning DataAlexMiowski
 
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...IOSR Journals
 
HNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge Meinhard
HNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge MeinhardHNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge Meinhard
HNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge MeinhardHelix Nebula The Science Cloud
 
Google File System
Google File SystemGoogle File System
Google File Systemvivatechijri
 
CS403: Operating System : Lec 4 OS services.pptx
CS403: Operating System : Lec 4 OS services.pptxCS403: Operating System : Lec 4 OS services.pptx
CS403: Operating System : Lec 4 OS services.pptxAsst.prof M.Gokilavani
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobus
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...David Wallom
 

Similar a DESY / XFEL Deployment Scenarios (20)

Managing research data at Bristol
Managing research data at BristolManaging research data at Bristol
Managing research data at Bristol
 
HDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the CloudHDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the Cloud
 
Simplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformSimplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus Platform
 
The Big Data Stack
The Big Data StackThe Big Data Stack
The Big Data Stack
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
 
Legislation.gov.uk
Legislation.gov.ukLegislation.gov.uk
Legislation.gov.uk
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202
 
Spectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSpectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN Caching
 
Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...
 
Geospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning DataGeospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning Data
 
H017144148
H017144148H017144148
H017144148
 
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
 
191
191191
191
 
HNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge Meinhard
HNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge MeinhardHNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge Meinhard
HNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge Meinhard
 
Google File System
Google File SystemGoogle File System
Google File System
 
EUDAT B2STAGE & EOSC-hub
EUDAT B2STAGE & EOSC-hubEUDAT B2STAGE & EOSC-hub
EUDAT B2STAGE & EOSC-hub
 
CS403: Operating System : Lec 4 OS services.pptx
CS403: Operating System : Lec 4 OS services.pptxCS403: Operating System : Lec 4 OS services.pptx
CS403: Operating System : Lec 4 OS services.pptx
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...
 

Más de Archiver

Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver
 
Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver
 
Prototype Phase Kick-off Event and Ceremony
Prototype Phase Kick-off Event and CeremonyPrototype Phase Kick-off Event and Ceremony
Prototype Phase Kick-off Event and CeremonyArchiver
 
Design phase kick-off event and Ceremony
Design phase kick-off event and CeremonyDesign phase kick-off event and Ceremony
Design phase kick-off event and CeremonyArchiver
 
Archiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing ServicesArchiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing ServicesArchiver
 
Wrapping Up and Next Steps¶
Wrapping Up and Next Steps¶Wrapping Up and Next Steps¶
Wrapping Up and Next Steps¶Archiver
 
ARCHIVER Tender Requirements
ARCHIVER Tender RequirementsARCHIVER Tender Requirements
ARCHIVER Tender RequirementsArchiver
 
Wrapping up and_next_steps_stansted
Wrapping up and_next_steps_stanstedWrapping up and_next_steps_stansted
Wrapping up and_next_steps_stanstedArchiver
 
20190523 archiver fim
20190523 archiver fim20190523 archiver fim
20190523 archiver fimArchiver
 
Geant cloud peering-v2
Geant cloud peering-v2Geant cloud peering-v2
Geant cloud peering-v2Archiver
 
Archiver omc stansted_tendering_procedure_and_requirements_final
Archiver omc stansted_tendering_procedure_and_requirements_finalArchiver omc stansted_tendering_procedure_and_requirements_final
Archiver omc stansted_tendering_procedure_and_requirements_finalArchiver
 
Wrapping up_and_next_steps
Wrapping up_and_next_stepsWrapping up_and_next_steps
Wrapping up_and_next_stepsArchiver
 
Introduction to_planning_poker_addestino
Introduction to_planning_poker_addestinoIntroduction to_planning_poker_addestino
Introduction to_planning_poker_addestinoArchiver
 
Archiver 2nd_OMC event_Barcelona_Project Overview
Archiver 2nd_OMC event_Barcelona_Project OverviewArchiver 2nd_OMC event_Barcelona_Project Overview
Archiver 2nd_OMC event_Barcelona_Project OverviewArchiver
 
Archiver OMC event_Barcelona_ Welcome to_accio
Archiver OMC event_Barcelona_ Welcome to_accio Archiver OMC event_Barcelona_ Welcome to_accio
Archiver OMC event_Barcelona_ Welcome to_accio Archiver
 
6 presentation wrapping up and next steps v2
6 presentation wrapping up and next steps v26 presentation wrapping up and next steps v2
6 presentation wrapping up and next steps v2Archiver
 
5 introduction to geant
5 introduction to geant5 introduction to geant
5 introduction to geantArchiver
 
4 archiver omc session 1
4 archiver omc session 1 4 archiver omc session 1
4 archiver omc session 1 Archiver
 
2 procurement and legal aspects
2 procurement and legal aspects 2 procurement and legal aspects
2 procurement and legal aspects Archiver
 
1 archiver omc project_overview
1 archiver omc project_overview1 archiver omc project_overview
1 archiver omc project_overviewArchiver
 

Más de Archiver (20)

Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award Ceremony
 
Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award Ceremony
 
Prototype Phase Kick-off Event and Ceremony
Prototype Phase Kick-off Event and CeremonyPrototype Phase Kick-off Event and Ceremony
Prototype Phase Kick-off Event and Ceremony
 
Design phase kick-off event and Ceremony
Design phase kick-off event and CeremonyDesign phase kick-off event and Ceremony
Design phase kick-off event and Ceremony
 
Archiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing ServicesArchiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing Services
 
Wrapping Up and Next Steps¶
Wrapping Up and Next Steps¶Wrapping Up and Next Steps¶
Wrapping Up and Next Steps¶
 
ARCHIVER Tender Requirements
ARCHIVER Tender RequirementsARCHIVER Tender Requirements
ARCHIVER Tender Requirements
 
Wrapping up and_next_steps_stansted
Wrapping up and_next_steps_stanstedWrapping up and_next_steps_stansted
Wrapping up and_next_steps_stansted
 
20190523 archiver fim
20190523 archiver fim20190523 archiver fim
20190523 archiver fim
 
Geant cloud peering-v2
Geant cloud peering-v2Geant cloud peering-v2
Geant cloud peering-v2
 
Archiver omc stansted_tendering_procedure_and_requirements_final
Archiver omc stansted_tendering_procedure_and_requirements_finalArchiver omc stansted_tendering_procedure_and_requirements_final
Archiver omc stansted_tendering_procedure_and_requirements_final
 
Wrapping up_and_next_steps
Wrapping up_and_next_stepsWrapping up_and_next_steps
Wrapping up_and_next_steps
 
Introduction to_planning_poker_addestino
Introduction to_planning_poker_addestinoIntroduction to_planning_poker_addestino
Introduction to_planning_poker_addestino
 
Archiver 2nd_OMC event_Barcelona_Project Overview
Archiver 2nd_OMC event_Barcelona_Project OverviewArchiver 2nd_OMC event_Barcelona_Project Overview
Archiver 2nd_OMC event_Barcelona_Project Overview
 
Archiver OMC event_Barcelona_ Welcome to_accio
Archiver OMC event_Barcelona_ Welcome to_accio Archiver OMC event_Barcelona_ Welcome to_accio
Archiver OMC event_Barcelona_ Welcome to_accio
 
6 presentation wrapping up and next steps v2
6 presentation wrapping up and next steps v26 presentation wrapping up and next steps v2
6 presentation wrapping up and next steps v2
 
5 introduction to geant
5 introduction to geant5 introduction to geant
5 introduction to geant
 
4 archiver omc session 1
4 archiver omc session 1 4 archiver omc session 1
4 archiver omc session 1
 
2 procurement and legal aspects
2 procurement and legal aspects 2 procurement and legal aspects
2 procurement and legal aspects
 
1 archiver omc project_overview
1 archiver omc project_overview1 archiver omc project_overview
1 archiver omc project_overview
 

Último

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 

DESY / XFEL Deployment Scenarios

  • 1. PETRAIII/EuXFEL data archiving Sergey Yakubov, Martin Gasthuber (@desy.de) / DESY-IT Geneva, June 5, 2019
  • 2. (National)
  • 3. Page 3 DESY Campus Hamburg – much more communities Synchrotron radiation source (highest brilliance) VUV & soft-x-ray free-electron laser MPI-SD FLASH I+II PETRA III + X-Ray Free-Electron Laser atomic structure & fs dynamics of complex matter CHyN HARBOR CXNS NanoLab CWS
  • 4. sources of data
      • 3 active accelerators on-site (all photon science): Petra III, FLASH and EuXFEL
      • currently 30 active experimental areas (called beamlines), operated in parallel
      • more in preparation
          • Petra IV (future): expect 10^4 to 10^5 times more (raw) data, not all to be stored
          • FLASH21+
      • majority of generated data is analyzed within a few months (cooling afterwards)
      • have two independent copies as soon as possible (raw & calibration data, e.g. for EuXFEL)
  • 5. DESY datacenter - resources interacting with ARCHIVER
      data processing resources before archiving
      • HPC cluster: 500 nodes, 30,000 cores, large InfiniBand fabric (growing)
      • GPFS: 30 building blocks, 30 PB, all InfiniBand connected (growing)
      • BeeGFS: 3 PB, InfiniBand connected
      • LHC computing: Analysis Facility + Tier-2, 1000 nodes, 30,000 cores (growing)
      • ~40% more resources outside the datacenter (mostly at experimental areas)
      current archiving capabilities
      • dCache: 6 large instances, 35 PB capacity, >120 building blocks, Tape gateway
      • Tape: 2 x SL8500 (15,000 slots), 25 x LTO8, 8 x LTO6, >80 PB capacity
  • 6. data life cycle as of today - from the cradle to the grave
      • new archive service connected to ‘Core-FS’ and/or after dCache to fit seamlessly into the existing workflow
      • this scenario will most likely use the fully automated (API/CLI) archive system interface
  • 7. PETRAIII/EuXFEL data archiving
      • end user workflows (3)
      • scientific data and user
      • admin workflow
      • service integration & planning
      • configuration based on site+community data policy and contracts between
      ● SIP == DIP (AIP should allow sequential media efficiently)
      ● Archival Storage - here is the ‘hybrid’ in
          ○ replication (horizontal)
          ○ multi tiering (vertical) - similar to HSMs
          ○ instances should run on distributed sites
      ● Archival Storage == instances of bit-stream-preservation
      ● Data Management + Ingest + Access == core of archive instance
  • 8. end user workflow 1
      individual scientist - managing private scientific data (self-generated and self-managed)
      • individual scientist archiving important work (e.g. publication, partial analysis results, …) - DOI required
      • key metrics
          • Single archive size: average 10-100 GB
          • Files in archive: average 10,000
          • Total archive size per user: 5 TB
          • Duration: 5-10 years
          • Ingest rates: 10-100 MB/s (more is better)
          • encryption: not required, nice to have
      • browser based interaction (authentication, data transfers, metadata query/ingest)
      • CLI tools usable for data ingest
      • metadata query
          • starting from a single string input (like Google search) - interactive/immediate selection response
      • change QoS, e.g. number of replicas after re-evaluating the ‘value’ of that data
      • DOI generated (as done by e.g. Zenodo) for durable external references
      • mobile devices (tablet, phone, …) (tools + protocols) should not be excluded
  • 9. end user workflow 2
      beamline manager - mix of automated and experiment specific/manual archive interaction
      • beamline (experimental station) specific + experiment specific, medium size and rate
      • key size parameters
          • Single archive size: average 5 TB
          • Files in archive: average 150,000
          • Total archive size per beamline: 400 TB, doubles every year
          • Duration: 10 years
          • Ingest rates: 1-2 GB/s
          • encryption: not required
      • 3rd-party copy - ‘gather’ all data from various primary storage systems, controlled from a single point
      • local (to site) data transport should be RDMA based and operate (efficiently) on networks faster than 10 Gb/s
      • data encryption in transit not required
      • API + CLI for seamless automation, e.g. API exposed as a REST API
      • CLI on Linux; API should support the platforms in use (focus on Linux but incl. Windows ;-)
      • Metadata
          • other methods (e.g. referencing/finding through experiment managing services) used in addition
  • 10. end user workflow 3
      Integrated data archiving for large standardized beamline/facility experiments
      • large collaboration or site managing and controlling archive operations on behalf of (all experiments) - all automated and large scale
      • all inherited from previous workflow, except the manual part - all interactions automated
      • key size parameters
          • Single archive size: average 400 TB
          • Files in archive: average 25,000
          • Total archive size per beamline: tens of PB, doubles every year
          • Duration: 10 years
          • Ingest rates: 3-10 GB/s, averaged over 1-3 hours
          • encryption: not required
      • bulk recall - planned re-analysis requires bulk restore operations with decent rates (50% of ingest rate) (feed the compute engine)
      • async notification from archive on reaching certain states (e.g. data accepted and stored) to be updated in external DBs
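
The key size parameters of workflows 2 and 3 can be turned into rough planning numbers. The following is a minimal sketch, not part of the original slides: the function names are hypothetical, decimal units (1 TB = 1000 GB) are assumed, and the growth projection assumes that the stated yearly doubling simply continues.

```python
def ingest_hours(archive_tb: float, rate_gb_per_s: float) -> float:
    """Hours needed to ingest one archive at a sustained rate (1 TB = 1000 GB assumed)."""
    seconds = archive_tb * 1000 / rate_gb_per_s
    return seconds / 3600


def projected_total_tb(start_tb: float, years: int) -> float:
    """Total size after `years`, assuming the volume keeps doubling every year."""
    return start_tb * 2 ** years


if __name__ == "__main__":
    # Workflow 3: a single 400 TB archive at the lower and upper ingest rate.
    for rate in (3, 10):
        print(f"400 TB at {rate} GB/s: {ingest_hours(400, rate):.1f} h")
    # Workflow 2: 400 TB per beamline, doubling yearly (hypothetical 5-year horizon).
    print(f"after 5 years: {projected_total_tb(400, 5):.0f} TB")
```

Under these assumptions a single 400 TB archive needs well over ten hours even at the upper ingest rate, which is why the bulk recall target (50% of ingest rate) matters for planned re-analysis.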
  • 11. site manager & administrative workflows
      integration, setup and control - workflow derived requirements
      • create and configure core archive and related bit-stream-preservation instances
          • based on site and community data policies + contracts with community
      • create ‘archive profiles’ determining operation modes and limits (everything that could generate costs ;-)
          • this includes tradeoffs between costs and data resiliency (probability of data loss)
          • select appropriate ‘bit-stream-preservation’ instances and hierarchy among them (e.g. replication)
      • set up further admin and end user accounts and their roles (authorizations)
          • delegation of limited admin tasks to group admins of community/groups
      • configure/set up AAI, e.g. local IDP
          • wide range of authentication methods usable (besides local site ones): x509, OpenID, eduGAIN, … (more is better)
          • used to ‘authenticate’ and usable in ‘ACL’-like authorization settings (the identity or DN)
          • multiple authentication methods mapped to a single ‘identity’
      • set up role based model (identity selects roles, roles select archive profile)
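
The role based model on this slide (several authentication methods map to one identity, the identity selects roles, and roles select archive profiles) could be sketched as follows. This is an illustration only: all class names, fields, credential strings and profile values are hypothetical and not taken from any existing archive product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArchiveProfile:
    """Hypothetical profile: operation modes and limits that generate costs."""
    name: str
    replicas: int
    quota_tb: float


@dataclass
class AccessModel:
    credential_to_identity: dict = field(default_factory=dict)
    identity_to_roles: dict = field(default_factory=dict)
    role_to_profile: dict = field(default_factory=dict)

    def profiles_for(self, credential: str) -> list:
        """Resolve credential -> identity -> roles -> archive profiles."""
        identity = self.credential_to_identity[credential]
        roles = self.identity_to_roles.get(identity, set())
        profiles = {self.role_to_profile[r] for r in roles if r in self.role_to_profile}
        return sorted(profiles, key=lambda p: p.name)


def demo_model() -> AccessModel:
    """Build a small example: two credentials that belong to the same person."""
    model = AccessModel()
    model.credential_to_identity["x509:/C=DE/O=Example/CN=Jane Doe"] = "jdoe"
    model.credential_to_identity["oidc:jane@example.org"] = "jdoe"
    model.identity_to_roles["jdoe"] = {"scientist", "beamline-manager"}
    model.role_to_profile["scientist"] = ArchiveProfile("personal", 2, 5.0)
    model.role_to_profile["beamline-manager"] = ArchiveProfile("beamline", 2, 400.0)
    return model
```

Because authorization is resolved through the identity rather than the credential, adding another authentication method only requires one new entry in the credential mapping.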
  • 12. site manager & administrative workflows
      Deployment models/business models
      • deployment scenarios (instance architectures)
          • deploy main services and esp. metadata store/query (Data Management + Ingest + Access in OAIS terms)
              • locally
              • in cloud (using remote service and storage/handling hardware for MD operations)
          • create/attach bit stream preservation layer (Archival Storage in OAIS terms)
              • local only
              • remote only
              • tiered - local and remote (e.g. remote tape) - remote could be ‘cooperating lab’, public cloud, …
              • (streaming) protocol to transfer data between tiers should support efficient and secure ‘wide area’ transfers
      • deployment based on open standards / open source version preferable
          • avoid vendor lock-in, assure long-term viability, benefit from wide community support
          • subscribing to paid support not excluded
          • commercial version not excluded either (depending on the licencing model, exit strategy, etc.)
  • 13. left over…
      other thoughts, requirements and options
      • life cycle of archive objects (not bound to a single access session) - create, fill (meta)data, close (data becomes immutable), query
          • archive objects could be related to existing ones, e.g. containing new versions of derived data
      • all data access should be ‘stream’ based
          • no random access (within a file) is required
          • recalls of pre-selected files out of a single archive object
          • network protocol ‘firewall friendly’, e.g. http* based
      • Billing
          • any ‘non-local’ deployment requires billing services and methods (obvious), separated into service and storage costs (at least)
          • external storage resource - long term predictable costs/contracts preferred (less ‘pay as you go’)
          • per user and group billing (a user may be a member of several groups and groups might be nested)
      • encryption - in all cases ‘nice to have’ - expecting issues with local ‘key management’ services
          • pre and post en/decryption of data in motion and/or at rest is a valid alternative
      • (Meta)Data formats
          • no special (known to the archive service) data formats required, thus no format conversions (without user interaction) required
          • metadata needs to be ‘exportable/importable’ to new/updated instances
          • metadata query engine should handle binary, strings, integer and date/time
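
The archive object life cycle described above (create, fill (meta)data, close, query) can be pictured as a small state machine. The sketch below is an assumption-laden illustration rather than a proposed interface: the class and method names are hypothetical, the object lives only in memory, and it assumes that closing freezes metadata as well as data, which the slide states only for data.

```python
class ImmutableArchiveError(Exception):
    """Raised when a closed (immutable) archive object is modified."""


class ArchiveObject:
    def __init__(self, name, derived_from=None):
        self.name = name
        self.derived_from = derived_from  # optional relation to an existing object
        self.files = {}
        self.metadata = {}
        self.closed = False

    def _check_open(self):
        if self.closed:
            raise ImmutableArchiveError(f"archive object {self.name!r} is closed")

    def add_file(self, path, payload: bytes):
        """Fill phase: add file content while the object is still open."""
        self._check_open()
        self.files[path] = payload

    def set_metadata(self, key, value):
        """Fill phase: attach metadata (assumed to freeze on close as well)."""
        self._check_open()
        self.metadata[key] = value

    def close(self):
        """After closing, the content is immutable; only queries remain."""
        self.closed = True

    def matches(self, text: str) -> bool:
        """Single-string query over the name and metadata values."""
        needle = text.lower()
        haystack = [self.name] + [str(v) for v in self.metadata.values()]
        return any(needle in item.lower() for item in haystack)
```

Since the life cycle is not bound to a single access session, a real service would persist this state; a new version of derived data would be a new object whose `derived_from` points at the closed original.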