CERN Deployments Scenarios
Technical Details
Evangelos Motesnitsalis
Technical Coordinator
ARCHIVER Open Market Consultation Event
23 May 2019, London Stansted Airport
23 May 2019 http://www.archiver-project.eu 2
Contents
Introduction to High Energy Physics Deployment Scenarios
The BaBar Experiment
CERN Digital Memory
CERN Open Data
Volumes, Ingest Rates, and Retention Period
Summary and Next Steps
Introduction to High Energy Physics Deployment Scenarios
Introduction to HEP Deployment Scenarios
In all three Deployment Scenarios, users do not need to have access
directly to the Archiving Service
The volume of data is between 1.5 and 2 PB for each Deployment
Scenario
In all three Deployment Scenarios, data need to be recalled within a
“reasonable time window” (<24h)
OAIS Reference Model
Relevant Standards
Preservation: ISO 14721/16363, 26324 and related standards
Storage/Basic Archiving/Secure backup: ISO 27000, 27040, 19086
FAIR Principles
Findable
• Global, unique identifiers
• Rich metadata, indexes, search capabilities
Accessible
• Retrievable with free protocols
• Accessible metadata even after deletion
Interoperable
• Qualified reference to other data
• Formal, shared and broadly applicable knowledge representation standards
Re-Usable
• Accurate and relevant description
• Data usage license and detailed provenance
https://www.go-fair.org/
High Energy Physics Deployment Scenarios
The BaBar Experiment
CERN Digital Memory
CERN Open Data
The BaBar Experiment
The BaBar Experiment – Problem Definition
In 2020 the BaBar Experiment infrastructure at SLAC will
be decommissioned. As a result, the 2 PB of BaBar data
can no longer be stored at the host laboratory and
alternative solutions need to be found. Currently a copy
of the data is being held by CERN IT.
We want to ensure that a complete copy of BaBar data
will be retained for possible comparisons with data from
other experiments.
The BaBar Experiment – Workflow Characteristics
The Service Manager [SM] will access the Archiving Service
The SM will trigger the data ingestion
The SM should have the ability to do “partial recalls”:
• On a file
• On a subset of a file
The SM should have the ability to update the data
Data will be rarely recalled
Personal data do not exist in this use case
The cost is estimated to be below 100K per year [50K per PB per year]
The BaBar Experiment – Interface Needs
Basic API functionality that enables:
Ingestion/retrieval of data
Getting fixity checks
• automated reporting of fixity and errors
• an anti-corruption mechanism every time the data is touched
Restart capabilities due to high volume of data
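As a rough illustration of the fixity and restart needs listed above, the sketch below streams each file through SHA-256 and keeps a small ledger of files already ingested, so that an interrupted run can resume without re-sending completed files. The `upload` callback, the JSON ledger format, and the choice of SHA-256 are assumptions made for this example; none of them come from the ARCHIVER specifications.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, Iterable


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_ledger(ledger_path: Path) -> Dict[str, str]:
    """Return the {file path: checksum} map of files already ingested."""
    if ledger_path.exists():
        return json.loads(ledger_path.read_text())
    return {}


def ingest(files: Iterable[Path], ledger_path: Path,
           upload: Callable[[Path, str], None]) -> int:
    """Upload files not yet recorded in the ledger; return how many were sent.

    `upload` is a hypothetical callback standing in for the archive's API.
    """
    ledger = load_ledger(ledger_path)
    sent = 0
    for path in files:
        checksum = sha256_of(path)
        if ledger.get(str(path)) == checksum:
            continue  # already ingested with identical content: skip on restart
        upload(path, checksum)
        ledger[str(path)] = checksum
        # Persist after every file so a crash loses at most one transfer.
        ledger_path.write_text(json.dumps(ledger, indent=2))
        sent += 1
    return sent


def verify_fixity(path: Path, expected: str) -> bool:
    """Recompute the checksum and compare it with the recorded value."""
    return sha256_of(path) == expected
```

Writing the ledger after every file trades a little I/O for a simple restart guarantee; at petabyte scale a real service would more likely batch these writes or keep the ledger in a database.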
CERN Digital Memory
CERN Digital Memory – Problem Definition
We want to archive the ~1.5 PB of CERN
Digital Memory, containing digitized analog
documents produced by the institution in the
20th century as well as the digital production
of the 21st century, including new types like
websites, social media, emails, etc.
CERN Digital Memory – Workflow Characteristics
The Service Manager [SM] will access the Archiving Service
The SM will trigger the data ingestion
The SM should have the ability to do “partial recalls”:
• On a file
• On a subset of a file
e.g. download only one photo out of an album or only one part of a video recording
The SM should have the ability to update the data
e.g. replace/delete only one photograph in an album
Data will be rarely recalled
Personal data do exist in this use case
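A partial recall on a subset of a file, such as one part of a video recording, maps naturally onto a byte-range read. The sketch below shows the idea on a local seekable stream and builds the equivalent HTTP `Range` header. Whether the Archiving Service would expose HTTP range requests is an assumption, and the function names are illustrative only.

```python
import io
from typing import BinaryIO


def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive on both ends, hence the -1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_subset(stream: BinaryIO, offset: int, length: int) -> bytes:
    """Return only the requested slice of a seekable stream."""
    stream.seek(offset)
    return stream.read(length)


# Example: recall bytes 4..7 of a 10-byte object.
recording = io.BytesIO(b"0123456789")
print(range_header(4, 4))            # {'Range': 'bytes=4-7'}
print(read_subset(recording, 4, 4))  # b'4567'
```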
CERN Digital Memory – Data Characteristics
Currently the CERN Digital Memory is fragmented across various
information systems and different storage solutions which are not
OAIS-compliant
There are no universal standards for the contents
We want to introduce specific standards and formats in order to
ensure long-term preservation
The existence of personal and confidential data increases the
complexity of the user access requirements for this scenario
e.g. the service manager should not have access to the audio file of a CERN Council Meeting
CERN Digital Memory – Interface Needs
API functionalities:
Automated SIP transfers
Automated metadata handling
Access to converted files and checksums
Detailed Error information
Web Interface:
Dashboard with browsing/searching capabilities
An audit log where details of all actions can be accessed
CERN Open Data
CERN Open Data
The CERN Open Data portal disseminates close to 2 PB of primary
and derived datasets from particle physics as released by the
LHC Collaborations, and is used for both education and research
purposes. The CERN Open Data Service Managers seek an easy-to-use,
easy-to-achieve independent archiving and backup solution for their
holdings, based on SIPs [Submission Information Packages], with
intelligent and reliable disaster recovery mechanisms.
CERN Open Data – Workflow Characteristics
The Service Manager [SM] will access the Archiving Service
The SM will trigger the data ingestion
The SM should have the ability to do “partial recalls”:
• On a file
• On a subset of a file
The SM should have the ability to update the data
e.g. replace/delete only one file of a dataset
Data will be rarely recalled
Personal data do not exist in this case
Data ingestion is based on “release campaigns” (3x / year)
Data are publicly available – they can even be crawled
CERN Open Data – Data Characteristics
The CERN Open Data Portal contains:
10,000 bibliographical records
600,000 files
2 PB in total
Typical dataset size: ~3 TB
Typical File Size: 1-4 GB
Metadata in custom JSON Schema
inspired by W3C DCAT Standard
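The figures above can be cross-checked with a quick calculation: dividing the total volume by the file count should land inside the quoted typical file size. The sketch assumes decimal units (1 PB = 10^15 bytes), which the slides do not state.

```python
PB = 10**15
GB = 10**9

total_bytes = 2 * PB
file_count = 600_000
record_count = 10_000

# Mean file size implied by the totals, in GB.
avg_file_gb = total_bytes / file_count / GB
# Mean number of files attached to one bibliographical record.
avg_files_per_record = file_count / record_count

print(f"average file size: {avg_file_gb:.2f} GB")
print(f"files per record: {avg_files_per_record:.0f}")
```

The implied average of roughly 3.33 GB per file falls within the 1-4 GB typical file size, so the totals are mutually consistent.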
CERN Open Data – Interface Characteristics
API functionalities:
Automated transfers (e.g. HTTP)
Automated metadata handling
Validation of the integrity of the deposited material both for data and
metadata
Periodic fixity checks
Web Interface:
Dashboard with browsing/searching capabilities
An audit log where details of all actions can be accessed
CERN Open Data – added value features
The CernVM File System provides a scalable and reliable software
distribution service for the LHC experiments as a POSIX read-only file system.
Files and directories are hosted on standard web servers and mounted in the universal
namespace /cvmfs.
As CernVM-FS can use the S3 protocol for storage, we want to explore two
possibilities:
The first is to install CernVM-FS on external infrastructure
The second is to transfer CernVM-FS to an external service (for example, cvmfs.cloud.com)
This service will be added on top of the archiving solution as a Software
Reproducibility Layer, in order to run example physics analyses on
non-CERN/LHC infrastructure.
Volumes, Ingest Rates, and Retention Period
Dataset Characteristics
Deployment Scenario     Data Volume   Retention Period   Ingest Rate
CERN Digital Memory     1.4 PB        10+ years          1 GB/s
The BaBar Experiment    2 PB          10+ years          1 GB/s
CERN Open Data          2+ PB         10+ years          1 GB/s – 10 GB/s
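To put the ingest rates in perspective, the following sketch estimates how long a full initial ingest would take for each scenario. It assumes decimal units and an uninterrupted transfer at the quoted rate, and it uses 2 PB as a lower bound for CERN Open Data; all three are simplifications.

```python
PB = 10**15
GB = 10**9
SECONDS_PER_DAY = 86_400


def ingest_days(volume_pb: float, rate_gb_per_s: float) -> float:
    """Days needed to ingest `volume_pb` at a sustained `rate_gb_per_s`."""
    seconds = (volume_pb * PB) / (rate_gb_per_s * GB)
    return seconds / SECONDS_PER_DAY


# (volume in PB, sustained rate in GB/s) per deployment scenario
scenarios = {
    "CERN Digital Memory": (1.4, 1),
    "The BaBar Experiment": (2, 1),
    "CERN Open Data (low rate)": (2, 1),
    "CERN Open Data (high rate)": (2, 10),
}
for name, (volume, rate) in scenarios.items():
    print(f"{name}: {ingest_days(volume, rate):.1f} days")
```

At 1 GB/s, 2 PB takes about 23 days of continuous transfer; at 10 GB/s it takes a little over 2 days.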
Overview
CERN Digital Memory
The BaBar Experiment
CERN Open Data
Summary
Summary and Next Steps
The primary goal for the CERN Deployment Scenarios is the preservation and long-term archiving of data.
However, all the scenarios would benefit greatly from an added Software Reproducibility Layer on top of the
archiving solution.
These deployment scenarios have many similarities but they also exhibit important differences that make each
one unique.
e.g. Personal data for CERN Digital Memory
We welcome your feedback on the draft of the “Functional Specifications” documents which have been released
on the project website
At the next OMC Event at CERN, we are going to present the first version of the test plan, which will be
co-designed and co-developed by the Buyers Group and the Suppliers
The plan will be based on the outcome of the Design Phase, the Functional Specifications document, and the
Deployment Scenarios needs
The test assessment will be a deciding factor to qualify solutions to the subsequent phases
The tests will focus on basic functionality during the prototype phase and on performance, efficiency, and
scalability during the pilot phase

Más contenido relacionado

La actualidad más candente

Design phase kick-off event and Ceremony
Design phase kick-off event and CeremonyDesign phase kick-off event and Ceremony
Design phase kick-off event and CeremonyArchiver
 
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...Archiver
 
Prototype Phase Kick-off Event and Ceremony
Prototype Phase Kick-off Event and CeremonyPrototype Phase Kick-off Event and Ceremony
Prototype Phase Kick-off Event and CeremonyArchiver
 
HNSciCloud Introduction - Bob Jones - Prototype Phase kickoff meeting
HNSciCloud Introduction - Bob Jones - Prototype Phase kickoff meetingHNSciCloud Introduction - Bob Jones - Prototype Phase kickoff meeting
HNSciCloud Introduction - Bob Jones - Prototype Phase kickoff meetingHelix Nebula The Science Cloud
 
Science DMZ
Science DMZScience DMZ
Science DMZJisc
 
Enabling efficient movement of data into & out of a high-performance analysis...
Enabling efficient movement of data into & out of a high-performance analysis...Enabling efficient movement of data into & out of a high-performance analysis...
Enabling efficient movement of data into & out of a high-performance analysis...Jisc
 
PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)Josep Flix
 
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...BigData_Europe
 
HNSciCloud: Project Results and lessons learned
HNSciCloud: Project Results and lessons learnedHNSciCloud: Project Results and lessons learned
HNSciCloud: Project Results and lessons learnedEOSC-hub project
 

La actualidad más candente (20)

Design phase kick-off event and Ceremony
Design phase kick-off event and CeremonyDesign phase kick-off event and Ceremony
Design phase kick-off event and Ceremony
 
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...
Hybrid Cloud storage deployment models: ARCHIVER presentation at the CS3 Work...
 
Prototype Phase Kick-off Event and Ceremony
Prototype Phase Kick-off Event and CeremonyPrototype Phase Kick-off Event and Ceremony
Prototype Phase Kick-off Event and Ceremony
 
HNSciCloud Introduction - Bob Jones - Prototype Phase kickoff meeting
HNSciCloud Introduction - Bob Jones - Prototype Phase kickoff meetingHNSciCloud Introduction - Bob Jones - Prototype Phase kickoff meeting
HNSciCloud Introduction - Bob Jones - Prototype Phase kickoff meeting
 
Who is doing what, and how do we know? [PEPRS]
Who is doing what, and how do we know? [PEPRS]Who is doing what, and how do we know? [PEPRS]
Who is doing what, and how do we know? [PEPRS]
 
Open Access Repository Junction
Open Access Repository JunctionOpen Access Repository Junction
Open Access Repository Junction
 
UK RepositoryNet+ Mimas Workshop
UK RepositoryNet+ Mimas WorkshopUK RepositoryNet+ Mimas Workshop
UK RepositoryNet+ Mimas Workshop
 
Science DMZ
Science DMZScience DMZ
Science DMZ
 
Enabling efficient movement of data into & out of a high-performance analysis...
Enabling efficient movement of data into & out of a high-performance analysis...Enabling efficient movement of data into & out of a high-performance analysis...
Enabling efficient movement of data into & out of a high-performance analysis...
 
Geoservices Activities at EDINA
Geoservices Activities at EDINAGeoservices Activities at EDINA
Geoservices Activities at EDINA
 
Helix Nebula - The Science Cloud - Lessons learned
Helix Nebula - The Science Cloud - Lessons learned Helix Nebula - The Science Cloud - Lessons learned
Helix Nebula - The Science Cloud - Lessons learned
 
HNSciCloud Prototype Phase Award - Marc-Elian Begin
HNSciCloud Prototype Phase Award - Marc-Elian Begin HNSciCloud Prototype Phase Award - Marc-Elian Begin
HNSciCloud Prototype Phase Award - Marc-Elian Begin
 
HNSciCloud Status Update
HNSciCloud Status UpdateHNSciCloud Status Update
HNSciCloud Status Update
 
PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)
 
EUDAT B2STAGE & EOSC-hub
EUDAT B2STAGE & EOSC-hubEUDAT B2STAGE & EOSC-hub
EUDAT B2STAGE & EOSC-hub
 
ELIXIR
ELIXIRELIXIR
ELIXIR
 
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
Apache Big_Data Europe event: "Integrators at work! Real-life applications of...
 
Crowdsourcing the Past with AddressingHistory
Crowdsourcing the Past with AddressingHistory Crowdsourcing the Past with AddressingHistory
Crowdsourcing the Past with AddressingHistory
 
Big data in Digimap
Big data in DigimapBig data in Digimap
Big data in Digimap
 
HNSciCloud: Project Results and lessons learned
HNSciCloud: Project Results and lessons learnedHNSciCloud: Project Results and lessons learned
HNSciCloud: Project Results and lessons learned
 

Similar a Archiver omc cern_deployment_scenarios_technical_details

Archiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing ServicesArchiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing ServicesArchiver
 
Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...
Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...
Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...confluent
 
Time -Travel on the Internet
Time -Travel on the InternetTime -Travel on the Internet
Time -Travel on the InternetIRJET Journal
 
20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3Tim Bell
 
Archiver 3rd omc_project_overview
Archiver 3rd omc_project_overviewArchiver 3rd omc_project_overview
Archiver 3rd omc_project_overviewArchiver
 
Building earth observation applications with NextGEOSS - webinar
Building earth observation applications with NextGEOSS - webinarBuilding earth observation applications with NextGEOSS - webinar
Building earth observation applications with NextGEOSS - webinarterradue
 
Archiving data from Durham to RAL using the File Transfer Service (FTS)
Archiving data from Durham to RAL using the File Transfer Service (FTS)Archiving data from Durham to RAL using the File Transfer Service (FTS)
Archiving data from Durham to RAL using the File Transfer Service (FTS)Jisc
 
Ensuring Continuing Access to Online Scholarly Resources Stewardship & Servic...
Ensuring Continuing Access to Online Scholarly Resources Stewardship & Servic...Ensuring Continuing Access to Online Scholarly Resources Stewardship & Servic...
Ensuring Continuing Access to Online Scholarly Resources Stewardship & Servic...EDINA, University of Edinburgh
 
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...IRJET Journal
 
Presentation mongo db munich
Presentation mongo db munichPresentation mongo db munich
Presentation mongo db munichMongoDB
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...DataWorks Summit/Hadoop Summit
 
NetApp &amp; SharePoint Pro Connections Webinar
NetApp &amp; SharePoint Pro Connections WebinarNetApp &amp; SharePoint Pro Connections Webinar
NetApp &amp; SharePoint Pro Connections Webinariamrob
 
Bde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoring
Bde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoringBde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoring
Bde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoringBigData_Europe
 
BigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigData_Europe
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobus
 
ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13Simeon Warner
 
HNSciCloud update @ the World LHC Computing Grid deployment board
HNSciCloud update @ the World LHC Computing Grid deployment board  HNSciCloud update @ the World LHC Computing Grid deployment board
HNSciCloud update @ the World LHC Computing Grid deployment board Helix Nebula The Science Cloud
 

Similar a Archiver omc cern_deployment_scenarios_technical_details (20)

Archiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing ServicesArchiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing Services
 
Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...
Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...
Kafka Summit SF 2017 - Accelerating Particles to Explore the Mysteries of the...
 
Time -Travel on the Internet
Time -Travel on the InternetTime -Travel on the Internet
Time -Travel on the Internet
 
20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3
 
Archiver 3rd omc_project_overview
Archiver 3rd omc_project_overviewArchiver 3rd omc_project_overview
Archiver 3rd omc_project_overview
 
IR-AUDIT
IR-AUDITIR-AUDIT
IR-AUDIT
 
Building earth observation applications with NextGEOSS - webinar
Building earth observation applications with NextGEOSS - webinarBuilding earth observation applications with NextGEOSS - webinar
Building earth observation applications with NextGEOSS - webinar
 
Archiving data from Durham to RAL using the File Transfer Service (FTS)
Archiving data from Durham to RAL using the File Transfer Service (FTS)Archiving data from Durham to RAL using the File Transfer Service (FTS)
Archiving data from Durham to RAL using the File Transfer Service (FTS)
 
Ensuring Continuing Access to Online Scholarly Resources Stewardship & Servic...
Ensuring Continuing Access to Online Scholarly Resources Stewardship & Servic...Ensuring Continuing Access to Online Scholarly Resources Stewardship & Servic...
Ensuring Continuing Access to Online Scholarly Resources Stewardship & Servic...
 
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
 
Cloud, Fog, or Edge: Where and When to Compute?
Cloud, Fog, or Edge: Where and When to Compute?Cloud, Fog, or Edge: Where and When to Compute?
Cloud, Fog, or Edge: Where and When to Compute?
 
Presentation mongo db munich
Presentation mongo db munichPresentation mongo db munich
Presentation mongo db munich
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
 
NetApp &amp; SharePoint Pro Connections Webinar
NetApp &amp; SharePoint Pro Connections WebinarNetApp &amp; SharePoint Pro Connections Webinar
NetApp &amp; SharePoint Pro Connections Webinar
 
Bde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoring
Bde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoringBde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoring
Bde sc3 2nd_workshop_2016_10_04_p05_bde_system_monitoring
 
BigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal PilotsBigDataEurope @BDVA Summit2016 2: Societal Pilots
BigDataEurope @BDVA Summit2016 2: Societal Pilots
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
 
ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13
 
HNSciCloud update @ the World LHC Computing Grid deployment board
HNSciCloud update @ the World LHC Computing Grid deployment board  HNSciCloud update @ the World LHC Computing Grid deployment board
HNSciCloud update @ the World LHC Computing Grid deployment board
 
SyMBA: Overview
SyMBA: OverviewSyMBA: Overview
SyMBA: Overview
 

Más de Archiver

Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver
 
Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver
 
Wrapping Up and Next Steps¶
Wrapping Up and Next Steps¶Wrapping Up and Next Steps¶
Wrapping Up and Next Steps¶Archiver
 
Overview of the EOSC¶
Overview of the EOSC¶Overview of the EOSC¶
Overview of the EOSC¶Archiver
 
ARCHIVER Tender Requirements
ARCHIVER Tender RequirementsARCHIVER Tender Requirements
ARCHIVER Tender RequirementsArchiver
 
Wrapping up and_next_steps_stansted
Wrapping up and_next_steps_stanstedWrapping up and_next_steps_stansted
Wrapping up and_next_steps_stanstedArchiver
 
20190523 archiver fim
20190523 archiver fim20190523 archiver fim
20190523 archiver fimArchiver
 
Geant cloud peering-v2
Geant cloud peering-v2Geant cloud peering-v2
Geant cloud peering-v2Archiver
 
Archiver omc stansted_tendering_procedure_and_requirements_final
Archiver omc stansted_tendering_procedure_and_requirements_finalArchiver omc stansted_tendering_procedure_and_requirements_final
Archiver omc stansted_tendering_procedure_and_requirements_finalArchiver
 
Wrapping up_and_next_steps
Wrapping up_and_next_stepsWrapping up_and_next_steps
Wrapping up_and_next_stepsArchiver
 
Introduction to_planning_poker_addestino
Introduction to_planning_poker_addestinoIntroduction to_planning_poker_addestino
Introduction to_planning_poker_addestinoArchiver
 
Archiver 2nd_OMC event_Barcelona_Project Overview
Archiver 2nd_OMC event_Barcelona_Project OverviewArchiver 2nd_OMC event_Barcelona_Project Overview
Archiver 2nd_OMC event_Barcelona_Project OverviewArchiver
 
Archiver OMC event_Barcelona_ Welcome to_accio
Archiver OMC event_Barcelona_ Welcome to_accio Archiver OMC event_Barcelona_ Welcome to_accio
Archiver OMC event_Barcelona_ Welcome to_accio Archiver
 
6 presentation wrapping up and next steps v2
6 presentation wrapping up and next steps v26 presentation wrapping up and next steps v2
6 presentation wrapping up and next steps v2Archiver
 
5 introduction to geant
5 introduction to geant5 introduction to geant
5 introduction to geantArchiver
 
4 archiver omc session 1
4 archiver omc session 1 4 archiver omc session 1
4 archiver omc session 1 Archiver
 
2 procurement and legal aspects
2 procurement and legal aspects 2 procurement and legal aspects
2 procurement and legal aspects Archiver
 
1 archiver omc project_overview
1 archiver omc project_overview1 archiver omc project_overview
1 archiver omc project_overviewArchiver
 

Más de Archiver (18)

Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award Ceremony
 
Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award Ceremony
 
Wrapping Up and Next Steps¶
Wrapping Up and Next Steps¶Wrapping Up and Next Steps¶
Wrapping Up and Next Steps¶
 
Overview of the EOSC¶
Overview of the EOSC¶Overview of the EOSC¶
Overview of the EOSC¶
 
ARCHIVER Tender Requirements
ARCHIVER Tender RequirementsARCHIVER Tender Requirements
ARCHIVER Tender Requirements
 
Wrapping up and_next_steps_stansted
Wrapping up and_next_steps_stanstedWrapping up and_next_steps_stansted
Wrapping up and_next_steps_stansted
 
20190523 archiver fim
20190523 archiver fim20190523 archiver fim
20190523 archiver fim
 
Geant cloud peering-v2
Geant cloud peering-v2Geant cloud peering-v2
Geant cloud peering-v2
 
Archiver omc stansted_tendering_procedure_and_requirements_final
Archiver omc stansted_tendering_procedure_and_requirements_finalArchiver omc stansted_tendering_procedure_and_requirements_final
Archiver omc stansted_tendering_procedure_and_requirements_final
 
Wrapping up_and_next_steps
Wrapping up_and_next_stepsWrapping up_and_next_steps
Wrapping up_and_next_steps
 
Introduction to_planning_poker_addestino
Introduction to_planning_poker_addestinoIntroduction to_planning_poker_addestino
Introduction to_planning_poker_addestino
 
Archiver 2nd_OMC event_Barcelona_Project Overview
Archiver 2nd_OMC event_Barcelona_Project OverviewArchiver 2nd_OMC event_Barcelona_Project Overview
Archiver 2nd_OMC event_Barcelona_Project Overview
 
Archiver OMC event_Barcelona_ Welcome to_accio
Archiver OMC event_Barcelona_ Welcome to_accio Archiver OMC event_Barcelona_ Welcome to_accio
Archiver OMC event_Barcelona_ Welcome to_accio
 
6 presentation wrapping up and next steps v2
6 presentation wrapping up and next steps v26 presentation wrapping up and next steps v2
6 presentation wrapping up and next steps v2
 
5 introduction to geant
5 introduction to geant5 introduction to geant
5 introduction to geant
 
4 archiver omc session 1
4 archiver omc session 1 4 archiver omc session 1
4 archiver omc session 1
 
2 procurement and legal aspects
2 procurement and legal aspects 2 procurement and legal aspects
2 procurement and legal aspects
 
1 archiver omc project_overview
1 archiver omc project_overview1 archiver omc project_overview
1 archiver omc project_overview
 

Último

Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
Archiver omc cern_deployment_scenarios_technical_details

  • 1. CERN Deployments Scenarios: Technical Details. Evangelos Motesnitsalis, Technical Coordinator. ARCHIVER Open Market Consultation Event, 23 May 2019, London Stansted Airport. http://www.archiver-project.eu
  • 2. Contents: Introduction to High Energy Physics Deployment Scenarios; The BaBar Experiment; CERN Digital Memory; CERN Open Data; Volumes, Ingest Rates, and Retention Period; Summary and Next Steps.
  • 3. Introduction to High Energy Physics Deployment Scenarios
  • 4. Introduction to HEP Deployment Scenarios: In all three Deployment Scenarios, users do not need direct access to the Archiving Service. The volume of data is between 1.5 and 2 PB for each Deployment Scenario. In all three Deployment Scenarios, data need to be recalled within a “reasonable time window” (<24h).
  • 5. OAIS Reference Model. Relevant Standards: Preservation: ISO 14721/16393, 26324 and related standards. Storage/Basic Archiving/Secure backup: ISO 27000, 27040, 19086.
  • 6. FAIR Principles (https://www.go-fair.org/). Findable: global, unique identifiers; rich metadata, indexes, search capabilities. Accessible: retrievable with free protocols; accessible metadata even after deletion. Interoperable: qualified reference to other data; formal, shared and broadly applicable knowledge representation standards. Re-Usable: accurate and relevant description; data usage license and detailed provenance.
  • 7. High Energy Physics Deployment Scenarios: The BaBar Experiment; CERN Digital Memory; CERN Open Data.
  • 9. The BaBar Experiment – Problem Definition: In 2020 the BaBar Experiment infrastructure at SLAC will be decommissioned. As a result, the 2 PB of BaBar data can no longer be stored at the host laboratory and alternative solutions need to be found. Currently a copy of the data is being held by CERN IT. We want to ensure that a complete copy of BaBar data will be retained for possible comparisons with data from other experiments.
  • 10. The BaBar Experiment – Workflow Characteristics: The Service Manager [SM] will access the Archiving Service. The SM will trigger the data ingestion. The SM should have the ability to do “partial recalls”: on a file, or on a subset of a file. The SM should have the ability to update the data. Data will be rarely recalled. Personal data do not exist in this use case. The cost is estimated to be below 100K per year [50K per PB per year].
  • 11. The BaBar Experiment – Interface Needs: Basic API functionalities that enable: ingestion/retrieval of data; getting fixity checks (automated reporting of fixity and errors, and an anti-corruption mechanism every time the data is touched); restart capabilities, needed because of the high volume of data.
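The fixity requirement on this slide can be illustrated with a minimal sketch. It assumes SHA-256 checksums recorded at ingest time and files reachable on a local path; the slides do not name a checksum algorithm or an API, so `sha256_of`, `fixity_report`, and `demo` are hypothetical helpers, not part of any ARCHIVER interface.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-GB files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_report(expected: dict[str, str], root: Path) -> dict[str, str]:
    """Recompute each checksum and classify the file as ok, corrupt, or missing."""
    report = {}
    for name, checksum in expected.items():
        target = root / name
        if not target.exists():
            report[name] = "missing"
        elif sha256_of(target) == checksum:
            report[name] = "ok"
        else:
            report[name] = "corrupt"
    return report


def demo() -> dict[str, str]:
    """Hypothetical end-to-end run on throwaway files (assumption: local storage)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("good.dat", "bad.dat"):
            (root / name).write_bytes(b"event data")
        # Checksums as they would be recorded at ingest time.
        expected = {name: sha256_of(root / name) for name in ("good.dat", "bad.dat")}
        expected["lost.dat"] = "0" * 64  # recorded at ingest, file never found again
        # Simulate silent corruption of one file after ingest.
        (root / "bad.dat").write_bytes(b"event dat\x00")
        return fixity_report(expected, root)
```

A report of this shape could feed the automated error reporting mentioned above; running the same comparison on every read would be one way to approximate the "every time the data is touched" requirement.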
  • 13. CERN Digital Memory – Problem Definition: We want to archive the ~1.5 PB of CERN Digital Memory, containing digitized analog documents produced by the institution in the 20th century as well as the digital production of the 21st century, including new types such as web sites, social media, emails, etc.
  • 14. CERN Digital Memory – Workflow Characteristics: The Service Manager [SM] will access the Archiving Service. The SM will trigger the data ingestion. The SM should have the ability to do “partial recalls”: on a file, or on a subset of a file (e.g. download only one photo out of an album or only one part of a video recording). The SM should have the ability to update the data (e.g. replace/delete only one photograph in an album). Data will be rarely recalled. Personal data do exist in this use case.
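A "partial recall" on a subset of a file amounts to fetching a byte range. The sketch below shows the idea under the assumption that the archiving service exposes files over HTTP and honours standard `Range` requests; the slides do not specify the protocol for recalls, and both function names are hypothetical.

```python
import io
from typing import BinaryIO


def range_header(offset: int, length: int) -> dict[str, str]:
    """Build an HTTP Range header; the end position is inclusive."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_subset(stream: BinaryIO, offset: int, length: int) -> bytes:
    """Local equivalent of the same recall: seek, then read only what is needed."""
    stream.seek(offset)
    return stream.read(length)


# Stand-in for one part of a large recording (assumption: in-memory bytes).
recording = io.BytesIO(b"0123456789")
part = read_subset(recording, offset=3, length=4)
```

Recalling a single photo from an album would more likely be a whole-file recall inside a package, while one part of a video recording maps naturally onto a byte range like the one above.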
  • 15. CERN Digital Memory – Data Characteristics: Currently the CERN Digital Memory is fragmented across various information systems and different storage solutions which are not OAIS compliant. There are no universal standards for the contents. We want to introduce specific standards and formats in order to ensure long-term preservation. The existence of personal and confidential data increases the complexity of the user access requirements for this scenario, e.g. the service manager should not have access to the audio file of a CERN Council Meeting.
  • 16. CERN Digital Memory – Interface Needs: API functionalities: automated SIP transfers; automated metadata handling; access to converted files and checksums; detailed error information. Web Interface: dashboard with browsing/searching capabilities; an audit log where details of all actions can be accessed.
  • 18. CERN Open Data: The CERN Open Data portal disseminates close to 2 PB of primary and derived datasets from particle physics as they were released by LHC Collaborations, and is being used for both education and research purposes. The CERN Open Data Service Managers seek an easy-to-use, easy-to-achieve independent archiving and backup for the portal's holdings, based on SIPs [Submission Information Packages], with intelligent and reliable disaster recovery mechanisms.
  • 19. CERN Open Data – Workflow Characteristics: The Service Manager [SM] will access the Archiving Service. The SM will trigger the data ingestion. The SM should have the ability to do “partial recalls”: on a file, or on a subset of a file. The SM should have the ability to update the data (e.g. replace/delete only one file of a dataset). Data will be rarely recalled. Personal data do not exist in this case. Data ingestion is based on “release campaigns” (3x / year). Data are publicly available; they can even be crawled.
  • 20. CERN Open Data – Data Characteristics: The CERN Open Data Portal contains: 10,000 bibliographical records; 600,000 files; 2 PB in total. Typical dataset size: ~3 TB. Typical file size: 1-4 GB. Metadata in a custom JSON Schema inspired by the W3C DCAT Standard.
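As a quick consistency check on these figures, the mean file size and the number of files per record follow directly from the totals. The sketch assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and treats the rounded slide figures as exact; the function names are illustrative only.

```python
TOTAL_BYTES = 2 * 10**15   # 2 PB, decimal units (assumption)
FILES = 600_000
RECORDS = 10_000


def mean_file_size_gb() -> float:
    """Average file size implied by the totals, in decimal gigabytes."""
    return TOTAL_BYTES / FILES / 10**9


def files_per_record() -> float:
    """Average number of files attached to one bibliographical record."""
    return FILES / RECORDS
```

The implied mean of roughly 3.3 GB per file sits inside the stated 1-4 GB typical range, and each record carries about 60 files on average.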
  • 21. CERN Open Data – Interface Characteristics: API functionalities: automated transfers (e.g. HTTP); automated metadata handling; validation of the integrity of the deposited material, both for data and metadata; periodic fixity checks. Web Interface: dashboard with browsing/searching capabilities; an audit log where details of all actions can be accessed.
  • 22. CERN Open Data – Added Value Features: The CernVM File System provides a scalable and reliable software distribution service for the LHC experiments as a POSIX read-only file system. Files and directories are hosted on standard web servers and mounted in the universal namespace /cvmfs. As CernVM-FS can use the S3 protocol for storage, we want to explore two possibilities: the first is to install CernVM-FS in external infrastructure; the second is to transfer CernVM-FS to an external service (for example, cvmfs.cloud.com). This service will be added on top of the archiving solution as a Software Reproducibility Layer, in order to run example Physics analyses using non-CERN/LHC infrastructure.
  • 23. Volumes, Ingest Rates, and Retention Period
  • 24. Dataset Characteristics (Deployment Scenario: Data Volume; Retention Period; Ingest Rate). CERN Digital Memory: 1.4 PB; 10+ years; 1 GB/s. The BaBar Experiment: 2 PB; 10+ years; 1 GB/s. CERN Open Data: 2+ PB; 10+ years; 1 GB/s – 10 GB/s.
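To give a feel for what these volumes and ingest rates imply, the sketch below estimates how long a full initial ingest would take. It assumes decimal units (1 PB = 1,000,000 GB) and a rate sustained without interruption, neither of which is stated on the slide; `ingest_days` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400


def ingest_days(volume_pb: float, rate_gb_per_s: float) -> float:
    """Days needed to move volume_pb petabytes at a sustained rate in GB/s."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = (volume_pb * 1_000_000) / rate_gb_per_s  # PB -> GB, then GB / (GB/s)
    return seconds / SECONDS_PER_DAY


estimates = {
    "CERN Digital Memory (1.4 PB at 1 GB/s)": round(ingest_days(1.4, 1), 1),
    "The BaBar Experiment (2 PB at 1 GB/s)": round(ingest_days(2, 1), 1),
    "CERN Open Data (2 PB at 10 GB/s)": round(ingest_days(2, 10), 1),
}
```

Under these assumptions a 2 PB ingest at 1 GB/s takes a little over three weeks, which is one reason restart capabilities appear among the interface needs.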
  • 25. Overview: CERN Digital Memory; The BaBar Experiment; CERN Open Data.
  • 27. Summary and Next Steps: The primary goal for the CERN Deployment Scenarios is the preservation and long-term archiving of data. However, all the scenarios would benefit greatly from an added Software Reproducibility Layer on top of the archiving solution. These deployment scenarios have many similarities, but they also exhibit important differences that make each one unique, e.g. personal data for CERN Digital Memory. We welcome your feedback on the draft of the “Functional Specifications” documents, which have been released on the project website. At the next OMC Event at CERN, we are going to present the first version of the test plan, which will be co-designed and co-developed by the Buyers Group and the Suppliers. The plan will be based on the outcome of the Design Phase, the Functional Specifications document, and the Deployment Scenarios' needs. The test assessment will be a deciding factor to qualify solutions for the subsequent phases. The tests will focus on basic functionality during the prototype phase, and on performance, efficiency, and scalability during the pilot phase.