The World Wide Web was invented at CERN and released to the public in 1991. Construction of CERN's LHC was approved in 1994. Building the data processing system required by the LHC's detectors with 1994 technology would have cost more than the accelerator itself. Starting in 1999, CERN and data centres from around the world collaborated to prototype and deploy the LHC Computing Grid, the first planetary-scale high-performance data processing system, which enabled the discovery of the Higgs boson in 2012. This talk reviews these developments and their relationship to current areas of interest in data processing, such as "Big Data" and digitally supported collaborative science.
68th ICREA Colloquium "The Worldwide LHC Computing Grid: Riding the computing technology wave to enable the Higgs boson discovery" by Manuel Delfino
1. PIC
port d’informació
científica
The Worldwide LHC Computing Grid:
Riding the computing technology wave to
enable the Higgs boson discovery.
68th ICREA Colloquium, November 17th, 2015
Prof. Manuel Delfino
UAB Physics Dept. and IFAE, Director of PIC
2. Outline
● Data Processing in Experimental Particle Physics
● The 1980s: From mainframes to clusters
● The 1990s: From clusters to farms
● The Large Hadron Collider – timeline
● The 2000s: The Worldwide LHC Computing Grid
● Outlook and Conclusions
4. Data Processing in Experimental Particle Physics
● Post World War II “High Energy Nuclear Physics”:
Linking mainframe computers, devices and scientists
– Bubble chamber scanning device automation
– Storage of electronic readout data → DAQ and Online computers
– Raw data converted to analysis data on central computer at
laboratories → Offline computers at labs
– Scientists transport tapes and punched cards in their suitcases →
Data distribution
– Scientists attempt to analyse data on their University's central
computer and face many barriers and frustration
● Curiously, many scientific fields today still use this
methodology. Last step less troublesome thanks to PCs.
5. Data Processing in Experimental Particle Physics
● 1960s-1970s: Microelectronics + detector innovation
→ more data generated, transported, computed on
– Standard “nuclear” electronics: NIM, CAMAC, VME
– Microprocessors, particularly Motorola 68K, in DAQ
– Minicomputers, particularly PDP-11, as online computers
– Available mainframe capacity becomes totally insufficient
● Pressure on experiments to tighten trigger requirements,
recording less data at the risk of “missing” new physics
● Multiple mainframes at labs
● Very few universities make available “research mainframes”
– 1972: IBM releases VM/370: Virtual machines
– 1977: Digital VAX-11/780 minicomputer: Virtual memory
7. 1980s: From mainframes to clusters
● Microprocessor evolution
– Lower price minicomputers
● Physics departments at major universities have their own VAX
● Minicomputer shrank into the workstation: VAXstation
– Very powerful 32-bit RISC workstations for CAD/CAM and electronics
design: Apollo Computer, Sun Microsystems, IBM, DEC
– 1981: IBM PC using 16-bit Intel 8088 + 16 kB of RAM
– 1984: Apple Macintosh using 16/32-bit Motorola 68K + 128 kB RAM
● Microelectronics evolution: Bit-sliced microprocessor
– SLAC and CERN co-develop IBM emulators
– Souped-up online computers → Software triggers
● UA1 discovers W and Z bosons using a DAQ system built on
Mac+VME and an offline mainframe boosted by emulators
8. 1980s: From mainframes to clusters
● Local area networks
– Ethernet: Xerox PARC (1974) → 3com (1979)
– Token Ring: Cambridge → IBM Zurich lab → Apollo
● Academic IBM maxis and VAX minis linked across the world: BITNET/EARN
● Clusters: Many computers on a network with
– Common authentication/authorization/access control
– Shared access to resources (disk, network, tape)
– Inter-process communication → "Cluster supercomputer"
● Operating system software gains importance
● 1983: IBM and DEC fund Project Athena at MIT (70 M$)
9. 1980s: From mainframes to clusters
● 1985-1988: Dig 27 km tunnel for LEP/LHC at CERN
– CERN deploys world's largest token ring for LEP control
● Interactive computing becomes larger than batch
– IBM VM/CMS and Digital VAX/VMS are dominant
– Scientists start exchanging data via BITNET and DECnet
– 1986: CERN develops PAW, the "Excel" of particle physics
– 1989: Tim Berners-Lee invents the World Wide Web at CERN
– 1993: NCSA Mosaic™ browser with graphic interface
● 1985: Needs of CERN's LEP experiments estimated at
10 times larger than budgeted growth in maxis + minis
● 1989: CERN buys a Cray X-MP, decides to run it with Unix
11. 1990s: From clusters to farms
● Processing data from particle collisions is "embarrassingly parallel" (see the sketch after this slide)
● 1987: LFAE + Florida State Univ. launch FALCON:
– FALCON I: Quasi-online farm: Raw → Physics objects
Diskless VAXstations + disks dual-ported to DAQ
– FALCON III: Analysis farm: Physics objects → PAW ntuples
VAXstations hosting disks + exploit data locality
– Both based on Local Area VAXcluster over Ethernet
● 1990: CERN launches SHIFT project, based on (then) very
controversial technological choices/goals:
– Unix and TCP/IP
– Heterogeneous RISC workstation hardware:
Apollo, Sun Microsystems, Silicon Graphics
– High-speed network from small California company
– Support multiple experiments. Develop accounting system !!
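A minimal Python sketch of this embarrassingly parallel pattern, assuming a hypothetical per-event reconstruction step (the event contents and the reconstruct() function below are placeholders, not FALCON or SHIFT code):

    # Minimal sketch of embarrassingly parallel event processing, the pattern
    # farms exploit: every collision event is reconstructed independently.
    # The events and the reconstruction step are hypothetical placeholders.
    from multiprocessing import Pool

    def reconstruct(raw_event):
        # Stand-in for raw data -> physics objects reconstruction.
        return {"event_id": raw_event["event_id"], "n_hits": len(raw_event["hits"])}

    if __name__ == "__main__":
        raw_events = [{"event_id": i, "hits": list(range(i % 7))} for i in range(1000)]
        with Pool(processes=8) as pool:   # one worker per core / farm node
            physics_objects = pool.map(reconstruct, raw_events)
        print(len(physics_objects), "events reconstructed")

Because each event is independent, throughput scales with the number of workers, which is why farms of cheap nodes worked so well for this workload.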
14. 1990s: From clusters to farms
● 1992: Digital releases Alpha RISC microprocessor
– Capable of running VMS
– FALCON I and III upgraded to Alphas
– FSU builds FALCON IV: Alpha VAXcluster with FDDI net.
Tapes shipped across the Atlantic to use FALCON IV.
● 1995: Digital launches the AltaVista search engine
● 1994-1998: Digital disintegrates
● Particle physics abandons VMS in favor of Unix
15. 1990s: From clusters to farms
● SHIFT is very successful:
– Niche network replaced by TCP/IP over standard Ethernet
– Hardware agnostic→Heavy competition amongst vendors
– Advances in SCSI hard disks allow the delivery of huge amounts of disk space to physicists
– CERN CASTOR Hierarchical Storage Manager makes
tape look like an agile extension of disk
● Similar tendency at Fermilab Tevatron Collider
– Development of the SAM Distributed Data Framework
– Data distribution over network in “push” and “pull” modes
16. 1990s: From clusters to farms
● 1993: Microsoft releases Windows NT
● 1994: NASA uses standard Unix plus MPI/PVM to build first
Beowulf cluster → “cluster supercomputer”
● 1994: Linus Torvalds → Linux kernel v1.0
● 1995: Bill Gates' "Internet Tidal Wave" memo → TCP/IP everywhere
● mid-1990s: 32 bit mass market microprocessors
– 1995: Intel Pentium Pro
– 1996: AMD K5
● 1994: CERN buys its last mainframe
– IBM SP-2: Mainframe built from RISC workstations
– CERN decides to manage it in an integrated manner within SHIFT
17. 1990s: From clusters to farms
● 1996: CERN RD-47 project:
High energy physics processing using commodity components
Barcelona – CERN – JINR Dubna – Florida State – Fac. des Sciences Luminy – Santa Cruz – Washington
– Implement the whole particle physics data processing environment
on commodity hardware and software
– Concepts and prototypes of extra tools to automatically manage a
large number of nodes → processor farm
● Two approaches:
– Use Windows NT PCs to replicate VAXcluster environment using
purely commercial components
– Use Linux PCs to implement SHIFT using as much "open" software as possible
● Some controversy, as RISC had moved to 64 bits
18. 1990s: From clusters to farms
● Windows NT approach worked, but >10 years ahead of its time → Windows Azure cloud service
● Linux approach became dominant
– Rapid scale-up of power and number of nodes (100s)
– University groups start deploying local clusters
– Development of automation tools: Quattor, Lemon
→ precursors of Puppet, Chef
● 1999: CERN DG triggers the first study of solutions for the
data processing needs of the LHC.
First look generates considerable shock
– > 100k computers needed
– Cost similar to one of the LHC detectors
20. (Brief) LHC timeline
● 1984: European Committee on Future Accelerators workshop: Large
Hadron Collider in the LEP tunnel
● 1987: U.S. President Ronald Reagan announces support for the
Superconducting Supercollider
● 1988: Digging of LEP tunnel completed at CERN
● 1992: Letter of Intent for ATLAS and CMS detectors
● 1993: U.S. Congress kills the SSC after 2 G$ spent
● 1994: CERN Council approves construction of LHC
● 2000: CERN stops LEP, dismantles it to house the LHC
● 2008: First LHC beams, magnet interconnect accident
● 2009: Beams back in the LHC, Run 1 starts
● 2012: ATLAS and CMS discover the Higgs boson
● 2015: LHC Run 2 starts
21. (Brief) LHC Computing timeline
● 1992-1994: SSC, CERN: resources needed will be “much larger than those of
current facilities”
● 1996-1998: R&D for Online: 1 PB/s → 1-10 PB/year
– LHCb prototype of Myrinet based farm (later used to build first MareNostrum at BSC)
– CMS bets on farm based on giant Ethernet switches
Needed capacity is equivalent to ¼ of US phone traffic
● 1999: First estimate: 0.1 M cores + 100 PB = 200 M€
● 1999: Ian Foster and Carl Kesselman publish
“The Grid: Blueprint for a new computing infrastructure”
● 2000-2001: NSF, DOE, EU fund Grid development
● 2001: Worldwide LHC Computing Grid project approved by CERN Council,
becomes part of the CERN Research Program
● 2002: LCG1 service becomes operational
● 2012: WLCG acknowledged as key enabler for Higgs boson discovery
● 2015: Estimates for High Luminosity LHC: Resources needed are “much higher”
23. 2000s: Worldwide LHC Computing Grid
● Fastest-evolving component in computing:
wide-area fiber-optic communication
● Five very distinct needs:
– Archiving
– Reconstruction
– Filtering
– Analysis
– Simulation
● MONARC study
– wide area network to integrate worldwide resources
– centers with different capabilities and reliabilities
24. 2000s: Worldwide LHC Computing Grid
● The WLCG Tiers as defined in MONARC (the data flow is sketched after this slide)
– Tier-0 at CERN
● Receive raw data from DAQ, archive to tape and cache on disk
● Run prompt reconstruction and quality checks
● Distribute raw data to a limited number (13) of Tier-1 centers
– Tier-1 (CA, DE, ES, FR, IT, KR, Nordic, NL, RUx2, TW, UK, USx2)
● Receive raw data from CERN, archive to tape and reconstruct
● Receive simulations, archive to tape and reconstruct
● Run filters → physics objects with pre-determined patterns
● Distribute filtered data to many (200) Tier-2 centers
– Tier-2
● Make filtered physics objects available for analysis
● Run simulations
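A minimal Python sketch of this fan-out, assuming a toy site list (the site names, dataset name and attachment of Tier-2s to Tier-1s below are illustrative, not the real WLCG topology):

    # Toy sketch of the MONARC-style fan-out: Tier-0 -> Tier-1 -> Tier-2.
    # Site names, the dataset and the Tier-2 attachments are illustrative only.
    TIER1_OF_TIER2 = {"IFAE": "PIC", "CIEMAT": "PIC", "Glasgow": "RAL", "Manchester": "RAL"}

    def distribute_raw(dataset, tier1s):
        # Tier-0 archives the raw data and replicates it to each Tier-1.
        return {t1: f"{dataset}: raw copy archived at {t1}" for t1 in tier1s}

    def distribute_filtered(dataset, tier1, tier2s):
        # A Tier-1 reconstructs and filters, then ships physics objects to its Tier-2s.
        return {t2: f"{dataset}: filtered at {tier1}, analysed at {t2}" for t2 in tier2s}

    tier1s = sorted(set(TIER1_OF_TIER2.values()))
    print(distribute_raw("run2012B", tier1s))
    for t1 in tier1s:
        attached = [t2 for t2, owner in TIER1_OF_TIER2.items() if owner == t1]
        print(distribute_filtered("run2012B", t1, attached))

The point of the hierarchy is that only the Tier-0 and Tier-1s carry the archival burden, while the many Tier-2s see only the filtered physics objects they need for analysis and simulation.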
26. Economies of scale were an important part of the "Grid vision"
● An example from the development of electrical power from a "cottage industry" to a dependable infrastructure
[Figure: quality and economies of scale increasing over time; adapted by permission from Ian Foster, University of Chicago and US Argonne National Lab]
● Decouple production & consumption, enabling:
– On-demand access
– Economies of scale
– Consumer flexibility
– New devices
27. 2000s: Worldwide LHC Computing Grid
● Grid "middleware" (example based on European gLite; a usage sketch follows this slide)
– Authentication: International Grid Trust Federation of entities issuing X.509
certificates to users (ES: RedIRIS)
– Authorization: Virtual Organization Management Service (VOMS)→Internet
servers tell your rights within project
– Computing Element: Abstraction of a batch queue
– Storage Element: Abstraction of a disk
– Resource Broker (or Workload Management System):
Broker amongst Computing Elements
– Replica Manager:
Handle multiple copies of same data in different Storage Elements
– Information Service: Where is what?
– Logging and Book-keeping: What is happening?
– User Interface: Machine which bridges the normal world to the Grid
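In day-to-day use these pieces are driven from the User Interface machine: obtain a VOMS proxy, describe the job in JDL, hand it to the Workload Management System. A hedged Python sketch of that workflow, assuming the standard gLite client tools voms-proxy-init and glite-wms-job-submit are installed (the VO name, JDL contents and file names are placeholders):

    # Sketch of a typical gLite submission from a User Interface machine:
    # authenticate via VOMS, describe the job in JDL, submit it to the WMS.
    # The VO name, JDL contents and file names are illustrative placeholders.
    import subprocess

    JDL = '''Executable    = "/bin/hostname";
    StdOutput     = "std.out";
    StdError      = "std.err";
    OutputSandbox = {"std.out", "std.err"};
    '''

    def submit(vo="myexperiment", jdl_file="hello.jdl"):
        with open(jdl_file, "w") as f:
            f.write(JDL)
        # Proxy certificate carrying VO membership (authentication + authorization).
        subprocess.run(["voms-proxy-init", "--voms", vo], check=True)
        # The Workload Management System brokers a suitable Computing Element.
        subprocess.run(["glite-wms-job-submit", "-a", jdl_file], check=True)

    if __name__ == "__main__":
        submit()

The user never chooses a site by hand: the Resource Broker matches the JDL requirements against the Information Service and dispatches the job to a Computing Element.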
28. Foster & Kesselman vision: Grid or Cloud?
[Figure: three-layer stack (Applications: Delivery; Services: Distribution; Servers: Execution)]
● Application Virtualization
– Automatically connect applications to services
– Dynamic & intelligent provisioning
● Infrastructure Virtualization
– Dynamic & intelligent provisioning
– Automatic failover
Source: The Grid: Blueprint for a New Computing Infrastructure (2nd Edition), 2004
30. 2000s: Worldwide LHC Computing Grid
● Large sites had to learn to deal with 100× the resources
● Petabyte level storage
– Disk units failing weekly
– Multiple boxes presenting a single namespace (toy sketch after this slide):
Distributed file systems: Lustre, GPFS, dCache
– Tape in Tier-1s
● Batch systems with 1000-10000 nodes
● Efficient use of many-core processors
● Traffic shaping in huge LANs. WANs near saturation
● Electrical power consumption and cooling efficiency
● European Tier-1s are multi-experiment: good and bad
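As a toy illustration of "multiple boxes presenting a single namespace", the sketch below hashes each file path to one of several storage servers; real systems such as Lustre, GPFS and dCache use dedicated metadata and pool managers rather than this simple scheme (server names and paths are placeholders):

    # Toy illustration of one namespace spread over several storage boxes:
    # a path is mapped deterministically to a server by hashing. Real distributed
    # file systems (Lustre, GPFS, dCache) use metadata/pool managers instead.
    import hashlib

    SERVERS = ["disk01.example.org", "disk02.example.org", "disk03.example.org"]

    def server_for(path):
        # Deterministic mapping: the same path always resolves to the same box.
        digest = hashlib.sha1(path.encode()).hexdigest()
        return SERVERS[int(digest, 16) % len(SERVERS)]

    for p in ["/data/run2012B/evt_0001.root", "/data/run2012B/evt_0002.root"]:
        print(p, "->", server_for(p))

Whatever the mechanism, the goal is the same: physicists see one file system, while disks can fail and be replaced weekly behind it.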
39. LHC Run 4 → Exascale
● Luminosity: ×6 Run 2, ×14 Run 3, ×120 Run 4
● Complexity: More collisions per beam crossing
● Current model will work for Runs 2 and 3, but worries about large
number of sites and associated people
● Need to re-think everything for Run 4 → Start now
40. At PIC: Benefit other projects
● Past and current
– MAGIC Cherenkov Telescope main data center
– PAU Cosmological Survey main data center
– EUCLID Science Data Center supporting Simulation OU
– Analysis support for LHC-ATLAS, neutrinos-T2K, cosmology-DES, etc.
– Storage and analysis support for simulations
● Turbulent flow (Hoyas and Jiménez-Sendín, UPM, published)
● Universe expansion (Fosalba et al., IEEC-ICE, published and ongoing)
● Star evolution (Padoan et al., ICREA, published and ongoing)
– PIC Neuroimage processing platform
● Future
– Cherenkov Telescope Array (CTA) landing data center
– Next generation neutrino experiments
– Simulation storage and analysis
– Other fields? → Contact me if you want to explore a collaboration