BUD17-300K2: High Energy Physics and ARMv8 64-bit? Investigating The Future of Computing at CERN

630 views

Published on

"Session ID: BUD17-300K2
Session Name: High Energy Physics and ARMv8 64-bit? Investigating The Future of Computing at CERN - BUD17-300K2
Speaker: Jakob Blomer, David Abdurachmanov
Track:


★ Session Summary ★
Around the year 2000, the convergence on Linux and commodity x86_64 processors provided a homogeneous scientific computing platform, which enabled the construction of the Worldwide LHC Computing Grid (WLCG) for LHC data processing. This allowed the High Energy Physics (HEP) community to use a homogeneous software model built on the x86_64 architecture. LHC experiments at CERN, in particular ATLAS and CMS, started investigating the ARMv8 64-bit (AArch64) architecture for HEP needs, a journey that started in 2013. The LHC community faces a great computing challenge over the next 10 years and has started exploring public clouds, volunteer computing (e.g., LHC@home), and HPC facilities to increase peak computation capacity. This talk covers the future computing needs of the LHC experiments (on a 10-year timeline) and the recent progress made by the ATLAS, CernVM, and CMS teams on using ARMv8 64-bit/AArch64.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/bud17/bud17-300k2/
Presentation: https://www.slideshare.net/linaroorg/bud17300k2-high-energy-physics-and-armv8-64bit-investigating-the-future-of-computing-at-cern
Video: https://www.youtube.com/watch?v=71Yco-mTaYI
---------------------------------------------------

★ Event Details ★
Linaro Connect Budapest 2017 (BUD17)
6-10 March 2017
Corinthia Hotel, Budapest,
Erzsébet krt. 43-49,
1073 Hungary

---------------------------------------------------
Keyword: ARMv8, CERN, physics, 64-bit
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://twitter.com/linaroorg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961"

Published in: Technology

BUD17-300K2: High Energy Physics and ARMv8 64-bit? Investigating The Future of Computing at CERN

  1. High Energy Physics and ARMv8 64-bit? Investigating The Future of Computing at CERN. David Abdurachmanov (CMS), Joshua Wyatt Smith (ATLAS), Jakob Blomer (CERN)
  2. An International Laboratory
  3. CERN, the European Organization for Nuclear Research. [Aerial map: Geneva (CH), Geneva Airport (CH), CERN Meyrin (CH/FR), CERN Prévessin (FR), ATLAS Experiment (CH), CMS Experiment (FR)]
  4. Where are our scientific tools? We hide them deep underground...
  5. Compact Muon Solenoid (CMS): 28.7 m long, 15 m diameter, 14'000 tons, 500+ MCHF (we keep maintaining and upgrading it); 40 countries, 172 institutes. "Hi Higgs boson!"
  6. A Toroidal LHC ApparatuS (ATLAS): 46 m long, 25 m wide, 25 m high, 7'000 tons, 540+ MCHF (we keep maintaining and upgrading it); 38 countries, 174 institutes. "Hi Higgs boson!"
  7. CMS
  8. CMS event display -- Higgs boson
  9. Two types of computing resources: the CMS detector (high-level trigger) and the Worldwide LHC Computing Grid (WLCG).
     CMS detector (high-level trigger): full ownership; a single "customer"; high-bandwidth interconnect. Online computing: filtering and selecting data from the detector; ~16K x86_64 cores in 2016. It can now be used as an "opportunistic" resource for offline computing via OpenStack: this can be done between "runs" (~6 hr gaps) and during longer technical stops, and the VMs are killed/replaced before the detector comes back to record data (a toy version of this policy is sketched below).
     WLCG: partially owned; multiple "customers"; bandwidth varies. Offline computing: the data is already stored and can be processed later; ~650K x86_64 cores (changes frequently). CERN has ~200K x86_64 cores and will add another 100K in 2017; 350 PB of disk and 400+ PB of tape; moved >800 PB over the network in 2016.
     The future looks different...
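Nothing below is from the slides: it is a toy sketch of the opportunistic-HLT policy just described, with hypothetical names and a simplistic reading of the 6-hour threshold.

    # Toy policy: offline VMs may run on the trigger farm only while the
    # detector is expected to stay idle long enough, and must be gone
    # before data taking resumes.
    from datetime import timedelta

    MIN_GAP = timedelta(hours=6)  # shorter inter-run gaps are not used

    def plan(expected_idle: timedelta, vms_running: bool) -> str:
        if expected_idle >= MIN_GAP and not vms_running:
            return "start offline VMs via OpenStack"
        if expected_idle < MIN_GAP and vms_running:
            return "kill/replace VMs before the detector records data"
        return "no action"

    print(plan(timedelta(hours=8), vms_running=False))  # start offline VMs...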
  10. The grid never sleeps... Peak at 3 PM was 91,222.282 MB/s.
      Distributed computing in HEP before ~2000 involved multiple vendors, including special workstations and heterogeneous computing. High Throughput Computing (HTC) converged on x86/Linux around 2000. Commodity hardware enabled the current model of the WLCG: build once, run everywhere. This left us with two vendors: Intel (dominating) and AMD.
  11. WLCG in Run 2. Global transfer rates increased to >40 GB/s (2x Run 1). Data acquisition: >10 PB/month. Regular transfers of 80 PB/month, with 100 PB/month during July-August (many billions of files). Over 800 PB was transferred across the WLCG in 2016. In 2016: 49.4 PB of LHC data; 58 PB across all experiments; 73 PB total. LHC performance is above expectations: all factors driving computing have increased above anticipated levels.
  12. How does it work?
      > No single job batch submission system: LSF, HTCondor, Slurm, ... (a sketch of bridging this zoo follows below)
      > No single storage solution: NFS, GlusterFS, Hadoop, custom systems developed by the HEP community
      > 100+ different CPU models from the last 10 years; most are 4-5 years old
      > Common operating system: RHEL-based
      > The HEP SPEC '06 benchmark, based on the CPU SPEC 2006 all_cpp benchmark set, is used for accounting in the WLCG and by procurement; a working group was established to prepare a non-proprietary replacement for it
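As a minimal sketch of what "no single batch system" means for the middleware, the interface below fans one submission call out to whichever scheduler a site runs. The class layout is an illustrative assumption (real grid middleware is far more involved), but condor_submit, sbatch, and bsub are the schedulers' actual submission commands.

    # One experiment-side interface over three different site schedulers.
    import subprocess
    from abc import ABC, abstractmethod

    class BatchSystem(ABC):
        @abstractmethod
        def submit(self, script: str) -> None:
            """Submit a job script to the underlying scheduler."""

    class HTCondor(BatchSystem):
        def submit(self, script: str) -> None:
            subprocess.run(["condor_submit", script], check=True)

    class Slurm(BatchSystem):
        def submit(self, script: str) -> None:
            subprocess.run(["sbatch", script], check=True)

    class LSF(BatchSystem):
        def submit(self, script: str) -> None:
            with open(script) as f:           # bsub reads the script on stdin
                subprocess.run(["bsub"], stdin=f, check=True)

    def submit_everywhere(sites: list[BatchSystem], script: str) -> None:
        for site in sites:                    # same call, any scheduler
            site.submit(script)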
  13. High-Luminosity LHC: best estimate.
      [Charts: "Data estimates for 1st year of HL-LHC (PB)", raw and derived, and "CPU Needs for 1st Year of HL-LHC (kHS06)", both broken down by experiment: ALICE, ATLAS, CMS, LHCb.]
      Data: raw, 50 PB (2016) -> 600 PB (2027); derived (1 copy), 80 PB (2016) -> 900 PB (2027).
      CPU: 60x the 2016 level, at least 10x above what is realistic to expect from technology at reasonably constant cost. Technology improving at ~20%/year will bring 6-10x in 10-11 years. We need to move from evolution to revolution in our computing model.
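A quick check of that arithmetic (mine, not the slides'): compounding a 20-25%/year price/performance gain over 10-11 years gives

    \[ 1.20^{10} \approx 6.2, \qquad 1.20^{11} \approx 7.4, \qquad 1.25^{10} \approx 9.3, \]

i.e. the quoted 6-10x, an order of magnitude short of the ~60x CPU growth needed.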
  14. Motivation for ARM in HEP?
      > Explore new hardware and software platforms that, in the future, may be more suited to HEP's bulk production workloads, i.e. simulation to start: performance benchmarks; power consumption; are results consistent? (i.e. validation)
      > Improves overall code quality
      > More efficient computing: less energy per computation
      > Geopolitics plays a role: server farms might use different architectures in various regions (Russia, Asia, etc.)
      > The ARM business model is very flexible: competition, freedom, flexibility
      > How will it affect our resource estimates for the HL-LHC (in 10 years)?
  15. Software stacks: open source @ GitHub. Languages: C++14, Python, Fortran.
      Layers: OS (RHEL/CentOS/SL) -> toolchain (GCC, Binutils, GDB, elfutils, LLVM/Clang, ...) -> standard (Python, zlib, glibc, OpenSSL, ...) -> HEP (ROOT, FFTW, Eigen, HepMC, SciPy, ...) -> CMSSW (the CMS software bundle, distributed via CVMFS).

                            CMSSW    Firefox
      SLOCs                 6M       7M
      Initial release       2005     2002
      Contributors          >1300    >1200
      Memory footprint      ~2 GB    ~0.3 GB

      Other CERN-developed software would increase the SLOC count: ROOT6 without Clang is 1.7M; GEANT4 is 1.1M. This is the actual application software for "pattern recognition", "simulation", etc.
      The ATLAS codebase (Athena) builds on LCG externals, AtlasExternals, Gaudi, and AthSimulation (a mix of ATLAS-specific and non-ATLAS-specific components). Athena is ~6.5 million lines of code in ~2400 packages; AthSimulation is a subset of Athena at ~350 packages. Full list of LCG (LHC Computing Grid) externals: http://lcgsoft.web.cern.ch/lcgsoft/
  16. Finally, successful execution achieved! The first AArch64-based WLCG site (demonstrator). CMS Dashboard task monitoring: on June 26, 2015, CMS successfully executed a CMSSW-based job on an AArch64 worker node via the standard job injection pipeline and received the output files back.
  17. Validation. Simulation is a Monte Carlo process:
      - numerical identity is not expected
      - but different trends/histogram shapes are clear warning signs!
      [Chart: "Reconstructed hits in ATLAS SCT detectors" (ATLAS Simulation), hits vs SCT_x with a ratio panel, comparing Intel Xeon, HP Moonshot, Intel Atom, and AArch64_Proto.]
      Also a reconstruction example from CMSSW (x86_64 vs aarch64) showing a difference between the two architectures. The main question: how significant is it? (A toy shape test is sketched below.)
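A shape comparison of this kind can be sketched with a two-sample Kolmogorov-Smirnov test. The toy data and the significance threshold below are my illustration, not the experiments' actual validation machinery.

    # Flag architecture-dependent shape differences between two sets of
    # simulated hit values; identical numbers are not expected from a
    # Monte Carlo process, but the distributions should agree.
    import numpy as np
    from scipy.stats import ks_2samp

    def shapes_compatible(hits_x86, hits_arm, alpha=0.01) -> bool:
        stat, p_value = ks_2samp(hits_x86, hits_arm)
        return p_value > alpha

    rng = np.random.default_rng(42)
    hits_x86 = rng.poisson(lam=120.0, size=20_000)  # toy stand-ins for the
    hits_arm = rng.poisson(lam=120.0, size=20_000)  # per-architecture samples
    print(shapes_compatible(hits_x86, hits_arm))    # True: same parent shape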
  18. Same, single event: spot the differences... Intel Xeon vs AArch64_Proto.
  19. How many ARM cores (excl. smartphones)? Fewer than 2,000 physical AArch64 cores in 2017, used for porting, benchmarking, optimization, and feedback; not used for production. Not everything will be powered directly by CentOS 7, but hopefully the majority; in some cases we use CentOS 7 via Linux containers.
  20. CERN openlab. What is CERN openlab? "CERN openlab is a unique public-private partnership that accelerates the development of cutting-edge solutions for the worldwide LHC community and wider scientific research. Through CERN openlab, CERN collaborates with leading ICT companies and research institutes." (http://openlab.cern/)
      .. -> Phase V (2015-2017) -> Phase VI (starts in 2018) -> ..
      openlab works in phases; we are currently in the 5th phase (it ends this year), which focuses on: data acquisition; networks and connectivity; data storage architectures; compute provisioning and management; computing platforms (code modernization; ARMv8 64-bit porting, optimization, and benchmarking); data analytics.
      Three broad areas of R&D for Phase VI:
      > Data center technologies and infrastructures: networks, cloud computing, storage and databases, data center architectures (disaggregation)
      > Computing platforms and software: architectures, software modernization/acceleration
      > Data analytics and machine learning: physics, engineering (control systems, infrastructure optimization), great interest from other communities
  21. The direction of the community. CERN openlab white paper on computing challenges (September 2017). HEP Software Foundation Community White Paper (CWP) (summer 2017): "A Roadmap for HEP Software and Computing R&D for the 2020s". Multiple working groups: Computing Models, Facilities, and Distributed Computing; Detector Simulation; Software Trigger and Event Reconstruction; Visualization; Data Access and Management; Security and Access Control; Machine Learning; Conditions Database; Event Processing Frameworks; Physics Generators; Math Libraries; Software Development, Deployment and Validation/Verification; Data Analysis and Interpretation; Workflow and Resource Management; Data and Software Preservation; Careers, Staffing and Training; Data Acquisition Software; Various Aspects of Technical Evolution (Software Tools, Hardware, Networking); Monitoring. (http://hepsoftwarefoundation.org/activities/cwp.html)
  22. Scavenging Cycles.
      Cloud (well accounted; spot price market; elasticity) -> offload peaks.
      HPC (allocation by grants; backfill mode) -> simulation bursts.
      Volunteer computing (opportunistic cycles; outreach) -> unmanaged resources.
      Our applications and systems must adapt!
  23. Cloud Computing in High-Energy Physics.
      Drivers and obstacles: 1. cost (partially); 2. control & trust; 3. specialized applications; 4. learning how to build better distributed systems.
      Themes: 1. hybrid academic-commercial clouds; 2. offload mainly simulation (up to 50%), i.e. no data lock-in; 3. "private" adoption of cloud technology: OpenStack for virtualization, Ceph/RADOS as a BLOB store, "Data Mining as a Service".
  24. Cycles in the Cloud (source: Gutsche)
  25. Cycles in the Cloud (source: Gutsche)
  26. Cloud Resources in a Global Batch System.
      [Diagram: at the HEP site, the experiment's file catalog and task queue, site storage, a software cache, and workers that pull jobs, register files, and transfer data. On each cloud, a VM factory starts & stops micro virtual machines through a cloud gateway; every VM runs an agent that pulls jobs, writes output, and registers files, served by a software cache; a book keeper with a WebAPI monitors the virtual machines.]
      (A minimal agent loop is sketched below.)
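The agents' pull model (late binding: work is assigned only when a free worker asks for it) can be reduced to a short loop. The in-memory queue and file catalog below are toys of my own; real agents talk to experiment services over the network.

    # Pilot-style agent inside a cloud VM: pull a job from the task
    # queue, run the payload, and register output files in the catalog.
    import queue

    task_queue: "queue.Queue[str]" = queue.Queue()
    file_catalog: list[str] = []

    def run(job: str) -> list[str]:
        return [f"{job}.out.root"]            # stand-in for the real payload

    def agent_loop() -> None:
        while True:                           # real agents run until the VM dies
            try:
                job = task_queue.get_nowait() # late binding happens here
            except queue.Empty:
                break                         # toy: stop instead of sleeping
            file_catalog.extend(run(job))     # "register files"

    task_queue.put("sim_job_001")
    agent_loop()
    print(file_catalog)  # ['sim_job_001.out.root']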
  27. OpenStack / KVM. CERN OpenStack: ~200k cores (growing); 3 PB storage; spans 2 physical data centers. Our remote data center is here in Budapest!
  28. A HEP Image for Clouds. Twofold system: 𝜇CernVM boot loader + OS delivered by a custom network file system (CernVM-FS).
      [Stack diagram: boot loader and kernel; initrd: CernVM-FS + 𝜇Contextualisation (20 MB total); Fuse/AUFS; AUFS writable overlay over "OS + extras" from CernVM-FS (experiment flavours atlas, alice, ...; EL 4 / EL 5 / EL 6 / EL 7); scratch disk; user data (EC2, OpenStack, CernVM Online, ...).]
      ~30,000 CernVMs booted per day. (The overlay read path is sketched conceptually below.)
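The read path through that writable overlay can be illustrated conceptually: local modifications win, everything else falls through to the read-only OS tree delivered by CernVM-FS. This is my simplification; AUFS implements this in the kernel, with copy-up on write, whiteouts, and so on.

    # Union-mount read resolution, reduced to two dictionary lookups.
    def resolve(path: str,
                writable_layer: dict[str, bytes],
                cernvmfs_layer: dict[str, bytes]) -> bytes:
        if path in writable_layer:        # local changes shadow the base OS
            return writable_layer[path]
        return cernvmfs_layer[path]       # otherwise fetched via CernVM-FS

    os_tree = {"/etc/os-release": b"delivered over the network\n"}
    print(resolve("/etc/os-release", {}, os_tree))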
  29. Porting CernVM to ARM.
      [Stack diagram: boot loader, kernel, initrd (CernVM-FS + 𝜇Contextualisation), Fuse/AUFS; 20 MB.]
      ∙ Re-compile kernel, CernVM-FS etc. for AArch64
      ∙ Partition table & MBR -> GPT and ESP (UEFI compliant)
      Prototypes run on X-Gene 1 and on new Cortex-A57 servers.
  30. Porting CernVM to ARM
  31. Porting CernVM to ARM
  32. LHC@Home
      ∙ Serious computing
      ∙ Workstations & gaming PCs
      ∙ >2 trillion collisions simulated for the CERN theory group (the biggest computing resource for this group!)
      ∙ ATLAS' second-biggest "simulation site"
  33. The Future of Volunteer Computing?
  34. The Future of Volunteer Computing?
  35. Revisiting All Areas of Computing in HEP
  36. Thanks for Your Time! davidlt@cern.ch jblomer@cern.ch joshua.wyatt.smith@cern.ch
