Accelerating Science
                     with Puppet

                            Tim Bell
                        Tim.Bell@cern.ch
                          @noggin143

                     PuppetConf San Francisco
                       28th September 2012


PuppetConf 2012            Tim Bell, CERN       1
What is CERN?
• Conseil Européen pour la
  Recherche Nucléaire – aka
  European Laboratory for
  Particle Physics
• Between Geneva and the
  Jura mountains, straddling
  the Swiss-French border
• Founded in 1954 with an
  international treaty
• Our business is fundamental
  physics: what is the universe
  made of, and how does it work?
Answering fundamental questions…
• How do we explain why particles have mass?
   We have theories and accumulating experimental evidence... Getting close…

• What is 96% of the universe made of?
   We can only see 4% of its estimated mass!

• Why isn’t there anti-matter
  in the universe?
   Nature should be symmetric…

• What was the state of matter just
  after the “Big Bang”?
   Travelling back to the earliest instants of
   the universe would help…

Community collaboration on an international scale




The Large Hadron Collider




LHC construction




The Large Hadron Collider (LHC) tunnel




Superconducting magnets – October 2008




    A faulty connection between two superconducting magnets led to the release of a
    large amount of helium into the LHC tunnel and forced the machine to shut down
    for a year of repairs


Accumulating events in 2009-2011




Heavy Ion Collisions




Tier-0 (CERN):
• Data recording
• Initial data reconstruction
• Data distribution

Tier-1 (11 centres):
• Permanent storage
• Re-processing
• Analysis

Tier-2 (~200 centres):
• Simulation
• End-user analysis

• Data is recorded at CERN and Tier-1s and analysed in the Worldwide LHC Computing Grid
• In a normal day, the grid provides 100,000 CPU days executing 1 million jobs
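The daily grid figures above imply an average job length. A back-of-envelope sketch, assuming one "CPU day" means one core kept busy for 24 hours (the slide does not define the unit); the function name is illustrative only:

```python
# Back-of-envelope arithmetic using the slide's daily WLCG figures.
# Assumption: one "CPU day" = one core busy for 24 hours.

HOURS_PER_DAY = 24


def mean_cpu_hours_per_job(cpu_days: float, jobs: int) -> float:
    """Average CPU time consumed per job, in CPU-hours."""
    if jobs <= 0:
        raise ValueError("jobs must be positive")
    return cpu_days * HOURS_PER_DAY / jobs


if __name__ == "__main__":
    cpu_days_per_day = 100_000   # "the grid provides 100,000 CPU days"
    jobs_per_day = 1_000_000     # "executing 1 million jobs"
    hours = mean_cpu_hours_per_job(cpu_days_per_day, jobs_per_day)
    print(f"~{hours:.1f} CPU-hours per job on average")
    # Delivering 100,000 CPU days every calendar day also means that,
    # on average, roughly 100,000 cores are busy at any moment.
```

Under that assumption the average job uses about 2.4 CPU-hours, and the grid keeps on the order of 100,000 cores busy around the clock.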
• Data Centre by Numbers
    – Hardware installation & retirement
        • ~7,000 hardware movements/year; ~1,800 disk failures/year

    Racks                   828    Disks                      64,109    Tape Drives                160
    Servers              11,728    Raw disk capacity (TiB)    63,289    Tape Cartridges         45,000
    Processors           15,694    Memory modules             56,014    Tape slots              56,000
    Cores                64,238    Memory capacity (TiB)         158    Tape Capacity (TiB)     73,000
    HEPSpec06           482,507    RAID controllers            3,749

    High Speed Routers (640 Mbps → 2.4 Tbps)         24
    Ethernet Switches                               350
    10 Gbps ports                                 2,000
    Switching Capacity                         4.8 Tbps
    1 Gbps ports                                 16,939
    10 Gbps ports                                   558
    IT Power Consumption                       2,456 kW
    Total Power Consumption                    3,890 kW

    Processor mix (chart): Xeon L5520 33%, Xeon E5410 16%, Xeon E5345 14%, Xeon 5160 10%,
    Xeon L5420 8%, Xeon E5335 7%, Xeon E5405 6%, Xeon 3GHz 4%, Xeon 5150 2%
    Disk vendor mix (chart): Western Digital 59%, Hitachi 23%, Seagate 15%, Fujitsu 3%,
    HP 0%, Maxtor 0%, Other 0%

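A few ratios follow directly from the figures above. The sketch below recomputes them; it assumes TiB means 1,024 GiB and that "Total Power Consumption" is the whole facility's draw, so that total power divided by IT power gives the usual power usage effectiveness (PUE) metric. The dictionary keys and function name are illustrative, not taken from any CERN tool.

```python
"""Derived ratios from the 'Data Centre by Numbers' slide (figures copied from the slide)."""

FIGURES = {
    "servers": 11_728,
    "processors": 15_694,
    "cores": 64_238,
    "memory_tib": 158,
    "tape_cartridges": 45_000,
    "tape_capacity_tib": 73_000,
    "it_power_kw": 2_456,
    "total_power_kw": 3_890,
}


def derived_ratios(f: dict) -> dict:
    """Compute per-unit ratios from the raw slide figures."""
    return {
        "cores_per_server": f["cores"] / f["servers"],
        "cores_per_processor": f["cores"] / f["processors"],
        # Assumption: 1 TiB = 1024 GiB
        "gib_memory_per_core": f["memory_tib"] * 1024 / f["cores"],
        "tib_per_cartridge": f["tape_capacity_tib"] / f["tape_cartridges"],
        # PUE: total facility power divided by power delivered to IT equipment
        "pue": f["total_power_kw"] / f["it_power_kw"],
    }


if __name__ == "__main__":
    for name, value in derived_ratios(FIGURES).items():
        print(f"{name:>20}: {value:.2f}")
```

With those assumptions the centre averages about 5.5 cores per server, roughly 2.5 GiB of memory per core (of the same order as the 2GB per core quoted in the backup slides), about 1.6 TiB per tape cartridge, and a PUE of about 1.58.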
Our Challenges - Data storage




                                              •   25PB/year to record
                                              •   >20 years retention
                                              •   6GB/s average
                                              •   25GB/s peaks




45,000 tapes holding 73PB of physics data




New data centre to expand capacity
                                   • Data centre in Geneva
                                     reaches limit of
                                     electrical capacity at
                                     3.5MW
                                   • New centre chosen in
                                     Budapest, Hungary
                                   • Additional 2.7MW of
                                     usable power
                                   • Hands-off facility
                                   • Deploying from 2013

Time to change strategy
• Rationale
      – Need to manage twice as many servers as today
      – No increase in staff numbers
      – Tools becoming increasingly brittle and will not scale as-is
• Approach
      – We are no longer a special case for compute
      – Adopt an open source tool chain model
      – Strong engineering skills allow rapid adoption of new technologies
            • Evaluate solutions in the problem domain
            • Identify functional gaps and challenge them
      – Contribute new function back to the community

Building Blocks
• Puppet
• mcollective, yum
• Bamboo
• AIMS/PXE, Foreman
• JIRA
• OpenStack Nova
• git
• Koji, Mock
• Yum repo, Pulp
• Active Directory / LDAP
• Lemon / Hadoop
• Hardware database
• Puppet-DB
Training and Support
• Buy the book rather than guru mentoring
• Newcomers are rapidly productive (and often know more than us)
• Community and Enterprise support means we’re not on our own




Staff Motivation
• Skills are valuable outside CERN when an engineer’s contract
  ends




Prepare the move to the clouds
• Improve operational efficiency
      – Machine reception and testing
      – Hardware interventions with long-running programs
      – Multiple operating system demand
• Improve resource efficiency
      – Exploit idle resources, especially waiting for tape I/O
      – Highly variable load such as interactive or build machines
• Improve responsiveness
      – Self-Service
      – Coffee break response time


Service Model
                                 • Pets are given names like
                                   pussinboots.cern.ch
                                 • They are unique, lovingly hand raised
                                   and cared for
                                 • When they get ill, you nurse them back
                                   to health

                                 • Cattle are given numbers like
                                   vm0042.cern.ch
                                 • They are almost identical to other cattle
                                 • When they get ill, you get another one



          • Future application architectures tend towards Cattle but Pets
            with configuration management are also viable
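The cattle half of the service model can be sketched in a few lines. This is a hypothetical illustration, not CERN's provisioning code: the `Herd` class and its methods are invented here, and the four-digit zero padding is only inferred from the vm0042.cern.ch example.

```python
import itertools


class Herd:
    """Hypothetical sketch of the 'cattle' service model: numbered, interchangeable hosts."""

    def __init__(self, domain: str = "cern.ch") -> None:
        self._next_id = itertools.count(1)  # monotonically increasing host numbers
        self.domain = domain
        self.hosts = set()  # names of the hosts currently in the herd

    def spawn(self) -> str:
        """Provision a new host with the next number, e.g. vm0042.cern.ch."""
        name = f"vm{next(self._next_id):04d}.{self.domain}"
        self.hosts.add(name)
        return name

    def replace(self, sick: str) -> str:
        """Cattle are not nursed: drop the ill host and spawn a fresh one.

        Configuration management (Puppet, in this talk) is what would bring
        the replacement to the same state as the rest of the herd.
        """
        self.hosts.discard(sick)
        return self.spawn()


if __name__ == "__main__":
    herd = Herd()
    first = herd.spawn()
    print(first)
    print(herd.replace(first))
    print(sorted(herd.hosts))
```

A pet, by contrast, would keep its unique name and be repaired in place; the sketch only models the replacement path.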
OpenStack
• Open source cloud run by an independent foundation
  with over 6,000 members from 850 organisations
• Started in 2010 but maturing rapidly with public cloud
  services from Rackspace, HP and Ubuntu


Platinum Members




Many OpenStack Components to Configure
• HORIZON
• KEYSTONE
• GLANCE: Registry, Image
• NOVA: Compute, Scheduler, Volume, Network

When communities combine…
• OpenStack’s many components and options make
  configuration complex out of the box
• Puppet Forge module from PuppetLabs (Thanks, Dan Bode)
• The Foreman adds OpenStack provisioning for user kiosk




Scaling up with Puppet and OpenStack
• Use LHC@Home, based on BOINC, for simulating the magnets
  that guide particles around the LHC
• Naturally, there is a puppet module puppet-boinc
• 1000 VMs spun up to stress test the hypervisors with Puppet,
  Foreman and OpenStack




Next Steps
• Expand tool chain
      – MCollective
      – Puppet-DB
• Deploy at scale in production
      – Move towards 15,000 hypervisors over the next two years
      – Estimate 100,000-300,000 virtual machines
• Work with labs on common solutions for scientific computing
      – Batch system configurations
      – Grids
      – Publishing to http://github.com/cernops
• Investigate desktop and device management
      – Linux desktops
      – Macs
      – KVMs, PDUs
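The deployment targets imply an average VM density per hypervisor. A quick sketch, assuming VMs are spread evenly across all hypervisors (real placement would not be uniform); the function name is illustrative:

```python
# Density check for the "Next Steps" targets.
# Assumption: VMs are spread evenly across all hypervisors.

HYPERVISORS = 15_000
VM_ESTIMATES = (100_000, 300_000)  # the slide estimates 100,000 to 300,000 VMs


def vms_per_hypervisor(vms: int, hypervisors: int) -> float:
    """Average number of VMs each hypervisor would host."""
    if hypervisors <= 0:
        raise ValueError("hypervisors must be positive")
    return vms / hypervisors


if __name__ == "__main__":
    for vms in VM_ESTIMATES:
        density = vms_per_hypervisor(vms, HYPERVISORS)
        print(f"{vms:,} VMs / {HYPERVISORS:,} hypervisors = {density:.1f} VMs each")
```

The targets therefore work out to roughly 7 to 20 VMs per hypervisor on average.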
Final Thoughts
• A small project to share documents at CERN in the ’90s created
  the massive phenomenon that is today’s World Wide Web
     • Open Source
     • Vibrant community and ecosystem
• Working with the Puppet and OpenStack communities has shown
  the power of collaboration
     • We have built a toolchain in one year with part-time resources
     • Running 15,000 servers and up to 300,000 VMs is scary but achievable
• Looking forward to further contributions as we move to
  large-scale deployment

For more details, see Ben Jones’ talk at 15:50 today
Configuration Management at CERN – From
Homegrown to Industry Standard




                                                       Tim Bell
References


CERN                                            http://public.web.cern.ch/public/
Scientific Linux                                http://www.scientificlinux.org/
Worldwide LHC Computing Grid                    http://lcg.web.cern.ch/lcg/
                                                http://rtm.hep.ph.ic.ac.uk/
Jobs                                            http://cern.ch/jobs
Detailed Report on Agile Infrastructure         http://cern.ch/go/N8wp




Backup Slides




CERN’s tools
• The world’s most powerful accelerator: LHC
      –   A 27 km long tunnel filled with high-tech instruments
      –   Equipped with thousands of superconducting magnets
      –   Accelerates particles to energies never before obtained
      –   Produces particle collisions creating microscopic “big bangs”
• Very large sophisticated detectors
      – Four experiments, each the size of a cathedral
      – A hundred million measurement channels each
      – Data acquisition systems processing petabytes per second
• Top-level computing to distribute and analyse the data
      – A Computing Grid linking ~200 computer centres around the globe
      – Sufficient computing power and storage to handle 25 petabytes per
        year, making the data available to thousands of physicists for analysis
Our Infrastructure
• Hardware is generally based on commodity, white-box servers
      – Open tendering process based on SpecInt/CHF, CHF/Watt and GB/CHF
      – Compute nodes typically dual processor, 2GB per core
      – Bulk storage on 24x2TB disk storage-in-a-box with a RAID card
• The vast majority of servers run Scientific Linux, developed by
  Fermilab and CERN and based on Red Hat Enterprise Linux
      – Focus is on stability in view of the number of centres on the WLCG




New architecture data flows




OpenStack
Gold Members




 PuppetConf 2012     Tim Bell, CERN   39

More Related Content

More from Puppet

KGI compliance as-code approach
KGI compliance as-code approachKGI compliance as-code approach
KGI compliance as-code approachPuppet
 
Enforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automationEnforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automationPuppet
 
Keynote: Puppet camp compliance
Keynote: Puppet camp complianceKeynote: Puppet camp compliance
Keynote: Puppet camp compliancePuppet
 
Automating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNowAutomating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNowPuppet
 
Puppet: The best way to harden Windows
Puppet: The best way to harden WindowsPuppet: The best way to harden Windows
Puppet: The best way to harden WindowsPuppet
 
Simplified Patch Management with Puppet - Oct. 2020
Simplified Patch Management with Puppet - Oct. 2020Simplified Patch Management with Puppet - Oct. 2020
Simplified Patch Management with Puppet - Oct. 2020Puppet
 
Accelerating azure adoption with puppet
Accelerating azure adoption with puppetAccelerating azure adoption with puppet
Accelerating azure adoption with puppetPuppet
 
Puppet catalog Diff; Raphael Pinson
Puppet catalog Diff; Raphael PinsonPuppet catalog Diff; Raphael Pinson
Puppet catalog Diff; Raphael PinsonPuppet
 
ServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkPuppet
 
Take control of your dev ops dumping ground
Take control of your  dev ops dumping groundTake control of your  dev ops dumping ground
Take control of your dev ops dumping groundPuppet
 
100% Puppet Cloud Deployment of Legacy Software
100% Puppet Cloud Deployment of Legacy Software100% Puppet Cloud Deployment of Legacy Software
100% Puppet Cloud Deployment of Legacy SoftwarePuppet
 
Puppet User Group
Puppet User GroupPuppet User Group
Puppet User GroupPuppet
 
Continuous Compliance and DevSecOps
Continuous Compliance and DevSecOpsContinuous Compliance and DevSecOps
Continuous Compliance and DevSecOpsPuppet
 
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick MaludyThe Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick MaludyPuppet
 
ServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkPuppet
 
Puppet in k8s, Miroslav Hadzhiev
Puppet in k8s, Miroslav HadzhievPuppet in k8s, Miroslav Hadzhiev
Puppet in k8s, Miroslav HadzhievPuppet
 
Bolt on Windows - James Pogran
Bolt on Windows - James PogranBolt on Windows - James Pogran
Bolt on Windows - James PogranPuppet
 
The Business Value of Modernizing your Windows Infrastructure and Bringing Li...
The Business Value of Modernizing your Windows Infrastructure and Bringing Li...The Business Value of Modernizing your Windows Infrastructure and Bringing Li...
The Business Value of Modernizing your Windows Infrastructure and Bringing Li...Puppet
 
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020Puppet
 
Navigating the new normal with self healing infrastructure automation
Navigating the new normal with self healing infrastructure automationNavigating the new normal with self healing infrastructure automation
Navigating the new normal with self healing infrastructure automationPuppet
 

More from Puppet (20)

KGI compliance as-code approach
KGI compliance as-code approachKGI compliance as-code approach
KGI compliance as-code approach
 
Enforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automationEnforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automation
 
Keynote: Puppet camp compliance
Keynote: Puppet camp complianceKeynote: Puppet camp compliance
Keynote: Puppet camp compliance
 
Automating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNowAutomating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNow
 
Puppet: The best way to harden Windows
Puppet: The best way to harden WindowsPuppet: The best way to harden Windows
Puppet: The best way to harden Windows
 
Simplified Patch Management with Puppet - Oct. 2020
Simplified Patch Management with Puppet - Oct. 2020Simplified Patch Management with Puppet - Oct. 2020
Simplified Patch Management with Puppet - Oct. 2020
 
Accelerating azure adoption with puppet
Accelerating azure adoption with puppetAccelerating azure adoption with puppet
Accelerating azure adoption with puppet
 
Puppet catalog Diff; Raphael Pinson
Puppet catalog Diff; Raphael PinsonPuppet catalog Diff; Raphael Pinson
Puppet catalog Diff; Raphael Pinson
 
ServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin Reeuwijk
 
Take control of your dev ops dumping ground
Take control of your  dev ops dumping groundTake control of your  dev ops dumping ground
Take control of your dev ops dumping ground
 
100% Puppet Cloud Deployment of Legacy Software
100% Puppet Cloud Deployment of Legacy Software100% Puppet Cloud Deployment of Legacy Software
100% Puppet Cloud Deployment of Legacy Software
 
Puppet User Group
Puppet User GroupPuppet User Group
Puppet User Group
 
Continuous Compliance and DevSecOps
Continuous Compliance and DevSecOpsContinuous Compliance and DevSecOps
Continuous Compliance and DevSecOps
 
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick MaludyThe Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
 
ServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin Reeuwijk
 
Puppet in k8s, Miroslav Hadzhiev
Puppet in k8s, Miroslav HadzhievPuppet in k8s, Miroslav Hadzhiev
Puppet in k8s, Miroslav Hadzhiev
 
Bolt on Windows - James Pogran
Bolt on Windows - James PogranBolt on Windows - James Pogran
Bolt on Windows - James Pogran
 
The Business Value of Modernizing your Windows Infrastructure and Bringing Li...
The Business Value of Modernizing your Windows Infrastructure and Bringing Li...The Business Value of Modernizing your Windows Infrastructure and Bringing Li...
The Business Value of Modernizing your Windows Infrastructure and Bringing Li...
 
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
 
Navigating the new normal with self healing infrastructure automation
Navigating the new normal with self healing infrastructure automationNavigating the new normal with self healing infrastructure automation
Navigating the new normal with self healing infrastructure automation
 

Recently uploaded

Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Recently uploaded (20)

Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments

CERN: Accelerating Science with Puppet - PuppetConf '12

  • 1. Accelerating Science with Puppet Tim Bell Tim.Bell@cern.ch @noggin143 PuppetConf San Francisco 28th September 2012 PuppetConf 2012 Tim Bell, CERN 1
  • 2. What is CERN? • Conseil Européen pour la Recherche Nucléaire – aka European Laboratory for Particle Physics • Between Geneva and the Jura mountains, straddling the Swiss-French border • Founded in 1954 with an international treaty • Our business is fundamental physics: what is the universe made of and how does it work?
  • 3. Answering fundamental questions… • How do we explain why particles have mass? We have theories and accumulating experimental evidence… Getting close… • What is 96% of the universe made of? We can only see 4% of its estimated mass! • Why isn’t there anti-matter in the universe? Nature should be symmetric… • What was the state of matter just after the « Big Bang »? Travelling back to the earliest instants of the universe would help…
  • 4. Community collaboration on an international scale
  • 5. The Large Hadron Collider
  • 8. The Large Hadron Collider (LHC) tunnel
  • 10. Superconducting magnets – October 2008 • A faulty connection between two superconducting magnets led to the release of a large amount of helium into the LHC tunnel and forced the machine to shut down for repairs for one year
  • 11. Accumulating events in 2009-2011
  • 13. Heavy Ion Collisions
  • 15. Tier-0 (CERN): – Data recording – Initial data reconstruction – Data distribution • Tier-1 (11 centres): – Permanent storage – Re-processing – Analysis • Tier-2 (~200 centres): – Simulation – End-user analysis • Data is recorded at CERN and Tier-1s and analysed in the Worldwide LHC Computing Grid • On a normal day, the grid provides 100,000 CPU days executing 1 million jobs
  • 16. Data Centre by Numbers – Hardware installation & retirement • ~7,000 hardware movements/year; ~1,800 disk failures/year • Racks: 828; Servers: 11,728; Processors: 15,694; Cores: 64,238; HEPSpec06: 482,507 • Disks: 64,109; Raw disk capacity: 63,289 TiB; RAID controllers: 3,749 • Memory modules: 56,014; Memory capacity: 158 TiB • Tape drives: 160; Tape cartridges: 45,000; Tape slots: 56,000; Tape capacity: 73,000 TiB • High speed routers: 24 (640 Mbps → 2.4 Tbps); Ethernet switches: 350; 10 Gbps ports: 2,000; Switching capacity: 4.8 Tbps; 1 Gbps ports: 16,939; 10 Gbps ports: 558 • IT power consumption: 2,456 kW; Total power consumption: 3,890 kW • [Charts: processor models (Xeon 3GHz, 5150, 5160, E5335, E5345, E5405, E5410, L5420, L5520, other) and disk vendors (Fujitsu, Hitachi, HP, Seagate, Maxtor, Western Digital)]
  • 17. Our Challenges - Data storage • 25PB/year to record • >20 years retention • 6GB/s average • 25GB/s peaks
  • 19. 45,000 tapes holding 73PB of physics data
  • 20. New data centre to expand capacity • Data centre in Geneva reaches limit of electrical capacity at 3.5MW • New centre chosen in Budapest, Hungary • Additional 2.7MW of usable power • Hands-off facility • Deploying from 2013
  • 21. Time to change strategy • Rationale – Need to manage twice as many servers as today – No increase in staff numbers – Tools becoming increasingly brittle and will not scale as-is • Approach – We are no longer a special case for compute – Adopt an open source tool chain model – Strong engineering skills allow rapid adoption of new technologies • Evaluate solutions in the problem domain • Identify functional gaps and challenge them – Contribute new function back to the community
  • 22. Building Blocks • mcollective, yum • Bamboo • Puppet • AIMS/PXE • Foreman • JIRA • OpenStack Nova • git • Koji, Mock • Yum repo / Pulp • Active Directory / LDAP • Lemon / Hadoop • Hardware database • Puppet-DB
  • 23. Training and Support • Buy the book rather than guru mentoring • Newcomers are rapidly productive (and often know more than us) • Community and Enterprise support means we’re not on our own
  • 24. Staff Motivation • Skills valuable outside of CERN when an engineer’s contract ends
  • 25. Prepare the move to the clouds • Improve operational efficiency – Machine reception and testing – Hardware interventions with long-running programs – Multiple operating system demand • Improve resource efficiency – Exploit idle resources, especially those waiting for tape I/O – Highly variable load such as interactive or build machines • Improve responsiveness – Self-Service – Coffee break response time
  • 26. Service Model • Pets are given names like pussinboots.cern.ch • They are unique, lovingly hand raised and cared for • When they get ill, you nurse them back to health • Cattle are given numbers like vm0042.cern.ch • They are almost identical to other cattle • When they get ill, you get another one • Future application architectures tend towards Cattle but Pets with configuration management are also viable
  • 27. OpenStack • Open source cloud run by an independent foundation with over 6,000 members from 850 organisations • Started in 2010 but maturing rapidly with public cloud services from Rackspace, HP and Ubuntu Platinum Members
  • 28. Many OpenStack Components to Configure • HORIZON • KEYSTONE • GLANCE – Registry – Image • NOVA – Compute – Scheduler – Volume – Network
  • 29. When communities combine… • OpenStack’s many components and options make configuration complex out of the box • Puppet Forge module from PuppetLabs (Thanks, Dan Bode) • The Foreman adds OpenStack provisioning for user kiosk
  • 30. Scaling up with Puppet and OpenStack • Use LHC@Home based on BOINC for simulating the magnets guiding particles around the LHC • Naturally, there is a puppet module puppet-boinc • 1000 VMs spun up to stress test the hypervisors with Puppet, Foreman and OpenStack
  • 31. Next Steps • Expand tool chain – Mcollective – Puppet-DB • Deploy at scale in production – Move towards 15,000 hypervisors over next two years – Estimate 100,000-300,000 virtual machines • Work with labs on common solutions for scientific computing – Batch system configurations – Grids – Publishing to http://github.com/cernops • Investigate desktop and device management – Linux desktops – Macs – KVMs, PDUs
  • 32. Final Thoughts • A small project to share documents at CERN in the ’90s created the massive phenomenon that is today’s world wide web • Open Source • Vibrant community and eco-system • Working with the Puppet and OpenStack communities has shown the power of collaboration • We have built a toolchain in one year with part-time resources • Running 15,000 servers and up to 300,000 VMs is scary but achievable • Looking forward to further contributions as we move to large-scale deployment
  • 33. For more details, see Ben Jones’ talk at 15:50 today: “Configuration Management at CERN – From Homegrown to Industry Standard”
  • 34. References • CERN: http://public.web.cern.ch/public/ • Scientific Linux: http://www.scientificlinux.org/ • Worldwide LHC Computing Grid: http://lcg.web.cern.ch/lcg/ and http://rtm.hep.ph.ic.ac.uk/ • Jobs: http://cern.ch/jobs • Detailed Report on Agile Infrastructure: http://cern.ch/go/N8wp
  • 35. Backup Slides
  • 36. CERN’s tools • The world’s most powerful accelerator: LHC – A 27 km long tunnel filled with high-tech instruments – Equipped with thousands of superconducting magnets – Accelerates particles to energies never before obtained – Produces particle collisions creating microscopic “big bangs” • Very large sophisticated detectors – Four experiments each the size of a cathedral – A hundred million measurement channels each – Data acquisition systems processing Petabytes per second • Top level computing to distribute and analyse the data – A Computing Grid linking ~200 computer centres around the globe – Sufficient computing power and storage to handle 25 Petabytes of data per year and make it available to thousands of physicists for analysis
  • 37. Our Infrastructure • Hardware is generally based on commodity, white-box servers – Open tendering process based on SpecInt/CHF, CHF/Watt and GB/CHF – Compute nodes typically dual processor, 2GB per core – Bulk storage on 24x2TB disk storage-in-a-box with a RAID card • Vast majority of servers run Scientific Linux, developed by Fermilab and CERN, based on Red Hat Enterprise Linux – Focus is on stability in view of the number of centres on the WLCG
  • 38. New architecture data flows
  • 39. OpenStack Gold Members
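
The slide figures invite a few quick consistency checks. The Python sketch below only re-derives ratios from numbers quoted on slides 15, 16, 17 and 31; the constants are copied from those slides, and the helper names are our own, not part of any CERN tool. Two assumptions: a year is taken as 365 days, and storage volumes use decimal units (1 PB = 10^15 bytes).

```python
"""Back-of-the-envelope checks on figures quoted in the slides (2012 values)."""

# Figures copied from the slides.
GRID_CPU_DAYS_PER_DAY = 100_000    # slide 15
GRID_JOBS_PER_DAY = 1_000_000      # slide 15
DISKS = 64_109                     # slide 16
DISK_FAILURES_PER_YEAR = 1_800     # slide 16 (approximate)
SERVERS = 11_728                   # slide 16
IT_POWER_KW = 2_456                # slide 16
TOTAL_POWER_KW = 3_890             # slide 16
RECORDED_PB_PER_YEAR = 25          # slide 17
HYPERVISORS = 15_000               # slide 31 target
VMS_ESTIMATE = (100_000, 300_000)  # slide 31 estimate range

HOURS_PER_YEAR = 365 * 24
SECONDS_PER_YEAR = HOURS_PER_YEAR * 3600


def mean_job_cpu_hours() -> float:
    """Average CPU time per grid job, if all delivered CPU time went to jobs."""
    return GRID_CPU_DAYS_PER_DAY * 24 / GRID_JOBS_PER_DAY


def annualised_disk_failure_rate() -> float:
    """Fraction of the disk population failing per year."""
    return DISK_FAILURES_PER_YEAR / DISKS


def implied_disk_mtbf_hours() -> float:
    """Fleet disk-hours per year divided by failures per year."""
    return DISKS * HOURS_PER_YEAR / DISK_FAILURES_PER_YEAR


def power_usage_effectiveness() -> float:
    """PUE: total facility power divided by IT equipment power."""
    return TOTAL_POWER_KW / IT_POWER_KW


def it_watts_per_server() -> float:
    """IT power spread over servers; IT load also covers network gear, so this overstates it."""
    return IT_POWER_KW * 1000 / SERVERS


def mean_recording_rate_gb_per_s() -> float:
    """Average rate needed to write the yearly volume, in decimal GB/s."""
    return RECORDED_PB_PER_YEAR * 1e15 / SECONDS_PER_YEAR / 1e9


def vms_per_hypervisor() -> tuple[float, float]:
    """Low and high VM density implied by the slide 31 targets."""
    low, high = VMS_ESTIMATE
    return low / HYPERVISORS, high / HYPERVISORS


if __name__ == "__main__":
    print(f"mean job length:     {mean_job_cpu_hours():.1f} CPU hours")
    print(f"disk failure rate:   {annualised_disk_failure_rate():.1%} per year")
    print(f"implied disk MTBF:   {implied_disk_mtbf_hours():,.0f} hours")
    print(f"PUE:                 {power_usage_effectiveness():.2f}")
    print(f"IT power per server: {it_watts_per_server():.0f} W")
    print(f"mean recording rate: {mean_recording_rate_gb_per_s():.2f} GB/s")
    low, high = vms_per_hypervisor()
    print(f"VMs per hypervisor:  {low:.1f} to {high:.1f}")
```

Run as a script, it gives a mean grid job of about 2.4 CPU hours, an annual disk failure rate of about 2.8% (a fleet-implied MTBF of roughly 312,000 hours), a PUE of about 1.58 and a mean recording rate of about 0.79 GB/s. That last figure is well below the 6GB/s average on slide 17, which suggests the slide's average counts more than new data being written (for example, data read back for analysis); the slides do not say which.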
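
The pets-and-cattle service model on slide 26 is easy to state in code. This is a toy Python sketch, not CERN's provisioning logic: the `Fleet` class and its methods are hypothetical, and only the naming patterns (pussinboots.cern.ch, vm0042.cern.ch) come from the slide. A failed pet keeps its identity and is repaired in place; a failed head of cattle is discarded and replaced by the next numbered instance.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Fleet:
    """Toy model of the pets-vs-cattle service model; all names are illustrative."""

    domain: str = "cern.ch"
    cattle: list[str] = field(default_factory=list)
    pets: set[str] = field(default_factory=set)
    repairs: dict[str, int] = field(default_factory=dict)  # pet hostname -> times nursed
    _serial: count = field(default_factory=lambda: count(1))

    def add_cattle(self) -> str:
        # Cattle get numbers, e.g. vm0042.cern.ch, and are interchangeable.
        host = f"vm{next(self._serial):04d}.{self.domain}"
        self.cattle.append(host)
        return host

    def add_pet(self, name: str) -> str:
        # Pets get unique names, e.g. pussinboots.cern.ch.
        host = f"{name}.{self.domain}"
        self.pets.add(host)
        return host

    def handle_failure(self, host: str) -> str:
        """Return the host that serves the workload after a failure."""
        if host in self.pets:
            # Pet: nurse it back to health in place, keeping its identity.
            self.repairs[host] = self.repairs.get(host, 0) + 1
            return host
        # Cattle: retire the sick instance and provision a fresh one.
        self.cattle.remove(host)
        return self.add_cattle()


if __name__ == "__main__":
    fleet = Fleet()
    herd = [fleet.add_cattle() for _ in range(3)]
    pet = fleet.add_pet("pussinboots")
    print(fleet.handle_failure(herd[1]))  # replacement gets the next number
    print(fleet.handle_failure(pet))      # same hostname comes back
```

The replacement for a failed `vm0002.cern.ch` comes back as `vm0004.cern.ch`, while a repaired pet keeps its name. In the talk's setting, the "provision a fresh one" step would presumably be handled by the cloud (OpenStack) together with configuration management (Puppet); the sketch only models the naming and replacement policy.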

Editor's Notes

  1. Established by an international treaty in the aftermath of the Second World War as a place where scientists could work together on fundamental research. Nuclear is part of the name, but our world is particle physics.
  2. Our current understanding of the universe is incomplete. A theory, called the Standard Model, proposes particles and forces, many of which have been experimentally observed. However, there are open questions. Why do some particles have mass and others not? The Higgs Boson is a theory, but we need experimental evidence. Our theory of forces does not explain how gravity works. Cosmologists can only find 4% of the matter in the universe; we have lost the other 96%. We should have 50% matter and 50% anti-matter… why is there an asymmetry (although it is a good thing that there is, since the two annihilate each other)? When we go back through time 13 billion years towards the big bang, we move back through planets, stars, atoms, protons/electrons towards a soup-like quark-gluon plasma. What were the properties of this?
  3. The biggest international scientific collaboration in the world, with over 10,000 scientists from 100 countries. Annual budget around 1.1 billion USD. Funding for CERN, the laboratory, itself comes from the 20 member states, in proportion to their gross domestic product… other countries contribute to experiments, including a substantial US contribution towards the LHC experiments.
  4. The LHC is CERN’s largest accelerator: a 17-mile ring 100 meters underground where two beams of particles are sent in opposite directions and collided at the 4 experiments, ATLAS, CMS, LHCb and ALICE. Lake Geneva and the airport are visible at the top to give a sense of scale.
  5. CERN is more than just the LHC: CNGS sends neutrinos to Gran Sasso, CLOUD is demonstrating the impact of cosmic rays on weather patterns, and anti-hydrogen atoms have been contained for minutes in a magnetic vessel. However, for those of you who have read Dan Brown’s Angels and Demons or seen the film, there are no maniacal monks with pounds of anti-matter running around the campus.
  6. The LHC was conceived in the 1980s and construction was started in 2002 within the tunnel of a previous accelerator called LEP. 6,000 magnets weighing up to 35 tons each were lowered down 100m shafts.
  7. The ring consists of two beam pipes, with a vacuum pressure 10 times lower than on the moon, which contain the beams of protons accelerated to just below the speed of light. The protons go round 11,000 times per second, bent by superconducting magnets cooled to 2K by liquid helium (-450F), colder than outer space. The beams themselves have a total energy similar to a high-speed train, so care needs to be taken to make sure they turn the corners correctly and don’t bump into the walls of the pipe.
  8. At 4 points around the ring, the beams are made to cross where detectors the size of cathedrals, weighing up to 12,500 tonnes, surround the pipe. These are like digital cameras, but they take 100 megapixel photos 40 million times a second. This produces up to 1 petabyte/s.
  9. Collisions can be visualised by the tracks left in the various parts of the detectors. With many collisions, the statistics allow particle identification, such as mass and charge. This is a simple one…
  10. To improve the statistics, we send round beams of multiple bunches; as they cross, there are multiple collisions as 100 billion protons per bunch pass through each other. Software close by the detector, and later offline in the computer centre, then has to examine the tracks to understand the particles involved.
  11. To get quark-gluon plasma, the material closest to the big bang, we also collide lead ions, which is much more intensive… the temperatures reach 100,000 times that in the sun.
  12. We cannot record 1PB/s, so there are hardware filters to remove uninteresting collisions, such as those whose physics we already understand. The data is then sent to the CERN computer centre for recording via 10Gbit optical connections.
  13. The Worldwide LHC Computing Grid is used to record and analyse this data. The grid currently runs around 1 million jobs/day, and less than 10% of the work is done at CERN. There is an agreed set of protocols for running jobs, data distribution and accounting between all the sites, which co-operate in order to support physicists across the globe.
  14. So, to the Tier-0 computer centre at CERN… we are unusual in that we are open about our environment, as there is no competitive advantage for us. We have thousands of visitors a year coming for tours and education, and the computer centre is a popular visit. The data centre has around 2.9MW of usable power looking after 12,000 servers. In comparison, the accelerator uses 120MW, like a small town. With 64,000 disks, we have around 1,800 failing each year… this is much higher than the manufacturers’ MTBFs would suggest, which is consistent with results from Google. Servers are mainly Intel processors, some AMD, with dual core Xeon being the most common configuration.
  15. Our data storage system has to record and preserve 25PB/year with an expected lifetime of 20 years. Keeping the old data is required to get the maximum statistics for discoveries. At times, physicists will want to skim this data looking for new physics. Data rates are around 6GB/s average, with peaks of 25GB/s.
  16. Upstairs in the computer centre, a high roof was the fashion in the 1980s for mainframes, but it is now very difficult to cool efficiently.
  17. Tape robots from IBM and Oracle. Around 60,000 tape mounts/week, so the robots are kept busy. Data is copied every two years to keep up with the latest media densities.
  18. Asked member states for offers. 200Gbit/s links connect the centres. Expect to double computing capacity compared to today by 2015.
  19. Double the capacity, same manpower. Need to rethink how to solve the problem… look at how others approach it. We had our own tools in 2002, and as they became more sophisticated, it was not possible to take advantage of developments elsewhere without a major break. The team is doing this alongside their ‘day’ jobs, which reinforces the approach of taking what we can from the community.
  20. Model based on the Google toolchain; Puppet is key for many operations. We’ve only had to write one significant new custom CERN software component, which is in the certificate authority. Other parts, such as Lemon for monitoring, are from our previous implementation, as we did not want to change everything at once and they scale.
  21. We’ve been very pleased with our choice of Puppet. Along with the obvious benefits of the functionality, there are soft benefits from the community model.
  22. Many CERN staff are on short-term contracts… it is a real benefit for those staff to leave with Puppet skills. Quattor is basically flat… did not register.
  23. Puppet applies well to the cattle model, but we’re also using it to handle the pet cases that can’t yet move over due to software limitations. So, they get cloud provisioning with flexible configuration management.
  24. More presentations mentioning OpenStack than not?
  25. Complex to configure… take advantage of the experience of others
  26. Communities integrating… when a new option is being used at CERN in OpenStack, we contribute the changes back to the Puppet Forge, such as certificate handling. Even looking at Hyper-V/Windows OpenStack configuration…
  27. LHC@Home is not an instruction on how to build your own accelerator but a magnet simulation tool to test multiple passes around the ring. We wanted to use it as a stress test tool and in ½ day, it was running on 1000 VMs.
  28. Many areas going forward… Ben will cover lots of them in the deep dive, including easing grid software configuration with Brookhaven labs and managing non-service environments, from desktop Macs/Linux to PDUs in the computer centre.
  29. The project’s success comes down to community. A vibrant community has momentum of its own. As the WWW showed, many contributors can change how we see the world. Looking forward, as we help improve Puppet, remember that you will also be helping achieve a clearer understanding of the universe and how it works.
  30. We purchase on an annual cycle, replacing around ¼ of the servers. This purchasing is based on performance metrics such as cost per SpecInt or cost/GB. Generally, we are seeing dual core compute servers with Intel or AMD processors and bulk storage servers with 24 or 36 2TB disks. The operating system is a Red Hat Linux-based distribution called Scientific Linux. We share the development and maintenance with Fermilab, near Chicago. The choice of a Red Hat-based distribution comes from the need for stability across the grid, where the 200 centres need to keep running compatible Linux distributions.
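
Note 7's figures can be cross-checked with a little arithmetic: circumference times turns per second gives the beam speed. The sketch uses the 27 km ring from slide 36 and the note's ~11,000 turns per second, then repeats the calculation with the commonly published LHC values of 26,659 m and 11,245 turns per second, which are not in the talk. It also converts the 2K magnet temperature to Fahrenheit; the note's -450F is a rounded figure, and the conversion gives about -456°F.

```python
"""Sanity checks on the beam figures in editor's note 7."""

C_LIGHT_M_PER_S = 299_792_458.0  # speed of light, exact by definition of the metre


def beam_speed_m_per_s(circumference_m: float, turns_per_s: float) -> float:
    """Speed implied by travelling the ring circumference `turns_per_s` times a second."""
    return circumference_m * turns_per_s


def fraction_of_light_speed(speed_m_per_s: float) -> float:
    return speed_m_per_s / C_LIGHT_M_PER_S


def kelvin_to_fahrenheit(kelvin: float) -> float:
    """Convert via Celsius: C = K - 273.15, then F = C * 9/5 + 32."""
    return (kelvin - 273.15) * 9 / 5 + 32


if __name__ == "__main__":
    # Round figures from the talk: a 27 km ring, ~11,000 turns per second.
    rough = beam_speed_m_per_s(27_000, 11_000)
    # Assumed published values (not from the talk): 26,659 m and 11,245 turns/s.
    precise = beam_speed_m_per_s(26_659, 11_245)
    print(f"rough:   {fraction_of_light_speed(rough):.4f} c")
    print(f"precise: {fraction_of_light_speed(precise):.5f} c")
    print(f"2 K = {kelvin_to_fahrenheit(2.0):.1f} °F")
```

The round figures give about 0.991 c and the published ones about 0.99996 c, consistent with "just below the speed of light". Revolution arithmetic alone cannot show how close to c the protons really are, since the inputs are only given to a few significant figures.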
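
Slide 37 and note 30 describe purchasing driven by cost-performance metrics (SpecInt/CHF, CHF/Watt, GB/CHF) on an annual cycle. The sketch below is a hypothetical illustration of that idea, not CERN's tender procedure: the vendors, prices, benchmark scores and power figures are invented, and folding power into cost as a flat CHF-per-Watt charge is our assumption about how such a metric might be combined. It ranks compute offers by benchmark score per effective CHF.

```python
"""Toy ranking of compute tender offers by performance per effective cost."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """A hypothetical tender offer; every value used below is invented."""
    vendor: str
    price_chf: float   # purchase price per server
    specint: float     # benchmark score per server
    watts: float       # power draw per server


def effective_cost_chf(offer: Offer, chf_per_watt: float) -> float:
    # Assumption: power draw is charged at a flat CHF/Watt rate on top of the price.
    return offer.price_chf + offer.watts * chf_per_watt


def specint_per_chf(offer: Offer, chf_per_watt: float) -> float:
    return offer.specint / effective_cost_chf(offer, chf_per_watt)


def rank_offers(offers: list[Offer], chf_per_watt: float) -> list[Offer]:
    """Best value first."""
    return sorted(offers, key=lambda o: specint_per_chf(o, chf_per_watt), reverse=True)


def servers_replaced_per_year(fleet_size: int, lifetime_years: float) -> float:
    # Replacing about a quarter of the fleet each year implies a ~4-year lifetime.
    return fleet_size / lifetime_years


if __name__ == "__main__":
    offers = [
        Offer("A", price_chf=4000, specint=200, watts=300),
        Offer("B", price_chf=3500, specint=160, watts=250),
        Offer("C", price_chf=4500, specint=240, watts=400),
    ]
    for rate in (0.0, 10.0):
        print(rate, [o.vendor for o in rank_offers(offers, rate)])
    print(servers_replaced_per_year(11_728, 4))
```

With no power charge, offer C ranks first; at 10 CHF per Watt, A overtakes C because C draws the most power, illustrating why a power criterion can change the outcome. Storage offers could be ranked the same way using GB per CHF. The last line applies note 30's replacement of about a quarter of the servers per year to slide 16's 11,728 servers, giving 2,932 servers a year.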