DB 
CERN 
CH-1211 Geneva 23 
Switzerland 
www.cern.ch 
Big Data Analytics 
in High-Energy Physics 
Alexander Loth 
CERN 
23 May 2012
CERN 
• CERN is the European 
Organization for Nuclear Research 
• Founded in 1954 by 12 countries 
for fundamental physics 
• Today: the global effort of 
21 member states 
– About 1 billion CHF yearly budget 
– 3300 employees 
• Supporting the research 
activities of ~10000 scientists 
from 110+ nationalities 
Fundamental Research at CERN 
• Why do particles have mass? 
• Why is there no antimatter left 
in the universe? 
• What was the state
of the universe just after
the Big Bang?
CERN Accelerator Complex 
Potential of Big Data Analytics 
Stage 4: WISDOM
– Decisions
Stage 3: KNOWLEDGE GENERATION
– Predictions, Reporting, Visualization
Stage 2: INFORMATION RETRIEVAL
– Queries, Statistics, Analysis
Stage 1: DATA COLLECTION AND STORAGE
– Data Integration, Data Merging, ETL
CONTROL AND MONITORING SYSTEMS
PROACTIVE, PREDICTIVE, INTELLIGENT
• Reduce and predict faults and corrective interventions
• Increase the availability and operations efficiency
What about Business Intelligence? 
Traditional BI
• GBs to TBs of data
• Operational
• Structured
• Repetitive
Big Data Analytics
• TBs to EBs of data
• External + Operational
• Un-/Semi-Structured
• Ad hoc
Challenges of Big Data Analytics 
VOLUME
Scale of data: in 2011 humankind created 1200 EB of information
VELOCITY
Analysis of streaming data: worldwide digital content will double every 18 months
VARIETY
Different forms of data: 80% of data is unstructured
VERACITY
Uncertainty of data: poor data quality costs $3.1 trillion a year
CERN: 22 PB/year, peaking at 20 GB/s, with writing spread across 80 tape drives
Sources: The Economist, Gartner, IDC, McKinsey
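To put the CERN figures on this slide into perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation: it assumes decimal units (1 PB = 10^15 bytes), a 365-day year, and that the whole 20 GB/s peak is written to tape and spread evenly over the 80 drives. The slide does not state any of these assumptions, so the per-drive figure is best read as an upper bound.

```python
# Back-of-the-envelope check of the slide's CERN figures.
# Assumptions (not from the slide): decimal units, 365-day year,
# and the full peak rate landing evenly on all tape drives.
SECONDS_PER_YEAR = 365 * 24 * 3600
PB = 10**15
GB = 10**9


def average_rate_gb_per_s(pb_per_year):
    """Sustained rate in GB/s that corresponds to a yearly volume in PB."""
    return pb_per_year * PB / SECONDS_PER_YEAR / GB


def per_drive_rate_mb_per_s(peak_gb_per_s, drives):
    """Rate per drive in MB/s if the peak is split evenly across drives."""
    return peak_gb_per_s * 1000 / drives


avg = average_rate_gb_per_s(22)
per_drive = per_drive_rate_mb_per_s(20, 80)

print(f"average rate:      {avg:.2f} GB/s")        # 0.70 GB/s
print(f"per drive at peak: {per_drive:.0f} MB/s")  # 250 MB/s
print(f"peak / average:    {20 / avg:.0f}x")       # 29x
```

Under these assumptions the peak is roughly 29 times the yearly average, which suggests why the write load has to be spread over many drives.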
Big Data Analytics Use Cases 
Why use Hadoop at CERN?
• System should manage and heal itself 
– Automatically and transparently 
route around failure 
– Speculatively execute redundant 
tasks if certain nodes are 
detected to be slow 
• Performance should scale linearly 
– Proportional change in capacity 
with resource change 
• Computing should move to data 
– Lower latency, lower bandwidth 
• Simple core that is 
modular and extensible 
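The idea of speculatively re-executing tasks on slow nodes can be illustrated with a small Python sketch. It is not Hadoop's actual heuristic: the function names, the progress snapshot, and the "below half the median progress" threshold are all hypothetical choices made for illustration.

```python
from statistics import median


def find_stragglers(progress, slowdown=0.5):
    """Return ids of tasks whose progress is below `slowdown` x the median.

    `progress` maps a task id to the fraction of work done (0.0 to 1.0).
    The 0.5 threshold is an arbitrary illustrative choice, not Hadoop's.
    """
    if not progress:
        return []
    cutoff = slowdown * median(progress.values())
    return sorted(task for task, done in progress.items() if done < cutoff)


def effective_finish(original_eta, backup_eta):
    """With a backup attempt running, the task ends when either copy ends."""
    return min(original_eta, backup_eta)


# Hypothetical snapshot of four tasks; task-3 sits on a slow node.
progress = {"task-1": 0.90, "task-2": 0.85, "task-3": 0.20, "task-4": 0.95}
stragglers = find_stragglers(progress)
print(stragglers)  # ['task-3']

# Hypothetical ETAs in seconds: slow original vs. backup on a healthy node.
print(effective_finish(original_eta=400.0, backup_eta=90.0))  # 90.0
```

The point of the sketch is that the scheduler does not need to know why a node is slow. It only compares progress across tasks and lets the faster attempt win.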
Hadoop Clusters at CERN 
• CASTOR Cluster 
with ~10 servers 
– ~100GB of logs per day 
– >100TB of logs in total 
• ATLAS Cluster 
with ~20 servers 
– Event index catalogue 
for experimental data 
in the Grid 
• Monitoring Cluster 
with ~10 servers 
– Log events from 
CERN Computer Cluster 
Metadata from Physics Events (1)
• Metadata is created when a physics event is recorded
• Example 1: Event Information 
– Run number, Event number 
– Timestamp 
– Luminosity block number 
– Trigger that selected the event, etc. 
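The event information listed above could be modelled as a small record type with an index keyed by run and event number. The following Python sketch is only illustrative: the class name, field names, trigger labels, and sample values are hypothetical and do not reflect the actual ATLAS event index schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EventRecord:
    """One metadata record per recorded physics event (hypothetical layout)."""
    run_number: int
    event_number: int
    timestamp: datetime
    lumi_block: int
    trigger: str


def build_index(records):
    """Index records by (run number, event number) for direct lookup."""
    return {(r.run_number, r.event_number): r for r in records}


# Made-up sample records, for illustration only.
t0 = datetime(2012, 5, 23, 9, 0, 0, tzinfo=timezone.utc)
records = [
    EventRecord(1001, 1, t0, 12, "trigger_a"),
    EventRecord(1001, 2, t0, 12, "trigger_b"),
    EventRecord(1002, 1, t0, 1, "trigger_a"),
]

index = build_index(records)
hit = index[(1001, 2)]
print(hit.trigger)  # trigger_b
```

Keying on the pair (run number, event number) assumes that this pair identifies an event uniquely, which is an assumption of the sketch rather than a statement from the slide.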
Metadata from Physics Events (2)
• Metadata is created when a physics event is recorded
• Example 2: Tape Storage Event Log
– On which tape is my file stored?
– Is there a copy on a disk?
– List all events for a given tape or drive
– Was the tape repacked?
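Three of the questions above can be answered by grouping and filtering log entries, which is the pattern a MapReduce job would apply at scale. The Python sketch below runs in memory on a made-up log: the record fields (`seq`, `tape`, `drive`, `file`, `action`) and all values are hypothetical, not the real CASTOR log format. The disk-copy question is left out because it would need a second data source.

```python
from collections import defaultdict

# Hypothetical tape log entries; field names and values are invented.
LOG = [
    {"seq": 1, "tape": "T001", "drive": "D01", "file": "/f/a.raw", "action": "write"},
    {"seq": 2, "tape": "T001", "drive": "D01", "file": "/f/b.raw", "action": "write"},
    {"seq": 3, "tape": "T002", "drive": "D02", "file": "/f/c.raw", "action": "write"},
    {"seq": 4, "tape": "T001", "drive": "D03", "file": None, "action": "repack"},
]


def group_by(entries, key):
    """Map each entry to (entry[key], entry), then collect entries per key."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry[key]].append(entry)
    return dict(groups)


def tape_of(entries, path):
    """Tape holding the most recent write of `path`, or None if never written."""
    writes = [e for e in entries if e["action"] == "write" and e["file"] == path]
    return max(writes, key=lambda e: e["seq"])["tape"] if writes else None


def was_repacked(entries, tape):
    """True if the log contains a repack action for `tape`."""
    return any(e["tape"] == tape and e["action"] == "repack" for e in entries)


print(tape_of(LOG, "/f/c.raw"))                           # T002
print([e["seq"] for e in group_by(LOG, "tape")["T001"]])  # [1, 2, 4]
print(was_repacked(LOG, "T001"))                          # True
```

The same `group_by` call with `"drive"` as the key lists all events for a given drive.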
Questions? 

Editor's notes

  1. What I’m going to tell you today is how we use Big Data Analytics to improve the operation of the Large Hadron Collider. My name is Alexander Loth and I have been working at CERN for the last 3 years. Big Data Analytics has a huge impact on how we plan CERN’s overall technology strategy as well as specific strategies for High-Energy Physics analysis.
  2. Just a few words about CERN. CERN is the European Organization for Nuclear Research. It was founded some years after the Second World War for peaceful research on fundamental physics…
  3. So we do fundamental research. Why do we have mass? 50 years ago, Peter Higgs proposed the Higgs mechanism, which seems to be the answer. In the first moments of the universe, the same amount of matter and antimatter was present. We are clearly matter, so what happened to the antimatter? What were the properties of the universe right after the Big Bang? In the photo you can see a part of the LHC. The LHC is the biggest and most complex machine ever built.
  4. The LHC is part of the CERN accelerator complex. The particles start at the booster and are accelerated until they reach the LHC ring. If you have read Dan Brown’s book Angels & Demons, you should have a look at the AD (Antiproton Decelerator).
  5. In order to process the huge amount of data gathered by the LHC experiments, we need to apply Big Data Analytics. We want to profit from our data investment and extract the knowledge. This has to be done in a proactive, predictive and intelligent way. Big Data Analytics will save massive costs if we can reduce and predict faults and corrective interventions, and increase the availability and operations efficiency.
  6. If you ask yourself how Big Data Analytics differs from Business Intelligence… This brings us to the specific challenges of Big Data Analytics shown on the next slide.
  7. VOLUME / VARIETY / VELOCITY / VERACITY. Data is exploding because it is coming from so many sources, continuously: systems, sensors, and more. But the amount of data isn’t the only issue. There are 4 key issues to overcome if you want to tame big data: volume, variety, velocity and veracity. You have to be able to deal with lots and lots of all kinds of data, moving really quickly. Today, most of this data is passing you by. You blink and it’s gone. Next year: over 88 PB in total stored on 55,000 tapes, plus 14 PB stored on disks.
  8. At CERN we are problem-driven people. This slide shows the technologies currently applied for Big Data Analytics at CERN. As you can see, we always choose the technology that fits best. So in plenty of cases we still rely on Oracle and even store tons of raw data on tape. However, more and more use cases for Hadoop are popping up, for instance for monitoring and metadata.
  9. … furthermore, the Hadoop Distributed File System (HDFS), which is also used massively by Facebook, is self-healing, high-bandwidth clustered storage. It is reliable, redundant and optimized for huge amounts of data.
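
What "self-healing" and "redundant" mean for a replicated file system can be sketched conceptually in Python. This is not HDFS code: the function name, block ids, and node names are hypothetical. The only detail taken from HDFS itself is that its default replication factor is 3. The sketch finds blocks that have fewer live replicas than the target after a node fails, which is the information a coordinator needs in order to schedule new copies.

```python
def under_replicated(block_locations, live_nodes, target=3):
    """Map block id -> number of replicas missing, counting only live nodes.

    `block_locations` maps a block id to the set of nodes holding a replica.
    """
    missing = {}
    for block, nodes in block_locations.items():
        alive = len(nodes & live_nodes)
        if alive < target:
            missing[block] = target - alive
    return missing


# Hypothetical cluster state: node n4 has just failed.
blocks = {
    "blk_1": {"n1", "n2", "n3"},
    "blk_2": {"n2", "n3", "n4"},
}
live = {"n1", "n2", "n3"}

to_repair = under_replicated(blocks, live)
print(to_repair)  # {'blk_2': 1}
```

Each entry in the result corresponds to copies that would have to be re-created on healthy nodes to restore the target redundancy.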