Learning to discover: machine learning in high-energy physics
1. B. Kégl / AppStat@LAL Learning to discover
LEARNING TO DISCOVER:
MACHINE LEARNING IN HIGH-ENERGY PHYSICS
Linear Accelerator Laboratory and Computer Science Laboratory
CNRS/IN2P3 & University Paris-S{ud,aclay}
BALÁZS KÉGL
CERN, May 13, 2014
2. B. Kégl / AppStat@LAL Learning to discover
OUTLINE
• What is machine learning?
• Three projects to illustrate data science in HEP
• budgeted learning for triggers (LHCb)
• classification for discovery and the HiggsML challenge (ATLAS)
• deep learning for imaging calorimeters (ILC)
• Concluding remarks
• interdisciplinarity: HEP, ML, data science
3. B. Kégl / AppStat@LAL Learning to discover
WHAT IS MACHINE LEARNING?
• “The science of getting computers to act without being
explicitly programmed” - Andrew Ng (Stanford/Coursera)
• part of standard computer science curriculum since the 90s
• inferring knowledge from data
• generalizing to unseen data
• usually no parametric model assumptions
• emphasizing the computational challenges
[Diagram: machine learning at the intersection of statistics, optimization, artificial intelligence, neuroscience, cognitive science, signal processing, information theory, and statistical physics]
4. B. Kégl / AppStat@LAL Learning to discover
MACHINE LEARNING TAXONOMY
• Supervised learning: non-parametric (model-free) input-output functions
• classification (trees, BDT, SVM, NN): what you call MVA
• regression (trees, NN, Gaussian processes)
• Unsupervised learning: non-parametric data representation
• clustering (k-means, spectral clustering, Dirichlet processes)
• dimensionality reduction (PCA, ISOMAP, LLE, auto-associative NN)
• density estimation (kernel density, Gaussian mixtures, the Boltzmann machine)
• Reinforcement learning: learning + dynamic control: learn to behave in an environment to maximize cumulative reward
5. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION
Character recognition
6. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION
Emotion recognition
7. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION
Speech recognition
8. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION
• Input: a usually high-dimensional vector x
• Output: a category (label, class) y
• Usually no parametric model
• the classification function y = g(x) is learned using a training set D = {(x1, y1), . . . , (xn, yn)}
• Well-tested algorithms: neural networks, support vector machines, boosting
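A minimal sketch of this setup in Python (synthetic data as a stand-in for real events; the variable names and the choice of scikit-learn's GradientBoostingClassifier are illustrative, not the method of any analysis discussed here):

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a training set D = {(x_1, y_1), ..., (x_n, y_n)}
rng = np.random.RandomState(0)
n, d = 10000, 10
X = rng.normal(size=(n, d))                         # high-dimensional inputs x
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 1).astype(int)  # labels y

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn the classification function g(x) from the training set
g = GradientBoostingClassifier(n_estimators=100)    # a boosting-type learner
g.fit(X_train, y_train)

# The goal (next slide): low error probability on previously unseen examples
print("test error P(g(x) != y) ~", np.mean(g.predict(X_test) != y_test))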
9. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION
The only goal is a low probability of error, P(g(x) ≠ y), on previously unseen examples (x, y)
10. B. Kégl / AppStat@LAL Learning to discover
BUDGETED CLASSIFICATION
Real time face detection
11. B. Kégl / AppStat@LAL Learning to discover
BUDGETED CLASSIFICATION
Real time web page ranking
12. B. Kégl / AppStat@LAL Learning to discover
BUDGETED CLASSIFICATION
Real time ad placement
13. B. Kégl / AppStat@LAL Learning to discover
BUDGETED CLASSIFICATION
Real time signal/background separation
14. B. Kégl / AppStat@LAL Learning to discover
BUDGETED CLASSIFICATION
The second goal is the fast execution of g(x)
15. B. Kégl / AppStat@LAL Learning to discover
BUDGETED CLASSIFICATION
Trade-off between quality and speed
16. B. Kégl / AppStat@LAL Learning to discover
BUDGETED CLASSIFICATION
• Time constraints
• Memory constraints
• Consumption constraints
• Communication constraints
17. B. Kégl / AppStat@LAL Learning to discover
BUDGETED CLASSIFICATION
[Paper header: "Learning deep decision DAGs in a …" {djalel.benbouzid,busarobi,balazs.kegl}@]
Original motivation: the common design is cascade classification = a trigger with levels (Viola-Jones, CVPR 2001)
Stage 1: easy background
Stage 2: medium background
Stage 3: hard background
Stage 4: signal / very hard background
18. B. Kégl / AppStat@LAL Learning to discover
THE LHCB TRIGGER
• Collaboration with
• Vava Gligorov (CERN)
• Mike Williams (MIT)
• Djalel Benbouzid (LAL)
19. B. Kégl / AppStat@LAL Learning to discover
THE LHCB TRIGGER
• A beautifully complex problem
• varying feature costs
• cost may depend on the value
• events are bags of overlapping candidates
20. B. Kégl / AppStat@LAL Learning to discover
THE LHCB TRIGGER
[Figure: the 16 input features, grouped by cost type (immediate, bag-dependent, value-dependent): D0_VTX_FD, PiS_IP, D0C_1_IP, D0C_2_IP, D0C_1_PT, D0C_2_PT, PiS_PT, D0C_1_IPC, D0C_2_IPC, PiS_IPC, D0C_1_TFC, D0C_2_TFC, PiS_TFC, DstM, D0M, D0Tau]
23. B. Kégl / AppStat@LAL Learning to discover
MDDAG: A SIGNAL/BCKG DECISION GRAPH
Easy background (Benbouzid et al., ICML 2012)
[DAG figure: features ordered by evaluation cost (0 ms, 1.5 ms, 4 ms): D0C_2_IP, D0_VTX_FD, PiS_IP, D0C_1_IP, PiS_PT, D0C_1_PT, D0C_2_PT; infinite-cost features: D0Tau, D0M, DstM, D0C_1_TFC, PiS_TFC2, D0C_1_IPC, PiS_IPC2, D0C_2_TFC, D0C_2_IPC; the horizontal axis runs from background-like to signal-like]
24. B. Kégl / AppStat@LAL Learning to discover
MDDAG: A SIGNAL/BCKG DECISION GRAPH
Hard background (Benbouzid et al., ICML 2012)
[Animation over the DAG of slide 23, one step per slide (24-33): for a hard background event the learned policy walks through the features choosing EVALUATE, SKIP, EVALUATE, EVALUATE, EVALUATE, EVALUATE, SKIP, EVALUATE, SKIP, and finally SKIPs the remaining expensive features, ending on the background-like side]
34. B. Kégl / AppStat@LAL Learning to discover
MDDAG: A SIGNAL/BCKG DECISION GRAPH
Easy signal (Benbouzid et al., ICML 2012)
[Animation over the same DAG, slides 34-36: for an easy signal event the policy EVALUATEs one cheap feature and then QUITs early on the signal-like side; none of the expensive features is ever computed]
37. B. Kégl / AppStat@LAL Learning to discover
MDDAG: A SIGNAL/BCKG DECISION GRAPH
Hard signal (Benbouzid et al., ICML 2012)
[Figure: for a hard signal event the policy EVALUATEs eight of the features and SKIPs five before ending on the signal-like side]
38. B. Kégl / AppStat@LAL Learning to discover
BUDGETED CLASSIFICATION
• Classification with test-time constraints
• An active research area due to IT applications
• To be exploited for trigger design
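To make the budgeted idea concrete, here is a minimal Python sketch of the cascade principle (the stages and thresholds are illustrative toys; this is the generic Viola-Jones-style early exit, not the MDDAG algorithm itself):

import numpy as np

def cascade_classify(x, stages, thresholds):
    """Accumulate increasingly expensive stage scores; reject (return 0) as
    soon as the running score falls below a stage threshold, so easy
    background exits cheaply and only hard candidates pay the full cost."""
    score = 0.0
    for stage, theta in zip(stages, thresholds):
        score += stage(x)      # pay this stage's feature/computation cost
        if score < theta:      # confidently background-like: stop early
            return 0
    return 1                   # survived all stages: signal-like

# Toy stages: cheap linear cuts first, a more "expensive" nonlinear one last
stages = [lambda x: x[0], lambda x: x[1], lambda x: np.sin(x[2]) + x[0] * x[1]]
thresholds = [-0.5, 0.0, 0.5]
print(cascade_classify(np.array([0.1, 0.4, 1.0]), stages, thresholds))  # 1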
39. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
Challenge 1: the most precise possible estimation, in minimal CPU time, of the Higgs boson candidate mass from the event observables, despite the unmeasured particles. Current precision (Markov chain integration in dimension 5): ~20% in 0.1 s per event.
The HiggsML challenge
40. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
• In a nutshell
• A vector x of variables is extracted from each event
• A classifier g(x) is trained to separate signal from background
• The background b is estimated in the selection region
G = {x : g(x) = s}
• Discovery is made when the number of real events n is significantly higher than b (a toy significance computation follows)
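As a toy illustration of this counting test (the numbers are made up and the real analysis is far more involved):

from scipy.stats import norm, poisson

b = 100.0   # background estimated in the selection region G
n = 150     # observed number of real events in G

p_value = poisson.sf(n - 1, b)   # P(N >= n | background-only hypothesis)
z = norm.isf(p_value)            # one-sided Gaussian-equivalent significance
print("p = %.2e, Z = %.1f sigma" % (p_value, z))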
41. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
• Exciting physics
• The Higgs to tau-tau excess is not yet at five sigma (ATLAS-CONF-2013-108)
• Exciting data science (statistics and machine learning)
• What is the theoretical relationship between classification and test sensitivity?
• What is the quantitative criterion to optimize?
• How to formally include systematic uncertainties?
• Can we redesign classical algorithms (boosting, SVM, neural nets) to optimize this criterion?
42. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
We are organizing a data challenge to answer some of these questions
Center for Data Science Paris-Saclay
The HiggsML challenge, May to September 2014: when High Energy Physics meets Machine Learning
Organizing committee: David Rousseau (ATLAS-LAL), Balázs Kégl (AppStat-LAL), Cécile Germain (TAO-LRI), Glen Cowan (ATLAS-RHUL), Isabelle Guyon (ChaLearn), Claire Adam-Bourdarios (ATLAS-LAL)
Advisory committee: Joerg Stelzer (ATLAS-CERN), Marc Schoenauer (INRIA), Thorsten Wengler (ATLAS-CERN), Andreas Hoecker (ATLAS-CERN)
Info to participate and compete: https://www.kaggle.com/c/higgs-boson
43. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
The formal setup
• We simulate data: D = {(x1, y1, w1), . . . , (xn, yn, wn)}
• x_i ∈ R^d is the feature vector
• y_i ∈ {background, signal} is the label
• w_i ∈ R^+ is a non-negative weight (importance sampling)
• let S = {i : y_i = s} and B = {i : y_i = b} be the index sets of signal and background events, respectively
• Maximize the Approximate Median Significance (G. Cowan, K. Cranmer, E. Gross, and O. Vitells, EPJ C 71:1554, 2011):

\mathrm{AMS} = \sqrt{2\left((s+b)\ln\left(1 + \frac{s}{b}\right) - s\right)} \approx \frac{s}{\sqrt{b}}

where \hat{G} = \{i : g(x_i) = s\}, \; s = \sum_{i \in S \cap \hat{G}} w_i, \; b = \sum_{i \in B \cap \hat{G}} w_i
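A direct Python transcription of these definitions (a sketch; the array names and toy numbers are illustrative):

import numpy as np

def ams(s, b):
    """Approximate Median Significance of Cowan et al., ~ s / sqrt(b)."""
    return np.sqrt(2 * ((s + b) * np.log(1 + s / b) - s))

def selection_counts(y, w, y_pred):
    """Sum the importance-sampling weights of the true signal / background
    events that g puts into the selection region G-hat = {i : g(x_i) = s}."""
    selected = y_pred == 1
    s = np.sum(w[selected & (y == 1)])   # i in S intersected with G-hat
    b = np.sum(w[selected & (y == 0)])   # i in B intersected with G-hat
    return s, b

# Toy example
y = np.array([1, 1, 0, 0, 0])
w = np.array([10.0, 10.0, 50.0, 50.0, 50.0])
y_pred = np.array([1, 1, 1, 0, 0])
s, b = selection_counts(y, w, y_pred)
print(s, b, ams(s, b))   # 20.0 50.0 ~2.7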
44. B. Kégl / AppStat@LAL Learning to discover
How to design g to maximize the sensitivity?
• A two-stage approach
1. optimize a discriminant (score) function f : R^d → R using a classical learning algorithm (BDT, NN)
CLASSIFICATION FOR DISCOVERY
[Figure: signal and background score distributions]
47. B. Kégl / AppStat@LAL Learning to discover
How to design g to maximize the sensitivity?
• A two-stage approach
1. optimize a discriminant (score) function f : R^d → R using a classical learning algorithm (BDT, NN)
2. define g(x) = sign(f(x) - θ) and optimize θ for maximizing the AMS (see the threshold-scan sketch below)
CLASSIFICATION FOR DISCOVERY
[Figure: AMS as a function of the threshold θ, peaking at 3.5σ]
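Stage 2 is a one-dimensional scan over θ; a sketch, reusing the ams helper from the previous slide (the toy scores and weights are illustrative):

import numpy as np

def ams(s, b):
    return np.sqrt(2 * ((s + b) * np.log(1 + s / b) - s))

def best_threshold(scores, y, w):
    """Scan theta over the observed scores; g(x) = sign(f(x) - theta)."""
    best_ams, best_theta = -np.inf, None
    for theta in np.unique(scores):
        sel = scores >= theta                # the selection region
        s = np.sum(w[sel & (y == 1)])        # selected true signal weight
        b = np.sum(w[sel & (y == 0)])        # selected true background weight
        if b > 0 and ams(s, b) > best_ams:
            best_ams, best_theta = ams(s, b), theta
    return best_theta, best_ams

rng = np.random.RandomState(0)
y = (rng.rand(2000) < 0.5).astype(int)
scores = rng.normal(loc=y)                   # toy discriminant values f(x)
w = np.where(y == 1, 0.1, 10.0)              # toy importance weights
print(best_threshold(scores, y, w))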
48. B. Kégl / AppStat@LAL Learning to discover
Comparing with Atlas analysis
• ATLAS does a manual pre-selection (category); the first maximum of the AMS is completely eliminated. Why?
CLASSIFICATION FOR DISCOVERY
s = 250, b = 5000 ± 500! Systematics!
49. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
How to handle systematic (model) uncertainties?
• OK, so let's design an objective function that can take background systematics into consideration
• Likelihood with unknown background b ∼ N(μ_b, σ_b):

L(\mu_s, \mu_b) = P(n, b \mid \mu_s, \mu_b, \sigma_b) = \frac{(\mu_s + \mu_b)^n}{n!} e^{-(\mu_s + \mu_b)} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_b} e^{-(b - \mu_b)^2 / (2\sigma_b^2)}

• Profile likelihood ratio: \lambda(0) = L(0, \hat{\hat{\mu}}_b) / L(\hat{\mu}_s, \hat{\mu}_b)
• The new Approximate Median Significance (by Glen Cowan):

\mathrm{AMS} = \sqrt{2\left((s+b)\ln\frac{s+b}{b_0} - s - b + b_0\right) + \frac{(b - b_0)^2}{\sigma_b^2}}

where b_0 = \frac{1}{2}\left(b - \sigma_b^2 + \sqrt{(b - \sigma_b^2)^2 + 4(s+b)\sigma_b^2}\right)
50. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
How to handle systematic (model) uncertainties?
• The new Approximate Median Significance (same formula and b_0 as on the previous slide)
[Plot: AMS as a function of the selection threshold, comparing the new AMS, the old AMS, and the ATLAS working point]
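This can be transcribed directly (a sketch following the formula above; plugging in the numbers from the earlier comparison slide, s = 250, b = 5000, σ_b = 500, shows how the systematics drown the excess):

import numpy as np

def ams_syst(s, b, sigma_b):
    """New AMS, accounting for a systematic uncertainty sigma_b on b."""
    b0 = 0.5 * (b - sigma_b**2
                + np.sqrt((b - sigma_b**2)**2 + 4 * (s + b) * sigma_b**2))
    return np.sqrt(2 * ((s + b) * np.log((s + b) / b0) - s - b + b0)
                   + (b - b0)**2 / sigma_b**2)

# s = 250, b = 5000 +- 500, as on the ATLAS comparison slide:
print(ams_syst(250.0, 5000.0, 500.0))   # ~0.5: the excess drowns in systematics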
51. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
[HiggsML challenge poster, as on slide 42]
A tool for getting the ML community excited about your problem
OPEN since yesterday
52. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
[HiggsML challenge poster, as on slide 42]
• Organizing committee
• David Rousseau (ATLAS / LAL)
• Balázs Kégl (AppStat / LAL)
• Cécile Germain (LRI / UPSud)
• Glen Cowan (ATLAS / Royal Holloway)
• Claire Adam Bourdarios (ATLAS / LAL)
• Isabelle Guyon (ChaLearn)
53. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
[HiggsML challenge poster, as on slide 42]
• Official ATLAS GEANT4 simulations
• 30 features (variables)
• 250K training events: input, label, weight
• 100K public test events (AMS displayed in real time), input only
• 450K private test events (to determine the winner after the closing of the challenge), input only
• the public and private test sets are shuffled; participants submit a vector of 550K labels
• Using the “old” AMS
• we cannot compare participants if the metric varies a lot (see the bootstrap sketch below)
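A sketch of why the raw AMS is a shaky comparison metric: bootstrapping it on a synthetic held-out set exposes its variance (all numbers, weights, and names here are made up for illustration):

import numpy as np

def ams(s, b):
    return np.sqrt(2 * ((s + b) * np.log(1 + s / b) - s))

rng = np.random.RandomState(0)
n = 100000
y = (rng.rand(n) < 0.2).astype(int)             # 20% simulated signal
scores = rng.normal(loc=y, scale=1.5)           # an imperfect classifier score
w = np.where(y == 1, 700.0 / n, 400000.0 / n)   # toy luminosity weights

theta = np.percentile(scores, 85)               # a fixed selection cut
vals = []
for _ in range(100):                            # bootstrap replicates
    idx = rng.randint(0, n, n)
    sel = scores[idx] >= theta
    s = np.sum(w[idx][sel & (y[idx] == 1)])
    b = np.sum(w[idx][sel & (y[idx] == 0)])
    vals.append(ams(s, b))
print("AMS = %.2f +- %.2f" % (np.mean(vals), np.std(vals)))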
54. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
[HiggsML challenge poster, as on slide 42]
• 16K$ prize pool: 7-4-2K$ for the three top participants
• HEP meets ML award for the most useful model, decided by the ATLAS members of the organizing committee
55. B. Kégl / AppStat@LAL Learning to discover
LEADERBOARD AS OF THIS MORNING
56. B. Kégl / AppStat@LAL Learning to discover
LEARNING FROM SCRATCH:
THE DEEP LEARNING REVOLUTION
Why don't we train a neural network on the raw ~10^5-10^8 dimensional signal of ATLAS?
57. B. Kégl / AppStat@LAL Learning to discover
LEARNING FROM SCRATCH:
THE DEEP LEARNING REVOLUTION
• Because it is notoriously difficult to automatically build a model = learn particle physics just by looking at the event browser
• Again, you are not alone: it is also notoriously difficult to automatically build a model of natural scenes = learn a model of our surroundings just by looking at images
• In the last 5-10 years we have been getting close
• May be interesting if you do not know what you are looking for
58. B. Kégl / AppStat@LAL Learning to discover
WHAT IS THIS?
[Plot: photoelectron count (PE) vs time t (ns), 0 to 800 ns]
59. B. Kégl / AppStat@LAL Learning to discover
WHAT IS THIS?
[Event display, with decoded captions: granularity and hadronic cascades; hadronic showers in the Si-W ECAL; complex and impressive: inelastic reaction in the Si-W ECAL (interaction, initial pion, outgoing fragments, scattered pion, ejected nucleon); simple but nice: short truncated showers]
60. B. Kégl / AppStat@LAL Learning to discover
WHAT IS THIS?
61. B. Kégl / AppStat@LAL Learning to discover
WHAT IS THIS?
62. B. Kégl / AppStat@LAL Learning to discover
[Plot: PE vs t (ns), as on slide 58]
WHAT IS THIS?
63. B. Kégl / AppStat@LAL Learning to discover
WHAT IS THIS?
[Event display as on slide 59]
64. B. Kégl / AppStat@LAL Learning to discover
MODELS
• Inference
• if you want to be able to answer questions about observed phenomena, you need a model
• if you want quantitative answers, you need a formal model
• Formal setup
• x: observation vector, Θ: parameter vector to infer
• likelihood: p(x | Θ)
• simulator: given Θ, generate a random x
65. B. Kégl / AppStat@LAL Learning to discover
A FORMAL MODEL
[Composite figure (Auger tank-signal model):
The observatory: energy, direction, mass
X0, HEP; XMax, NMuMax, NMuTotal, LEP; LDF, asymmetry, S1000, S38, NMu1000; S, t0, risetime, jumps
Cherenkov light
PEs given ideal response: expected number of PEs in bin i, n̄i (unitless); number of PEs in bin i, ni (unitless)
Ideal muon response: signal decay time τ (ns); signal risetime td (ns); muon arrival time tμ (ns); muon tracklength Lμ (m); muon energy factor εμ (unitless); average number of PEs per 1 m of tracklength (1/m)
Plots: single-muon response p(x | t); full trace p(x | t1, . . . , t4); p(t | Θ)]
66. B. Kégl / AppStat@LAL Learning to discover
INFERENCE BY SAMPLING
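A minimal sketch of inference by sampling when only a simulator is available, here rejection ABC on a toy Poisson-count trace (the model, the prior, and the tolerance are all illustrative assumptions, not the method used in the talk):

import numpy as np

rng = np.random.RandomState(0)

def simulate(theta, n_bins=20):
    """Toy simulator: given the parameter theta, generate a random binned
    PE-count trace with an exponentially decaying expected signal."""
    expected = theta * np.exp(-np.arange(n_bins) / 5.0)
    return rng.poisson(expected)

x_obs = simulate(12.0)   # pretend this trace is the observed data

# Rejection ABC: draw theta from the prior, keep it when the simulated
# trace lands close enough to the observation
posterior = []
for _ in range(20000):
    theta = rng.uniform(1.0, 30.0)               # prior on theta
    if np.abs(simulate(theta) - x_obs).sum() < 40:
        posterior.append(theta)
print("theta ~ %.1f +- %.1f from %d accepted samples"
      % (np.mean(posterior), np.std(posterior), len(posterior)))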
67. B. Kégl / AppStat@LAL Learning to discover
HOW TO BUILD MODELS FOR THESE?
[Plot: PE vs t (ns), as on slide 58]
[Event display as on slide 59, with the caption: high granularity permits a detailed view into the hadronic shower (initial pion π)]
68. B. Kégl / AppStat@LAL Learning to discover
LEARNING FROM SCRATCH:
THE DEEP LEARNING REVOLUTION
• Training multi-layer neural networks
• biological inspiration: we know the brain is multi-layer
• appealing from a modeling point of view: abstraction increases with depth
• notoriously difficult to train until Hinton et al. (stacked RBMs) and Bengio et al. (stacked autoencoders), around 2006
• the key principle is (was?) unsupervised pre-training
• they remain computationally very expensive, but they learn high-level (abstract) features and they scale: with more data they learn more
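A minimal numpy sketch of the building block, a single autoencoder layer trained by unsupervised reconstruction (purely illustrative; real stacked pre-training adds layer-wise stacking and supervised fine-tuning, and the data here is random noise standing in for pixels or calorimeter cells):

import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(1000, 64)        # raw inputs, e.g. pixels or calorimeter cells
n_hidden, lr = 16, 0.1

W1 = rng.normal(scale=0.1, size=(64, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 64)); b2 = np.zeros(64)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):      # unsupervised: no labels anywhere
    H = sigmoid(X @ W1 + b1)  # encoder: the learned features
    X_hat = H @ W2 + b2       # linear decoder: the reconstruction
    err = X_hat - X           # gradient of the squared reconstruction error
    dW2 = H.T @ err / len(X); db2 = err.mean(axis=0)
    dH = err @ W2.T * H * (1 - H)
    dW1 = X.T @ dH / len(X); db1 = dH.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

# H is the learned representation; stacking such layers and fine-tuning
# with labels is the pre-training recipe referred to above.
print("reconstruction MSE:",
      np.mean((sigmoid(X @ W1 + b1) @ W2 + b2 - X) ** 2))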
69. B. Kégl / AppStat@LAL Learning to discover
• Google passes the “purring test” (ICML’12): 16K cores watching 10M YouTube stills for 3 days
• completely unsupervised: the cat simply emerged as a useful concept to represent
LEARNING FROM SCRATCH: THE DEEP LEARNING REVOLUTION
70. B. Kégl / AppStat@LAL Learning to discover
• Can we also learn physics by observing natural phenomena?
LEARNING FROM SCRATCH: THE DEEP LEARNING REVOLUTION
[Event display as on slide 59]
71. B. Kégl / AppStat@LAL Learning to discover
DEEP LEARNING FOR IMAGING
CALORIMETERS
• Collaboration with
• Roman Poeschl (ILC / LAL)
• Naomi van der Kolk (ILC / LAL)
• Sviatoslav Bilokin (ILC / LAL)
• Mehdi Cherti (AppStat / LAL)
• Trong Hieu Tran (ILC / LLR)
• Vincent Boudry (ILC / LLR)
72. B. Kégl / AppStat@LAL Learning to discover
FOOD FOR THOUGHT 1:
CERN ACCELERATING SCIENCE
Data Science: the design of automated methods to analyze massive and complex data in order to extract useful information from them
73. B. Kégl / AppStat@LAL Learning to discover
[Diagram:
Data science: statistics, machine learning, signal processing, data visualization, databases
Tool building: software engineering, clouds/grids, high-performance computing, optimization
Domain science: life, brain, earth, universe, human society]
FOOD FOR THOUGHT 1: CERN ACCELERATING SCIENCE
74. B. Kégl / AppStat@LAL Learning to discover
You have been doing data science for a long time. Consider transferring knowledge to other sciences on how to organize large scientific projects around data.
FOOD FOR THOUGHT 1: CERN ACCELERATING SCIENCE
75. B. Kégl / AppStat@LAL Learning to discover
• What is a data challenge:
• outreach to the public?
• peer-to-peer communication?
FOOD FOR THOUGHT 2: CERN ACCELERATING SCIENCE
76. B. Kégl / AppStat@LAL Learning to discover
• What is a data challenge:
• between the two: communicating (translating!) your technical problems to another scientific community which might have solutions (and the know-how to design solutions) for you
• think about building formal channels (as you did for outreach and “classical” publications)
FOOD FOR THOUGHT 2: CERN ACCELERATING SCIENCE
77. B. Kégl / AppStat@LAL Learning to discover
THANK YOU!
78. B. Kégl / AppStat@LAL Learning to discover
BACKUP
79. B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
How to design g to maximize the sensitivity?
• A two-stage approach
1. optimize a discriminant function f : R^d → R for balanced AUC or balanced classification error: N′_s = N′_b = 0.5 (see the sketch below)
[Plot: AdaBoost learning curves; balanced error vs number of iterations T, 1 to 10^4; trees, N = 2, T = 100000]
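A sketch of the balanced weighted error used here (class weights renormalized so that each class contributes 0.5; the toy arrays are illustrative):

import numpy as np

def balanced_error(y, y_pred, w):
    """Weighted error with the class weights renormalized so that the total
    signal weight and the total background weight are both 0.5."""
    w = w.astype(float).copy()
    w[y == 1] *= 0.5 / w[y == 1].sum()   # N'_s = 0.5
    w[y == 0] *= 0.5 / w[y == 0].sum()   # N'_b = 0.5
    return np.sum(w[y != y_pred])

y = np.array([1, 1, 0, 0, 0])
w = np.ones(5)
print(balanced_error(y, np.array([1, 0, 0, 0, 1]), w))   # 0.25 + 1/6 ~ 0.42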
80. B. Kégl / AppStat@LAL Learning to discover
Comparing with the official ATLAS analysis
• ATLAS does a manual pre-selection; the first maximum of the AMS is completely eliminated. Why? Have we found something, or do they have an implicit reason?
• No we haven't; yes they have
• μ_b has a ~10% relative systematic uncertainty: the 300 signal events are completely submerged by the σ_b ≈ 600 background systematics
[Plot: unnormalized ROC, signal weight s (TP) vs background weight b (FP), with AMS = 1σ, 2σ, 3σ, 4σ, 5σ contours]
81. B. Kégl / AppStat@LAL Learning to discover
The Higgs boson ML challenge
• Dilemma: the physically relevant AMS is optimized in a tiny region
• the AMS has a high variance: a bad measure for comparing the participants
• in the ATLAS analysis, there was a 1σ difference between the expected and measured significances
82. B. Kégl / AppStat@LAL Learning to discover
The Higgs boson ML challenge
• We are even nervous about the original AMS (red), so we regularize it:

\mathrm{AMS} = \sqrt{2\left((s + b + b_{\mathrm{reg}})\ln\left(1 + \frac{s}{b + b_{\mathrm{reg}}}\right) - s\right)}

with b_reg = 10
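Transcribed directly (a sketch; b_reg = 10 as in the challenge, toy s and b values for illustration):

import numpy as np

def ams_regularized(s, b, b_reg=10.0):
    """AMS with b_reg added to the background, damping the variance of the
    measure in selection regions with very little background."""
    return np.sqrt(2 * ((s + b + b_reg) * np.log(1 + s / (b + b_reg)) - s))

# Far tamer than the unregularized s / sqrt(b) when b is small:
print(ams_regularized(10.0, 2.0))   # ~2.6 instead of ~4.8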