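The LHC experiments are at the forefront of today's particle physics, and the 2012 discovery of the Higgs boson was a major event for the physics community. Following the Higgs challenge that the ATLAS experiment opened on Kaggle, another LHC experiment, LHCb, has also put its data on the Kaggle platform. This talk briefly introduces the experiment behind the data, then uses the LHCb data for multi-dimensional data analysis with SciKit-Learn and visualization with MatPlotLib.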
Play around the LHCb Data on Kaggle with SK-Learn and MatPlotLib
1. A Taste of LHCb Data Analysis
Play around the LHCb Data on Kaggle with SciKit-Learn and MatPlotLib
Yuan CHAO ( 趙元 )
(National Taiwan University, Taipei, Taiwan)
PyCon2017
2017/06/09-11
22. A Taste of Flavour Physics
Search for charged lepton flavour violation
https://www.kaggle.com/c/flavours-of-physics
Search for new physics on lepton-flavour violation
$15,000 & 673 teams
28. The Four Big Questions
LHC was built for the following purposes:
The origin of mass
To find the origin of mass... the Higgs boson.
Dark matter and dark energy
Looking for the unification... Super-symmetry as well as other candidates of Dark Matter & Dark Energy
The disappearance of antimatter
Investigate the mystery of anti-matter disappearance
The state of the early universe
Physics at the early stage of the universe: Heavy Ion Collisions and Quark-Gluon Plasma
Courtesy of the European Organization for Nuclear Research (CERN), Geneva, Switzerland.
29. Symmetry & Flavor Physics
People think the universe is symmetric? E = mc²
Parity violation (parity non-conservation) was introduced by T.D. Lee ( 李政道 ) and C.N. Yang ( 楊振寧 ) in 1956.
Parity violation was seen in β decay by C.S. Wu ( 吳健雄 ) in 1957. Nobel Prize for Lee & Yang.
CP violation was discovered in the Kaon system in 1964.
M. Kobayashi and T. Maskawa introduced CP violation (charge-parity non-conservation) into the Standard Model in 1973.
Sanda and Carter pointed out the possibility of CP violation in the B meson system in 1980.
30. Prof. Wu's experiment in 1956. Prof. Lee and Yang received the Nobel Prize in 1957.
http://de.wikipedia.org/wiki/Wu-Experiment
31. Symmetry & Flavor Physics
The KTeV experiment at FNAL established direct CP violation in the Kaon system, confirmed by NA48 at CERN, in 1999.
Belle and BaBar observed indirect CP violation in the B meson system in 2002.
Belle observed direct CP violation in B → ππ in 2004, but it was not confirmed by BaBar.
Belle and BaBar presented evidence of direct CP violation in B → Kππ in 2004.
M. Kobayashi ( 小林誠 ) and T. Maskawa ( 益川敏英 ) shared the Nobel Prize in 2008 with Y. Nambu ( 南部陽一郎 ).
CP violation can't fully explain the Baryon asymmetry problem.
→ People continue searching for new physics (NP)
32. Machine Learning is nothing new in HEP
People in Tevatron, B-factories, LEP and LHC experiments more or less use multivariate analysis (MVA) in their studies!
(LL, LD → BDT, NN, .. → DL?)
35. The Kaggle Challenge
τ→3μ breaks lepton flavour conservation
Basic Data operations
Input variables
Signal vs. Background
Correlations
K-S test, CvM test
ROC and AUC
Machine Learning Algorithms
Event weight
Training and testing
AUC score calculation
Summary
https://www.kaggle.com/c/flavours-of-physics/data
Sample                  Events
training.csv            mixed MC & data, τ→3μ
test.csv                mixed MC & data, τ→3μ
check_agreement.csv     mixed MC & data, Ds→φ(μμ)π
check_correlation.csv   real background data
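The outline above (input variables, signal vs. background, correlations, event weight, training and testing) can be sketched on a toy table. This is only an illustration: the data are synthetic, and the column names `feature_a`, `feature_b` and `signal` are hypothetical stand-ins, not the actual columns of training.csv. The classifier choice is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for training.csv (hypothetical column names).
rng = np.random.default_rng(0)
n_events = 2000
signal = rng.integers(0, 2, size=n_events)
toy = pd.DataFrame({
    # feature_a is shifted for signal events, so it carries separation power
    "feature_a": rng.normal(loc=signal * 1.0, scale=1.0),
    # feature_b is pure noise
    "feature_b": rng.normal(loc=0.0, scale=1.0, size=n_events),
    "signal": signal,
})
features = ["feature_a", "feature_b"]

# Signal vs. background: compare the per-class mean of each input variable.
class_means = toy.groupby("signal")[features].mean()

# Correlations between the input variables.
correlations = toy[features].corr()

# Training and testing: hold out 30% of the events.
X_train, X_test, y_train, y_test = train_test_split(
    toy[features], toy["signal"], test_size=0.3, random_state=0
)

# Event weight: uniform weights here; real per-event weights would go in
# the same sample_weight argument.
train_weights = np.ones(len(X_train))

clf = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
clf.fit(X_train, y_train, sample_weight=train_weights)

# Probability of the signal class for each held-out event.
test_scores = clf.predict_proba(X_test)[:, 1]
print(class_means)
print(correlations)
```

With the real files, the toy table would be replaced by `pd.read_csv(...)` on the downloaded CSV and the feature list by the columns provided there.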
36. The Goal
Look for the rare events of τ→3μ
The classifier should not be too sensitive to differences between MC and data
The classifier should not depend too much on the τ mass
The score is the weighted area under the ROC curve (AUC)
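As a rough illustration of the ROC curve and its area, the sketch below computes a plain AUC with scikit-learn on made-up labels and scores. It does not reproduce the challenge's own weighting scheme. The `sample_weight` line only shows how per-event weights can enter the same call, and the weights are hypothetical.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Made-up truth labels (1 = signal, 0 = background) and classifier scores.
labels = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

# ROC curve: false-positive rate vs. true-positive rate over all thresholds.
fpr, tpr, thresholds = roc_curve(labels, scores)

# Area under the curve by the trapezoidal rule.
area = auc(fpr, tpr)

# The same number from the one-call helper.
area_direct = roc_auc_score(labels, scores)

# Hypothetical per-event weights, passed through sample_weight.
event_weights = np.array([1.0, 2.0, 1.0, 1.0, 0.5, 1.0])
area_weighted = roc_auc_score(labels, scores, sample_weight=event_weights)

print(area, area_direct, area_weighted)
```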
37. The K-S Test
The τ→3μ process has not yet been observed
Signal is made with MC simulation
Background is from real data
The classifier should not pick up the difference between simulation and real data
A control channel, Ds→φ(μμ)π, is used to check the similarity
The Kolmogorov-Smirnov (KS) test is used to evaluate the difference, requiring KS < 0.09
KS = max |F_MC − F_data|, where F are the cumulative distribution functions for MC and real data
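A minimal sketch of the KS distance, assuming it is taken between the classifier outputs on simulated and real control-channel events. The two samples below are random numbers, not challenge data, and event weights are ignored. The helper `ks_distance` is a name introduced here for illustration. SciPy's `ks_2samp` is used as a cross-check.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_distance(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two unweighted samples."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    # The gap can only change at observed values, so evaluate both CDFs there.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Toy classifier outputs for MC and real data in a control channel.
rng = np.random.default_rng(1)
mc_scores = rng.beta(2.0, 2.0, size=1000)
data_scores = rng.beta(2.0, 2.2, size=1000)

ks_value = ks_distance(mc_scores, data_scores)
ks_scipy = ks_2samp(mc_scores, data_scores).statistic

print("KS distance:", ks_value)
print("passes KS < 0.09:", ks_value < 0.09)
```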
38. The CvM Test
The provided background events are not τ-free
The classifier should not depend too much on the τ mass
The distribution of the τ mass could be used to extract the number of signal events
The Cramer-von Mises (CvM) test is used to test the correlation, requiring a CvM value < 0.002
F are the cumulative distribution functions of the predictions for all data and for the data in the corresponding mass interval.
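One way to picture the CvM check: compare the CDF of the predictions in each mass interval with the CDF of the predictions for all events, and average the squared difference. The sketch below is an assumption-laden illustration, not the challenge's official metric. The function name `cvm_flatness`, the equal-population mass bins and the toy data are all hypothetical.

```python
import numpy as np


def cvm_flatness(predictions, mass, n_bins=10):
    """Mean squared gap between per-mass-bin and global prediction CDFs."""
    predictions = np.asarray(predictions, dtype=float)
    mass = np.asarray(mass, dtype=float)

    # Global CDF evaluated at every observed prediction value.
    ordered = np.sort(predictions)
    global_cdf = np.searchsorted(ordered, ordered, side="right") / len(ordered)

    # Equal-population mass intervals (an assumption of this sketch).
    edges = np.quantile(mass, np.linspace(0.0, 1.0, n_bins + 1))
    bin_index = np.clip(np.searchsorted(edges, mass, side="right") - 1, 0, n_bins - 1)

    values = []
    for b in range(n_bins):
        local = np.sort(predictions[bin_index == b])
        if len(local) == 0:
            continue
        # CDF of this mass interval, evaluated on the same grid.
        local_cdf = np.searchsorted(local, ordered, side="right") / len(local)
        # Averaging over all events approximates integration over the global CDF.
        values.append(np.mean((local_cdf - global_cdf) ** 2))
    return float(np.mean(values))


rng = np.random.default_rng(2)
toy_mass = rng.uniform(0.0, 1.0, size=5000)

# Predictions independent of the mass: expected to give a small value.
flat_value = cvm_flatness(rng.uniform(0.0, 1.0, size=5000), toy_mass)

# Predictions that simply copy the mass: expected to give a large value.
correlated_value = cvm_flatness(toy_mass, toy_mass)

print(flat_value, correlated_value)
```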
39. “Rules… Some of them can be bent, others are to
be broken” – Morpheus
https://artistotleonline.wordpress.com/category/climax/
40. Live DEMO with Jupyter NB
Forked from Kaggle challenge package
https://github.com/yandexdataschool/flavours-of-physics-start
Now following my derived Jupyter notebook
https://github.com/yuanchao/flavours-of-physics-start/blob/master/my_baseline.ipynb
41. Related URLs
LHC computing grid (LCG) and CERN overview video:
https://cds.cern.ch/record/2020780
"Higgs ML" Kaggle Challenge
https://www.kaggle.com/c/higgs-boson
"Flavours of Physics" Kaggle Challenge
https://www.kaggle.com/c/flavours-of-physics
The Scale of the Universe http://htwins.net/scale2/
Heavy Flavour Data Mining workshop
https://indico.cern.ch/event/433556/
Official Jupyter NB:
https://github.com/yandexdataschool/flavours-of-physics-start
My derived Jupyter NB:
https://github.com/yuanchao/flavours-of-physics-start
43. Installing Jupyter & SciPy
Set up a virtual environment
(you need Python installed beforehand)
Using pip:
$ pip3 install virtualenv
You can also use easy_install or apt-get instead
Open a terminal
Type in the following commands:
$ virtualenv -p python3 .scienv
$ source .scienv/bin/activate ← activate the environment!
$ pip3 install --upgrade pip
$ pip3 install jupyter
$ pip3 install scipy pandas scikit-learn matplotlib ← you get all packages
Then start the jupyter notebook server:
$ jupyter notebook ← a web page will be loaded automatically
Here we go!
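After the installation, a quick check from Python can confirm that the packages are visible in the active environment. This is a small sketch: the module list and the helper name `check_environment` are choices made here, not part of the original setup.

```python
import importlib.util
import sys

# Import names (not pip names): scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["scipy", "pandas", "sklearn", "matplotlib"]


def check_environment(modules=REQUIRED_MODULES):
    """Map each module name to True if it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_environment()

# Inside a virtual environment sys.prefix differs from sys.base_prefix;
# very old virtualenv releases may not set base_prefix, hence the getattr.
in_virtualenv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)

for name, available in status.items():
    print(f"{name:12s} {'ok' if available else 'MISSING'}")
print("inside a virtual environment:", in_virtualenv)
```

Any module reported as MISSING can be installed with pip3 inside the activated environment.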