Root analysis and implications
to analysis model in ATLAS
        Akira Shibata, New York University
                 @ ACAT 08 in Erice
                   Nov 05, 2008




Are we ready to face data from LHC collisions?
Grid computing? Do we have enough CPU? Tape? Disks?
RAM? Do we need T1? T2? T3? AF? Do we need backdoor
access? Are the machines maintained? Is it scary? Are they
online? Do we have enough bandwidth? Can we copy data
across the world? Can we reach the data we need? Can we
reduce the data size? ESD? AOD? D1PD? D2PD? D3PD? Can
we download them? Do we need interactive access? How
do we write an analysis? How fast do they run? Do we need
to buy more disk? How big is my ntuple? Do we need to buy
more CPU? Disks? RAM? Are we up to date? Do I look cool
if I buy a Mac? Is a virtual machine useful? Why do we use
ROOT? What is PROOF? Is Python fast enough? Is it easy
to code? How often will I need to process my data? How
fast will my analysis run? What can I do to get faster? What
are the options? What is the future technology?

ACAT - November 5, 2008                      akira.shibata@nyu.edu
Analysis in the Era of Grid Computing
  [Diagram: data flow through the computing tiers, with rough size estimates]
   T1:       ESD (~500kB/evt); central AOD/DPD making
   T2:       AOD (~100kB/evt) and D1PD; Grid analysis / DPD making
   T3:       D1PD (30-80kB/evt), D2PD (10-50kB/evt), D3PD (1-10kB/evt);
             ROOT / ARA analysis at institute
   Desktop:  user ntuple (~1kB/evt); local ROOT analysis; histograms
   Legend:   ROOT native and ROOT + POOL formats

A tiered computing model. A leveled approach is needed to optimize the
system. Above all, how well does it work from the physicists' point of view?
Derived Physics Data
 •    DPDs are created using the following operations:
     •   Skimming: selecting the events one needs
     •   Thinning: selecting the objects one needs
     •   Slimming: removing information from objects.
 •    ESDs hold the full information from reconstruction. AODs and
      DnPDs are derived from them, with an increasing level of derivation.
 •    Primary purpose of D1PD is to have access to parts of
      the ESD information that are otherwise difficult to get to.
 •    D1/2PD are in POOL format. D3PD refers to any DPD that
      is in ntuple format.
 •    ESD, AOD, D1PD contents are defined by groups. Several
      types of D1PD are defined by performance groups. D2PD
      and D3PD are defined by users.
 •    First level analysis may be done (variable calculation,
      object reconstruction, etc.) when D2/3PD are created.
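
The three DPD operations listed above can be illustrated with a minimal sketch. The toy event layout and the field names (`electrons`, `pt`, `eta`, `cluster_cells`) are assumptions made for illustration only, not the ATLAS event data model.

```python
# Toy events: each event is a dict holding a list of electron objects.
# All field names are illustrative, not ATLAS EDM names.
events = [
    {"run": 1, "electrons": [
        {"pt": 32.0, "eta": 0.4, "isEM": 0, "cluster_cells": [1, 2, 3]},
        {"pt": 8.0, "eta": 1.9, "isEM": 1, "cluster_cells": [4]},
    ]},
    {"run": 1, "electrons": []},
    {"run": 2, "electrons": [
        {"pt": 45.0, "eta": -1.1, "isEM": 0, "cluster_cells": [5, 6]},
    ]},
]


def skim(events, min_electrons=1):
    """Skimming: keep only the events one needs."""
    return [ev for ev in events if len(ev["electrons"]) >= min_electrons]


def thin(events, min_pt=20.0):
    """Thinning: keep only the objects one needs inside each event."""
    return [dict(ev, electrons=[el for el in ev["electrons"] if el["pt"] > min_pt])
            for ev in events]


def slim(events, keep=("pt", "eta")):
    """Slimming: remove information from the objects that are kept."""
    return [dict(ev, electrons=[{k: el[k] for k in keep} for el in ev["electrons"]])
            for ev in events]


dpd = slim(thin(skim(events)))
print(dpd)
```

Each step reduces a different dimension of the data: the number of events, the number of objects per event, and the amount of information per object.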

Motivation for Profiling ROOT Analysis
•       The primary use of the Grid is event reconstruction,
        storage and production of reduced data. This is done using
        ATLAS software, Athena. Some analysis happens here too.
    •     However, post-Grid (non-Athena) ROOT analysis is the
          main stage for physics analysis.
•       Mostly a user-level decision due to the private nature of
        physics analysis but:
    •     the situation is becoming more complex due to
          availability of new technology;
    •     no good summary exists comparing the available
          options;
    •     it is an important ingredient for an efficient analysis
          model;
    •     it is needed for estimating resource requirements.
•       Technical discussions do not always answer practical
        questions. This study benchmarks analysis “modes” in
        realistic settings based on wall-time measurements.
“Flat” vs POOL Persistency
•   Much of the complexity in the current situation is due to
    the POOL technology (an additional layer on top of the ROOT
    persistency technology) used in ATLAS. POOL supports:
    •  Metadata lookup - used by TAG to access events in
       large file without having to read the full contents.
    •  More flexibility in writing out complex objects. Has its
       own way of T/P separation and schema evolution.
•   When the decision was made, ROOT persistency was not
    as good as it is now.
    •  Problems writing out STL objects.
    •  Problems referring to objects in different trees/files.
•   ROOT persistency has improved and now has fewer
    issues.
•   ARA (AthenaROOTAccess) enables reading POOL objects from
    ROOT by calling POOL converters on demand (P->T conversion).
    This takes extra read time.
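
The idea of calling a persistent-to-transient (P->T) converter on demand can be sketched as follows. This is a toy illustration only: the classes `ElectronP`, `ElectronT` and `LazyContainer` are hypothetical and do not correspond to POOL or ARA code.

```python
import math


class ElectronP:
    """Persistent form: the compact representation written to file."""
    def __init__(self, pt, eta, phi):
        self.pt, self.eta, self.phi = pt, eta, phi


class ElectronT:
    """Transient form: the representation analysis code works with."""
    def __init__(self, px, py, pz):
        self.px, self.py, self.pz = px, py, pz


def p_to_t(p):
    """Converter from the persistent to the transient representation."""
    return ElectronT(p.pt * math.cos(p.phi),
                     p.pt * math.sin(p.phi),
                     p.pt * math.sinh(p.eta))


class LazyContainer:
    """Converts an element only when it is first accessed, then caches it."""
    def __init__(self, persistent):
        self._persistent = persistent
        self._cache = {}
        self.conversions = 0  # counts how often the converter actually ran

    def __len__(self):
        return len(self._persistent)

    def __getitem__(self, i):
        if i not in self._cache:
            self._cache[i] = p_to_t(self._persistent[i])
            self.conversions += 1
        return self._cache[i]


container = LazyContainer([ElectronP(40.0, 0.0, 0.0), ElectronP(25.0, 1.0, 1.5)])
first = container[0]
first_again = container[0]
print(first.px, container.conversions)
```

Objects that are never accessed are never converted, but every object that is accessed pays the conversion cost once, which is the extra read time mentioned above.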
Summary of Existing Analysis Modes
  Mode                   Draw         CINT         ACLiC      PyRoot       g++          Athena
  Compiled/Interpreted   Interpreted  Interpreted  Compiled   Interpreted  Compiled     Both
  Language               C++, Python  (C++)--      C++        Python       C++          C++, Python
  Additional packages    -            MakeClass, MakeSelector  SPyroot      SFrame, AMA  -

  [The Ntuple, POOL, Interactive, Standard dev env and Athena components rows
   carried tick marks on the slide; these are not recoverable here.]

  The most common options were implemented. All code is available in
        ATLAS CVS: users/ashibata/RootBenchmark
Benchmark Analysis Contents
  •    A simple Zee reconstruction analysis implemented for
       every mode:
      1. Access electron container (POOL) / electron
          kinematics branches (Ntuple)
      2. Select electrons using isEM and pt and charge
      3. Fill histograms with electron kinematics (pT and
          multiplicity)
      4. Combine electrons to reconstruct Z
      5. Fill histogram with Z mass
      6. Write histograms out in finalize
      •   Repeated the above 10 times
  •    Not complex enough for a real analysis but not entirely
       trivial.
  •    For Draw, plot electrons after the isEM/pt/charge selection.
       No four-vector arithmetic.
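
The benchmark steps above can be sketched in plain Python without ROOT. This is not the benchmark code itself: the event layout, the 20 GeV pt cut, the 60-120 GeV mass window, and the convention that `isEM == 0` means the electron passed identification are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

Z_WINDOW = (60.0, 120.0)  # GeV, illustrative histogram range


def four_vector(pt, eta, phi, mass=0.000511):
    """Return (E, px, py, pz) in GeV from pt, eta, phi."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(a, b):
    """Invariant mass of the sum of two four-vectors."""
    e, px, py, pz = (x + y for x, y in zip(a, b))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def select(electrons, min_pt=20.0):
    """Step 2: keep electrons passing the isEM flag and the pt cut."""
    return [el for el in electrons if el["isEM"] == 0 and el["pt"] > min_pt]


def analyse(events, bin_width=5.0):
    mass_hist = Counter()          # step 5: Z mass (bin low edge -> count)
    multiplicity_hist = Counter()  # step 3: electron multiplicity
    for electrons in events:
        good = select(electrons)
        multiplicity_hist[len(good)] += 1
        # Step 4: combine opposite-charge electron pairs.
        for e1, e2 in combinations(good, 2):
            if e1["charge"] * e2["charge"] < 0:
                m = invariant_mass(four_vector(e1["pt"], e1["eta"], e1["phi"]),
                                   four_vector(e2["pt"], e2["eta"], e2["phi"]))
                if Z_WINDOW[0] <= m < Z_WINDOW[1]:
                    mass_hist[bin_width * math.floor(m / bin_width)] += 1
    return mass_hist, multiplicity_hist


# One toy event: two back-to-back electrons and one soft electron.
events = [[
    {"pt": 45.6, "eta": 0.0, "phi": 0.0, "charge": -1, "isEM": 0},
    {"pt": 45.6, "eta": 0.0, "phi": math.pi, "charge": +1, "isEM": 0},
    {"pt": 5.0, "eta": 1.0, "phi": 1.0, "charge": -1, "isEM": 0},
]]
mass_hist, multiplicity_hist = analyse(events)
print(dict(mass_hist), dict(multiplicity_hist))
```

In the ntuple modes the inputs would be read from electron kinematics branches, and in the POOL modes from the electron container; the selection and combination logic stays the same.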



Obtaining Reliable Results
  •   Using POSIX measurements as much as
      possible. Wall time is taken from the time module.
      • Avoiding the somewhat unstable measurements
        from TStopwatch.
  •   Measurements are affected by other activity on
      the machine. Overcome by repeating the
      measurements.
      • Machine: Acas (BNL) node under normal load,
        3.34 GB memory, 2-core Xeon @ 2.00 GHz, data
        on NFS.
  •   Disk cache leads to misleading results: CPU
      time = wall time once the data is in memory.
      • Force disk reads by flushing RAM. Do not re-read
        a file until all other files have been read.
        Alternate between AOD and ntuple analyses.
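
A minimal sketch of repeated wall-time measurement with Python's `time` module is shown here. The `measure` helper and the `toy_job` workload are hypothetical stand-ins, and `time.perf_counter` is used as the wall clock, which is an assumption about what one would choose today rather than what the study used.

```python
import statistics
import time


def measure(job, repeats=5):
    """Run job() several times. Return the mean and standard deviation of
    the wall time, plus the mean CPU time of this process."""
    wall, cpu = [], []
    for _ in range(repeats):
        w0, c0 = time.perf_counter(), time.process_time()
        job()
        wall.append(time.perf_counter() - w0)
        cpu.append(time.process_time() - c0)
    return statistics.mean(wall), statistics.stdev(wall), statistics.mean(cpu)


def toy_job():
    # Stand-in for an analysis pass: some CPU work plus a wait that mimics I/O.
    sum(i * i for i in range(20000))
    time.sleep(0.01)


mean_wall, spread, mean_cpu = measure(toy_job)
print(f"wall {mean_wall:.4f}s +- {spread:.4f}s, cpu {mean_cpu:.4f}s")
```

The gap between wall time and CPU time is the time spent waiting, for example on disk. When the data is already cached in memory that gap closes, which is why cached reads give misleading results.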

Methodology
  1. Measured time taken to process with increasing number of events.
  2. Repeat measurements and take average for each point.
  3. Fit a straight line to obtain overhead (offset) and rate (evt/sec).
  4. Calculate errors from standard deviation.
  Only use rate in comparing the modes. Overhead varies between a fraction
  of seconds to tens of seconds.

  [Plot: wall time (s) vs number of events (0 to 50000), AOD input, with fitted lines]
    gpp      (init:6.64e+01s, rate:5.35e+02Hz)
    SFrame   (init:3.62e+01s, rate:3.15e+02Hz)
    Draw     (init:4.62e+01s, rate:1.25e+02Hz)
    PyAthena (init:2.74e+01s, rate:9.65e+01Hz)
    Athena   (init:3.08e+01s, rate:6.86e+01Hz)
    CINT     (init:5.25e+01s, rate:1.85e+01Hz)
    PyRoot   (init:2.50e+00s, rate:1.24e+01Hz)
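
The straight-line fit in step 3 can be sketched with an unweighted least-squares fit. Wall time is modelled as overhead plus number of events divided by rate, so the rate is the inverse of the fitted slope. The data points below are synthetic, generated from made-up values of 30 s overhead and 100 Hz, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Unweighted least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic points generated from init = 30 s and rate = 100 Hz (made-up values).
n_events = [10000, 20000, 30000, 40000, 50000]
wall_time = [30.0 + n / 100.0 for n in n_events]

overhead, slope = fit_line(n_events, wall_time)
rate = 1.0 / slope  # events per second
print(f"init: {overhead:.2e}s, rate: {rate:.2e}Hz")
```

With real measurements each point would be the average of several runs, and the spread of those runs would give the error on the fitted rate.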
Data and Format
    Contents                                    POOL                          Ntuple
    Full contents                               AOD, 144.22 kB/evt            CBNT? (not tried)
    DPD contents (Trigger/Jets/Leptons etc)     TopD1PD, 31.42 kB/evt         TopD3PD, 4.87 kB/evt
    Small DPD contents (Tracks + Electrons)     SmallD2PD, 18.74 kB/evt       SmallD3PD, 0.71 kB/evt
    Very small DPD (Electrons)                  VerySmallD2PD, 1.06 kB/evt    VerySmallD3PD, 0.37 kB/evt

    All derived from FDR2 AODs. All produced on PANDA
    (except AOD and D1PD). Around 10,000 events per file.
    Total sample size for one data type ranges from 1 GB to
    100 GB. A use-case driven comparison: input file sizes are
    different.

AOD Analysis Results
 AOD input: mode (rate, error)
   gpp (535Hz, 3%)
   SFrame (321Hz, 13%)
   Draw (138Hz, 35%)
   Athena (98Hz, 8%)
   PyAthena (95Hz, 11%)
   CINT (21Hz, 15%)
   TSelector (19Hz, 2%)
   PyRoot (17Hz, 18%)
 •   Compiled non-framework analysis is the fastest.
 •   Only small difference between C++/Python in Athena.
 •   CINT by far the slowest.
 •   Seems to be reading all containers in the files.
D1PD Level Comparison
 mode (rate, error)
   Mode              Top D1PD input      Top D3PD input      Ntuple/POOL
   ACLiC_Opt         -                   58719Hz, 16%        -
   ACLiC             -                   48494Hz, 20%        -
   gpp               1130Hz, 15%         45869Hz, 21%        40.6
   TSelector_ACLiC   -                   18551Hz, 18%        -
   SFrame            721Hz, 17%          9453Hz, 19%         13.1
   Draw              298Hz, 55%          2343Hz, 15%         7.9
   Athena            313Hz, 6%           838Hz, 1%           2.7
   PyRoot            43Hz, 9%            300Hz, 21%          7.1
   PyAthena          204Hz, 4%           242Hz, 30%          1.2
   TSelector         22Hz, 2%            39Hz, 3%            1.8
   CINT              26Hz, 6%            32Hz, 2%            1.2
 An order of magnitude advantage for using ntuple for g++ analysis. Much less
 difference with non-compiled modes.
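
The Ntuple/POOL figures are the ratio of the two rates for the same mode. A short sketch recomputing them from the Top D1PD and Top D3PD rates quoted above:

```python
# Rates in Hz, copied from the Top D1PD (POOL) and Top D3PD (ntuple) results.
pool_rate = {"gpp": 1130, "SFrame": 721, "Athena": 313, "Draw": 298,
             "PyAthena": 204, "PyRoot": 43, "CINT": 26, "TSelector": 22}
ntuple_rate = {"gpp": 45869, "SFrame": 9453, "Athena": 838, "Draw": 2343,
               "PyAthena": 242, "PyRoot": 300, "CINT": 32, "TSelector": 39}

ratios = {mode: ntuple_rate[mode] / pool_rate[mode] for mode in pool_rate}
for mode, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{mode:10s} Ntuple/POOL = {ratio:.1f}")
```

Because the quoted rates are rounded to whole Hz, a recomputed ratio can differ from the quoted one in the last digit, as happens for PyRoot.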
D2PD Level Comparison
 mode (rate, error)
   Mode              Small D2PD input    Small D3PD input    Ntuple/POOL
   gpp               2132Hz, 6%          71003Hz, 7%         33.3
   ACLiC_Opt         -                   58223Hz, 18%        -
   TSelector_ACLiC   -                   33579Hz, 23%        -
   SFrame            1679Hz, 29%         14597Hz, 26%        8.7
   Draw              300Hz, 29%          6358Hz, 17%         21.2
   Athena            596Hz, 5%           855Hz, 3%           1.4
   PyRoot            100Hz, 10%          382Hz, 22%          3.8
   PyAthena          326Hz, 4%           367Hz, 28%          1.1
   TSelector         23Hz, 1%            40Hz, 2%            1.7
   CINT              29Hz, 4%            32Hz, 1%            1.1
 POOL analysis faster than AOD input by x4. Larger difference between Athena
 and PyAthena with smaller input files. Why?
Very Small Input Comparison
 mode (rate, error)
   Mode              Very Small D2PD input    Very Small D3PD input    Ntuple/POOL
   ACLiC_Opt         -                        63555Hz, 9%              -
   gpp               2798Hz, 5%               48516Hz, 17%             17.3
   TSelector_ACLiC   -                        34201Hz, 22%             -
   SFrame            2519Hz, 12%              13751Hz, 28%             5.5
   Draw              294Hz, 47%               6777Hz, 16%              23.0
   Athena            667Hz, 8%                854Hz, 5%                1.3
   PyAthena          307Hz, 14%               343Hz, 28%               1.1
   PyRoot            416Hz, 19%               331Hz, 25%               0.8
   TSelector         -                        40Hz, 1%                 -
   CINT              31Hz, 0%                 32Hz, 1%                 1.0
 D2PD nearing D3PD even more. A few thousand Hz possible with ARA. Ntuple
 mode still factor of 5-10 faster in C++ modes.
I/O Dependency Comparison
 [Plots: Event Size * Exec Rate (kB/s, log scale) vs Event Size (kB), for each mode.
  Left: POOL Analysis, event size 0 to 160 kB. Right: Ntuple Analysis, event size 0 to 5 kB.]
 Clear I/O constraint > 20 kB in POOL analysis coming from file size, NOT
 read-out size. Ntuples are usually smaller than 20kB.
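
The quantity plotted here, event size times execution rate, is an effective file throughput in kB/s. A short sketch computing it for the gpp mode on POOL inputs, using the event sizes from the Data and Format slide and the gpp rates from the results slides:

```python
# (event size in kB/evt, gpp rate in Hz) for each POOL input type,
# copied from the tables earlier in this talk.
gpp_pool = {
    "AOD": (144.22, 535),
    "TopD1PD": (31.42, 1130),
    "SmallD2PD": (18.74, 2132),
    "VerySmallD2PD": (1.06, 2798),
}

throughput = {name: size * rate for name, (size, rate) in gpp_pool.items()}
for name, value in throughput.items():
    print(f"{name:14s} {value:10.0f} kB/s")
```

If the event rate were limited purely by how fast the file can be delivered, this product would stay roughly constant while the event size changes. For very small events the product falls well below that level, which suggests the job is then limited by something other than I/O.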
Summary
•   Very clear performance advantage for the ROOT native ntuple
    format: an order of magnitude difference. Ballpark figure:
    thousands of events/sec vs hundreds of Hz. These numbers
    should be taken as upper limits; real analyses would be
    more complex.
•   Compiled mode is ~two orders of magnitude faster than
    non-compiled options.
•   Use of frameworks, even quite a simple one, can slow
    things down, though any realistic analysis would require
    some infrastructure. Choose/write frameworks wisely!
•   With Athena, the overhead of the framework seems large,
    though typical DPD jobs can be highly CPU intensive.
•   File caching by the system ties the execution rate to the
    input file size (regardless of the actual read-out). Above
    20 kB/evt, the analysis is bound by this effect. This is a
    very tight slimming/thinning requirement for D1/2PD. It may
    be possible to improve this with high performance disks.
Acknowledgement


    I have bothered a lot of people with this project
    including (random order):
    Scott Snyder, Wim Lavrijsen, Sebastien Binet,
    Emil Obreshkov, David Quarrie, Kyle Cranmer,
    David Adams, Sven Menke, Shuwei Ye, Sergey
    Panitkin, Stephanie Majewski, Hong Ma, Tadashi
    Maeno, Attila Krasznahorkay, Jim Cochran,
    roottalk, Paolo Calafiura
                                        Many thanks.


ACAT - Novebmer 5, 2008                    akira.shibata@nyu.edu
                                                                   18

Más contenido relacionado

Destacado

20141127 py datatokyomeetup2
20141127 py datatokyomeetup220141127 py datatokyomeetup2
20141127 py datatokyomeetup2Akira Shibata
 
The LHC Explained by CNN
The LHC Explained by CNNThe LHC Explained by CNN
The LHC Explained by CNNAkira Shibata
 
Top Cross Section Measurement
Top Cross Section MeasurementTop Cross Section Measurement
Top Cross Section MeasurementAkira Shibata
 
Analysis Software Development
Analysis Software DevelopmentAnalysis Software Development
Analysis Software DevelopmentAkira Shibata
 
Top quark physics at the LHC
Top quark physics at the LHCTop quark physics at the LHC
Top quark physics at the LHCAkira Shibata
 
LHCにおける素粒子ビッグデータの解析とROOTライブラリ(Big Data Analysis at LHC and ROOT)
LHCにおける素粒子ビッグデータの解析とROOTライブラリ(Big Data Analysis at LHC and ROOT)LHCにおける素粒子ビッグデータの解析とROOTライブラリ(Big Data Analysis at LHC and ROOT)
LHCにおける素粒子ビッグデータの解析とROOTライブラリ(Big Data Analysis at LHC and ROOT)Akira Shibata
 
PyData.Tokyo Hackathon#2 TensorFlow
PyData.Tokyo Hackathon#2 TensorFlowPyData.Tokyo Hackathon#2 TensorFlow
PyData.Tokyo Hackathon#2 TensorFlowAkira Shibata
 
20150421 日経ビッグデータカンファレンス
20150421 日経ビッグデータカンファレンス20150421 日経ビッグデータカンファレンス
20150421 日経ビッグデータカンファレンスAkira Shibata
 
人工知能をビジネスに活かす
人工知能をビジネスに活かす人工知能をビジネスに活かす
人工知能をビジネスに活かすAkira Shibata
 
PyData Tokyo Tutorial & Hackathon #1
PyData Tokyo Tutorial & Hackathon #1PyData Tokyo Tutorial & Hackathon #1
PyData Tokyo Tutorial & Hackathon #1Akira Shibata
 
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-Recruit Technologies
 
リクルートはいかにして、ディープラーニング(深層学習)の導入を成功させたか
リクルートはいかにして、ディープラーニング(深層学習)の導入を成功させたかリクルートはいかにして、ディープラーニング(深層学習)の導入を成功させたか
リクルートはいかにして、ディープラーニング(深層学習)の導入を成功させたかRecruit Technologies
 
DataRobot活用状況@リクルートテクノロジーズ
DataRobot活用状況@リクルートテクノロジーズDataRobot活用状況@リクルートテクノロジーズ
DataRobot活用状況@リクルートテクノロジーズRecruit Technologies
 
PyData NYC by Akira Shibata
PyData NYC by Akira ShibataPyData NYC by Akira Shibata
PyData NYC by Akira ShibataAkira Shibata
 
Akira shibata at developer summit 2016
Akira shibata at developer summit 2016Akira shibata at developer summit 2016
Akira shibata at developer summit 2016Akira Shibata
 

Analysis Software Benchmark

  • 1. ROOT analysis and implications to analysis model in ATLAS. Akira Shibata, New York University, @ ACAT 08 in Erice, Nov 05, 2008
  • 2. Are we ready to face data from LHC collisions? Grid computing? Do we have enough CPU? Tape? Disks? RAM? Do we need T1? T2? T3? AF? Do we need backdoor access? Are the machines maintained? Is it scary? Are they online? Do we have enough bandwidth? Can we copy data across the world? Can we reach the data we need? Can we reduce the data size? ESD? AOD? D1PD? D2PD? D3PD? Can we download them? Do we need interactive access? How do we write an analysis? How fast do they run? Do we need to buy more disk? How big is my ntuple? Do we need to buy more CPU? Disks? RAM? Are we up to date? Do I look cool if I buy a Mac? Is a virtual machine useful? Why do we use ROOT? What is PROOF? Is Python fast enough? Is it easy to code? How often will I need to process my data? How fast will my analysis run? What can I do to get faster? What are the options? What is the future technology? ACAT - November 5, 2008 akira.shibata@nyu.edu
  • 3. Analysis in the Era of Grid Computing
      [Diagram: data flow through the computing tiers with rough size estimates; formats are marked as either ROOT native or ROOT + POOL]
      • T1: ESD (~500 kB/evt); central AOD/DPD making.
      • T2: AOD (~100 kB/evt) and D1PD; Grid analysis / DPD making.
      • T3: D1PD (30-80 kB/evt), D2PD (10-50 kB/evt), D3PD (1-10 kB/evt); ROOT / ARA analysis at the institute.
      • User desktop: ntuples (~1 kB/evt) and histograms; local ROOT analysis.
      A tiered computing model. A leveled approach is needed to optimize the system. Above all, how well does it work from the physicists' point of view?
  • 4. Derived Physics Data
      • DPDs are created using the following operations:
          • Skimming: selecting the events one needs.
          • Thinning: selecting the objects one needs.
          • Slimming: removing information from objects.
      • ESDs hold the full information from reconstruction. AODs and DnPDs are derived with increasing levels of derivation.
      • The primary purpose of D1PD is to give access to parts of the ESD information that are otherwise difficult to get to.
      • D1/2PD are in POOL format. D3PD refers to any DPD that is in ntuple format.
      • ESD, AOD and D1PD contents are defined by groups. Several types of D1PD are defined by performance groups. D2PD and D3PD are defined by users.
      • First-level analysis (variable calculation, object reconstruction etc.) may be done when D2/3PD are created.
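The three DPD-making operations can be illustrated with a toy sketch. This is not ATLAS code: events are plain Python dicts, and the attribute names (`electrons`, `cells`), the 20 GeV threshold and the two-electron requirement are all hypothetical choices made for the example.

```python
def thin(events, min_pt=20.0):
    """Thinning: keep only the objects one needs inside each event."""
    return [
        dict(ev, electrons=[el for el in ev["electrons"] if el["pt"] > min_pt])
        for ev in events
    ]


def skim(events, min_electrons=2):
    """Skimming: keep only the events one needs."""
    return [ev for ev in events if len(ev["electrons"]) >= min_electrons]


def slim(events, keep=("pt", "eta", "phi")):
    """Slimming: remove unneeded information from each object."""
    return [
        dict(ev, electrons=[{k: el[k] for k in keep} for el in ev["electrons"]])
        for ev in events
    ]


# Hypothetical input: three events with 2, 2 and 0 electron candidates.
events = [
    {"event": 1, "electrons": [
        {"pt": 45.0, "eta": 0.3, "phi": 1.2, "cells": [1, 2, 3]},
        {"pt": 38.0, "eta": -1.1, "phi": -2.0, "cells": [4, 5]},
    ]},
    {"event": 2, "electrons": [
        {"pt": 50.0, "eta": 0.1, "phi": 0.5, "cells": [6]},
        {"pt": 8.0, "eta": 2.0, "phi": 3.0, "cells": [7]},
    ]},
    {"event": 3, "electrons": []},
]

# Thin first, then skim on what is left, then slim the survivors.
dpd = slim(skim(thin(events)))
```

The functions build new lists and dicts rather than modifying the input, so the original events stay intact, much as a derived file leaves its parent untouched.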
  • 5. Motivation for Profiling ROOT Analysis
      • The primary use of the Grid is event reconstruction, storage and production of reduced data. This is done using the ATLAS software, Athena. Some analysis happens here too.
      • However, post-Grid (non-Athena) ROOT analysis is the main stage for physics analysis.
      • How to run it is mostly a user-level decision due to the private nature of physics analysis, but:
          • the situation is becoming more complex due to the availability of new technology;
          • no good summary exists comparing the available options;
          • it is an important ingredient for an efficient analysis model;
          • it is needed for estimating resource requirements.
      • Technical discussions do not always answer practical questions. This study benchmarks analysis "modes" in realistic settings based on wall-time measurements.
  • 6. “Flat” vs POOL Persistency
      • Much of the complexity in the current situation is due to the POOL technology (an additional layer on top of the ROOT persistency technology) used in ATLAS. POOL supports:
          • Metadata lookup: used by TAG to access events in a large file without having to read the full contents.
          • More flexibility in writing out complex objects. It has its own way of handling transient/persistent (T/P) separation and schema evolution.
      • When the decision was made, ROOT persistency was not as good as it is now:
          • Problems writing out STL objects.
          • Problems referring to objects in different trees/files.
      • ROOT persistency has improved and now has fewer issues.
      • ARA enables reading POOL objects from ROOT by calling POOL converters on demand (P->T conversion). This takes extra read time.
  • 7. Summary of Existing Analysis Modes
      Modes: Draw, CINT, ACLiC, PyRoot, g++, Athena. Input formats: Ntuple, POOL.
      • Compiled/Interpreted: Draw: interpreted; CINT: interpreted; ACLiC: compiled; PyRoot: interpreted; g++: compiled; Athena: both.
      • Language: Draw: (C++)--; CINT: interpreted C++; ACLiC: C++; PyRoot: Python; g++: C++; Athena: C++, Python, interactive Python.
      • Additional packages: MakeClass, MakeSelector, SFrame, AMA, SPyroot.
      • Standard dev env: Athena components.
      Implemented the most common options. All code is available in ATLAS CVS: users/ashibata/RootBenchmark
  • 8. Benchmark Analysis Contents
      • A simple Zee reconstruction analysis implemented for every mode:
          1. Access the electron container (POOL) / electron kinematics branches (Ntuple)
          2. Select electrons using isEM, pt and charge
          3. Fill histograms with electron kinematics (pT and multiplicity)
          4. Combine electrons to reconstruct the Z
          5. Fill a histogram with the Z mass
          6. Write the histograms out in finalize
      • The above was repeated 10 times.
      • Not complex enough for a real analysis, but not entirely trivial.
      • For Draw, plot electrons after the isEM/pt/charge selection. No four-vector arithmetic.
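The benchmark steps can be sketched in plain Python. This is only an illustration, not the code from the benchmark: it assumes that `isEM == 0` marks an electron passing identification, uses a hypothetical 20 GeV pT threshold, treats the charge selection as an opposite-charge requirement on the pair, uses lists in place of histograms, and computes the pair mass in the massless-electron approximation.

```python
import math
from itertools import combinations

PT_MIN = 20.0  # GeV; hypothetical threshold chosen for this sketch


def passes_selection(el):
    # Assumption: isEM == 0 means the candidate passed all identification cuts.
    return el["isEM"] == 0 and el["pt"] > PT_MIN


def invariant_mass(e1, e2):
    # Massless approximation: m^2 = 2 pT1 pT2 (cosh(d_eta) - cos(d_phi)).
    d_eta = e1["eta"] - e2["eta"]
    d_phi = e1["phi"] - e2["phi"]
    m2 = 2.0 * e1["pt"] * e2["pt"] * (math.cosh(d_eta) - math.cos(d_phi))
    return math.sqrt(max(m2, 0.0))


def analyse(events):
    """Run steps 1-5 over a list of events (each a list of electron dicts)."""
    hist = {"el_pt": [], "el_n": [], "z_mass": []}
    for electrons in events:
        selected = [el for el in electrons if passes_selection(el)]
        hist["el_n"].append(len(selected))
        hist["el_pt"].extend(el["pt"] for el in selected)
        for e1, e2 in combinations(selected, 2):
            if e1["charge"] * e2["charge"] < 0:  # opposite-charge pairs only
                hist["z_mass"].append(invariant_mass(e1, e2))
    return hist


# One hypothetical event: two back-to-back electrons plus one failing isEM.
events = [[
    {"pt": 45.6, "eta": 0.0, "phi": 0.0, "charge": +1, "isEM": 0},
    {"pt": 45.6, "eta": 0.0, "phi": math.pi, "charge": -1, "isEM": 0},
    {"pt": 30.0, "eta": 1.0, "phi": 1.0, "charge": -1, "isEM": 5},
]]
hist = analyse(events)
```

For two back-to-back electrons at equal pT and eta, the formula reduces to m = 2 pT, which makes the sketch easy to check by hand.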
  • 9. Obtaining Reliable Results
      • Use POSIX measurements as much as possible: wall time from the time module.
      • Avoid the somewhat unstable measurement with TStopwatch.
      • Measurements are affected by other activities on the machine; this is overcome by taking multiple measurements.
      • Machine: Acas (BNL) node with normal load, 3.34 GB memory, 2 Xeon cores @ 2.00 GHz, data on NFS.
      • Disk cache leads to misleading results: CPU time = wall time once the data is in memory.
      • Force a disk read by flushing RAM. Do not re-read a file until all other files have been read. Alternate between AOD and ntuple analyses.
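A minimal sketch of repeated wall-time measurement with the Python `time` module, as mentioned above. The function name `measure_wall_time` and the default of five repeats are assumptions for the example, and the sketch does not flush the disk cache between runs; that would have to be handled outside it.

```python
import statistics
import time


def measure_wall_time(job, repeats=5):
    """Run `job` several times; return (mean, standard deviation) in seconds.

    `repeats` must be at least 2 for the standard deviation to be defined.
    """
    samples = []
    for _ in range(repeats):
        start = time.time()  # wall-clock time, not CPU time
        job()
        samples.append(time.time() - start)
    return statistics.mean(samples), statistics.stdev(samples)


# Usage with a stand-in workload; a real benchmark would run the analysis job.
mean_s, std_s = measure_wall_time(lambda: sum(range(100000)), repeats=3)
```

The spread returned alongside the mean is what allows an error to be quoted on each measured point.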
  • 10. Methodology
      1. Measure the time taken to process an increasing number of events.
      2. Repeat the measurements and take the average for each point.
      3. Fit a straight line to obtain the overhead (offset) and the rate (evt/sec).
      4. Calculate errors from the standard deviation.
      Only the rate is used in comparing the modes. The overhead varies between a fraction of a second and tens of seconds.
      [Plot: wall time (s) vs number of events for AOD input, 0 to 50000 events, with a fitted line per mode:]
      • gpp: init 6.64e+01 s, rate 5.35e+02 Hz
      • SFrame: init 3.62e+01 s, rate 3.15e+02 Hz
      • Draw: init 4.62e+01 s, rate 1.25e+02 Hz
      • PyAthena: init 2.74e+01 s, rate 9.65e+01 Hz
      • Athena: init 3.08e+01 s, rate 6.86e+01 Hz
      • CINT: init 5.25e+01 s, rate 1.85e+01 Hz
      • PyRoot: init 2.50e+00 s, rate 1.24e+01 Hz
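The straight-line fit in step 3 can be sketched with an ordinary least-squares fit of wall time against number of events: the intercept is the overhead and the inverse of the slope is the rate. The helper name `fit_overhead_and_rate` is hypothetical, and the input points below are synthetic, generated from the gpp legend values (66.4 s, 535 Hz) rather than taken from the measurements.

```python
def fit_overhead_and_rate(n_events, wall_times):
    """Least-squares fit of t = overhead + n / rate.

    Returns (overhead in seconds, rate in events per second).
    """
    n = len(n_events)
    mean_x = sum(n_events) / n
    mean_y = sum(wall_times) / n
    sxx = sum((x - mean_x) ** 2 for x in n_events)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(n_events, wall_times))
    slope = sxy / sxx                    # seconds per event
    overhead = mean_y - slope * mean_x   # intercept at zero events
    return overhead, 1.0 / slope


# Synthetic, noise-free points built from the quoted gpp fit values.
n_events = [10000, 20000, 30000, 40000, 50000]
wall_times = [66.4 + n / 535.0 for n in n_events]
overhead, rate = fit_overhead_and_rate(n_events, wall_times)
```

Separating the two parameters is what lets the modes be compared on rate alone, independent of how long each takes to initialise.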
  • 11. Data and Format
      Contents                                    POOL                            Ntuple
      Full contents                               AOD, 144.22 kB/evt              CBNT?, not tried
      DPD contents (Trigger/Jets/Leptons etc)     TopD1PD, 31.42 kB/evt           TopD3PD, 4.87 kB/evt
      Small DPD contents (Tracks + Electrons)     SmallD2PD, 18.74 kB/evt         SmallD3PD, 0.71 kB/evt
      Very small DPD (Electrons)                  VerySmallD2PD, 1.06 kB/evt      VerySmallD3PD, 0.37 kB/evt
      All derived from FDR2 AODs. All produced on PANDA (except AOD and D1PD). Around 10,000 events per file. Total sample size for one data type ranges between 1 GB and 100 GB. A use-case driven comparison: input file sizes are different.
  • 12. AOD Analysis Results
      [Bar chart: execution rate (Hz) per mode for AOD input, given as mode (rate, error)]
      • gpp (535 Hz, 3%)
      • SFrame (321 Hz, 13%)
      • Draw (138 Hz, 35%)
      • Athena (98 Hz, 8%)
      • PyAthena (95 Hz, 11%)
      • CINT (21 Hz, 15%)
      • TSelector (19 Hz, 2%)
      • PyRoot (17 Hz, 18%)
      Compiled non-framework analysis is the fastest. Only a small difference between C++ and Python in Athena. CINT is by far the slowest. Seems to be reading all containers in the files.
  • 13. D1PD Level Comparison
      [Bar charts: execution rate per mode, given as rate and error, for Top D1PD input and Top D3PD input]
      Mode               Top D1PD (POOL)     Top D3PD (ntuple)    Ntuple/POOL
      gpp                1130 Hz, 15%        45869 Hz, 21%        40.6
      SFrame             721 Hz, 17%         9453 Hz, 19%         13.1
      Athena             313 Hz, 6%          838 Hz, 1%           2.7
      Draw               298 Hz, 55%         2343 Hz, 15%         7.9
      PyAthena           204 Hz, 4%          242 Hz, 30%          1.2
      PyRoot             43 Hz, 9%           300 Hz, 21%          7.1
      CINT               26 Hz, 6%           32 Hz, 2%            1.2
      TSelector          22 Hz, 2%           39 Hz, 3%            1.8
      ACLiC_Opt          -                   58719 Hz, 16%        -
      ACLiC              -                   48494 Hz, 20%        -
      TSelector_ACLiC    -                   18551 Hz, 18%        -
      An order of magnitude advantage for using ntuples in g++ analysis. Much less difference with non-compiled modes.
  • 14. D2PD Level Comparison
      [Bar charts: execution rate per mode, given as rate and error, for Small D2PD input and Small D3PD input]
      Mode               Small D2PD (POOL)   Small D3PD (ntuple)  Ntuple/POOL
      gpp                2132 Hz, 6%         71003 Hz, 7%         33.3
      SFrame             1679 Hz, 29%        14597 Hz, 26%        8.7
      Athena             596 Hz, 5%          855 Hz, 3%           1.4
      PyAthena           326 Hz, 4%          367 Hz, 28%          1.1
      Draw               300 Hz, 29%         6358 Hz, 17%         21.2
      PyRoot             100 Hz, 10%         382 Hz, 22%          3.8
      CINT               29 Hz, 4%           32 Hz, 1%            1.1
      TSelector          23 Hz, 1%           40 Hz, 2%            1.7
      ACLiC_Opt          -                   58223 Hz, 18%        -
      TSelector_ACLiC    -                   33579 Hz, 23%        -
      POOL analysis is faster than with AOD input by a factor of 4. Larger difference between Athena and PyAthena with smaller input files. Why?
  • 15. Very Small Input Comparison
      [Bar charts: execution rate per mode, given as rate and error, for Very Small D2PD input and Very Small D3PD input]
      Mode               Very Small D2PD (POOL)   Very Small D3PD (ntuple)   Ntuple/POOL
      gpp                2798 Hz, 5%              48516 Hz, 17%              17.3
      SFrame             2519 Hz, 12%             13751 Hz, 28%              5.5
      Athena             667 Hz, 8%               854 Hz, 5%                 1.3
      PyRoot             416 Hz, 19%              331 Hz, 25%                0.8
      PyAthena           307 Hz, 14%              343 Hz, 28%                1.1
      Draw               294 Hz, 47%              6777 Hz, 16%               23.0
      CINT               31 Hz, 0%                32 Hz, 1%                  1.0
      TSelector          -                        40 Hz, 1%                  -
      ACLiC_Opt          -                        63555 Hz, 9%               -
      TSelector_ACLiC    -                        34201 Hz, 22%              -
      D2PD is nearing D3PD even more. A few thousand Hz is possible with ARA. Ntuple mode is still a factor of 5-10 faster in C++ modes.
  • 16. I/O Dependency Comparison
      [Plots: event size * execution rate (kB/s, log scale) vs event size (kB) for each mode; one panel for POOL analysis (0 to 160 kB) and one for ntuple analysis (0 to 5 kB)]
      A clear I/O constraint above 20 kB in POOL analysis, coming from file size, NOT read-out size. Ntuples are usually smaller than 20 kB.
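The quantity on the vertical axis is the product of event size and execution rate. As a rough illustration of the arithmetic, the sketch below pairs the POOL event sizes from slide 11 with the gpp rates from slides 12 to 15; pairing them this way is an assumption about how the plotted points were built, and the names `POOL_GPP` and `throughput_kb_per_s` are hypothetical.

```python
# (event size in kB/evt, gpp execution rate in Hz) for each POOL input type.
POOL_GPP = {
    "AOD": (144.22, 535.0),
    "TopD1PD": (31.42, 1130.0),
    "SmallD2PD": (18.74, 2132.0),
    "VerySmallD2PD": (1.06, 2798.0),
}


def throughput_kb_per_s(size_kb, rate_hz):
    """Effective data throughput: event size times execution rate."""
    return size_kb * rate_hz


throughput = {
    name: throughput_kb_per_s(size, rate)
    for name, (size, rate) in POOL_GPP.items()
}

for name, value in throughput.items():
    print(f"{name:>14}: {value:10.1f} kB/s")
```

Plotting this product against event size is one way to see whether a mode is limited by how fast the file can be read rather than by how much of it the analysis uses.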
  • 17. Summary
      • Very clear performance advantage for the ROOT native ntuple format: an order of magnitude difference. Ballpark figure: thousands of events/sec vs hundreds of Hz. These numbers should be taken as upper limits; real analyses would be more complex.
      • Compiled modes are ~two orders of magnitude faster than non-compiled options.
      • Use of frameworks, even quite simple ones, can slow things down, though any realistic analysis would require some infrastructure. Choose/write frameworks wisely!
      • With Athena, the framework overhead seems large, though typical DPD jobs can be highly CPU intensive.
      • File caching by the system ties the execution rate to the input file size (regardless of the actual read-out). Above 20 kB/evt, the analysis is bound by this effect. This is a very tight slimming/thinning requirement for D1/2PD. It may be possible to improve this with high-performance disks.
  • 18. Acknowledgement
      I have bothered a lot of people with this project, including (in random order): Scott Snyder, Wim Lavrijsen, Sebastien Binet, Emil Obrekov, David Quarrie, Kyle Cranmer, David Adams, Sven Menke, Shuwei Ye, Sergey Panitkin, Stephanie Majeski, Hong Ma, Tadashi Maeno, Attila Krasznahorkay, Jim Cochran, roottalk, Paolo Calafiura. Many thanks.