Seminar "Using modern information technologies to solve modern problems of particle physics", Yandex Moscow office, 3 July 2012
Marco Cattaneo, CERN
4. Data reduction in real time (trigger)
❍ pp collisions: 15 MHz
❍ Level-0 (custom hardware)
❏ partial detector information
❏ output: 1 MHz
❍ HLT (CPU farm -> software trigger)
❏ full detector information
❏ reconstruction in real time (~20k jobs in parallel)
❏ ~200 independent “lines”
❏ output: ~5 kHz, 300 MB/s
❍ Offline Storage
❏ 1.3 PB in 2012
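As a back-of-envelope check of these rates, the Python sketch below derives the implied RAW event size and data-taking live time from the quoted 5 kHz, 300 MB/s and 1.3 PB; the derived numbers are estimates, not figures from the slide.

# Back-of-envelope check of the trigger output numbers quoted above
# (5 kHz to storage, 300 MB/s, 1.3 PB in 2012). The implied event size
# and live time are derived quantities, not numbers from the slide.
output_rate_hz = 5e3          # HLT output rate (~5 kHz)
bandwidth_mb_s = 300.0        # bandwidth to offline storage (MB/s)
yearly_volume_pb = 1.3        # RAW volume expected in 2012 (PB)

event_size_kb = bandwidth_mb_s * 1e3 / output_rate_hz   # ~60 kB/event
live_seconds = yearly_volume_pb * 1e9 / bandwidth_mb_s  # ~4.3e6 s
print(f"implied RAW event size: ~{event_size_kb:.0f} kB")
print(f"implied data-taking live time: ~{live_seconds/86400:.0f} days")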
5. Event Reconstruction
❍ On full event sample (~2 billion events expected in 2012):
❏ Pattern recognition to measure particle trajectories
✰ Identify vertices, measure momentum
❏ Particle ID to measure particle types
❍ ~2 sec/event, ~5k concurrent jobs (run on the grid)
❍ Reconstructed data stored in “FULL DST”
❏ DST event size ~100 kB -> 2 PB in 2012 (includes RAW)
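A rough Python estimate of what one reconstruction pass costs, using only the numbers above (~2 billion events, ~2 s/event, ~5k concurrent jobs) and assuming the grid slots are fully occupied:

# Rough estimate of the wall-clock time for one full reconstruction pass,
# using the numbers quoted above. A sketch, not a production plan.
n_events = 2e9
cpu_seconds_per_event = 2.0
concurrent_jobs = 5e3

total_cpu_seconds = n_events * cpu_seconds_per_event
wall_days = total_cpu_seconds / concurrent_jobs / 86400
print(f"total CPU work: ~{total_cpu_seconds/86400/365:.0f} CPU-years")
print(f"wall-clock time at ~5k jobs: ~{wall_days:.0f} days")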
6. Data reduction offline (stripping)
❍ Any given physics analysis selects 0.1-1.0% of events
❏ Inefficient to allow individual physicists to run selection jobs over the FULL DST
❍ Group all selections in a single selection pass, executed by a central production team
❏ Runs over the FULL DST
❏ Executes ~800 independent stripping “lines”
✰ ~0.5 sec/event in total
✰ Writes out only events selected by one or more of these lines
❏ Output events are grouped into ~15 streams
❏ Each stream selects 1-10% of events
❍ Overall data volume reduction: x50 – x500 depending on the stream
❏ Few TB (<50) per stream, replicated at several places
❏ Accessible to physicists for data analysis
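The sketch below is a schematic Python model of this stripping step: each “line” is a boolean predicate on an event, lines are grouped into streams, and an event is written to every stream in which at least one of its lines fires. The line and stream names are invented for illustration and are not real LHCb stripping lines.

# Schematic model of the stripping step: selection "lines" grouped into
# output streams. All names and cuts below are illustrative only.
def has_dimuon(event):            # hypothetical line predicate
    return event.get("n_muons", 0) >= 2

def has_displaced_vertex(event):  # hypothetical line predicate
    return event.get("max_vertex_displacement_mm", 0.0) > 1.0

STREAMS = {
    "Dimuon":  [has_dimuon],
    "Bhadron": [has_displaced_vertex],
}

def strip(event):
    """Return the streams that select this event (empty list = discard)."""
    return [stream for stream, lines in STREAMS.items()
            if any(line(event) for line in lines)]

# An event may end up in several streams, or in none (then it is not written out).
print(strip({"n_muons": 2, "max_vertex_displacement_mm": 2.5}))  # ['Dimuon', 'Bhadron']
print(strip({"n_muons": 0}))                                     # []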
7. Simulation
LHCb Monte Carlo simulation software:
– Simulation of physics event
– Detailed detector and material description (GEANT)
– Pattern recognition, trigger simulation and offline event selection
– Implements detector inefficiencies, noise hits, effects of multiple collisions
8. Resources for simulation
❍ Simulation jobs require 1-2 minutes per event
❏ Several billion events per year required
❏ Runs on ~20k CPUs worldwide
Yandex contribution ~25%
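A quick Python capacity check of these figures, taking 1-2 minutes per event and ~20k CPUs from the slide (the result is an order-of-magnitude estimate only):

# Order-of-magnitude check: how many simulated events ~20k CPUs can
# deliver per year at 1-2 minutes per event.
cpus = 20e3
seconds_per_year = 3.15e7
for minutes_per_event in (1.0, 2.0):
    events_per_year = cpus * seconds_per_year / (minutes_per_event * 60)
    print(f"{minutes_per_event:.0f} min/event -> ~{events_per_year/1e9:.1f} billion events/year")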
9. Physics Applications Software Organization
❍ Applications: High Level Triggers, Reconstruction, Simulation, Analysis
❏ Built on top of frameworks, implementing the required algorithms
❍ Frameworks
❏ One framework for basic services + various specialized frameworks: detector description, visualization, persistency, interactivity, simulation, etc.
❍ Toolkits
❍ Foundation Libraries
❏ A series of widely used basic libraries: Boost, GSL, Root, etc.
10. Gaudi Framework (Object Diagram)
[Object diagram: the Application Manager steers the event loop via the Event Selector and runs the Algorithms; Algorithms exchange data through the Transient Event Store, Transient Detector Store and Transient Histogram Store, accessed via the Event Data, Detector Data and Histogram Services; Persistency Services and Converters connect each store to data files; common services include the JobOptions Service, Message Service, Particle Properties Service and other services.]
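The toy Python sketch below models the pattern in this diagram (the Application Manager driving Algorithms, which exchange data only through a transient store); it is an illustration of the architecture, not the actual Gaudi C++/Python API, and the algorithm names and store paths are invented.

# Schematic model of the Gaudi pattern: Application Manager runs Algorithms
# over events; algorithms communicate only via a Transient Event Store.
class TransientEventStore(dict):
    """Per-event blackboard: algorithms put/get data objects by path."""

class Algorithm:
    def execute(self, event_store):             # overridden by concrete algorithms
        raise NotImplementedError

class TrackFinder(Algorithm):                   # hypothetical algorithm
    def execute(self, event_store):
        hits = event_store["Raw/Hits"]
        event_store["Rec/Tracks"] = [h for h in hits if h > 0]   # dummy pattern recognition

class VertexFitter(Algorithm):                  # hypothetical algorithm
    def execute(self, event_store):
        tracks = event_store["Rec/Tracks"]
        event_store["Rec/Vertices"] = [sum(tracks)] if tracks else []

class ApplicationMgr:
    def __init__(self, algorithms):
        self.algorithms = algorithms
    def run(self, events):
        for raw_hits in events:                 # the Event Selector role
            store = TransientEventStore({"Raw/Hits": raw_hits})
            for alg in self.algorithms:         # event loop over the algorithm list
                alg.execute(store)
            yield store

for store in ApplicationMgr([TrackFinder(), VertexFitter()]).run([[1, -2, 3], [0]]):
    print(store["Rec/Vertices"])                # -> [4], then []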
12. LHCb Workload Management System: DIRAC
❍ DIRAC forms an overlay network over the underlying grids (e.g. Grid A (WLCG), Grid B (NDG)) on behalf of the user community
❏ A way to achieve grid interoperability for a given community
❏ Needs a specific Agent Director per resource type
❍ From the user perspective all the resources are seen as a single large “batch system”
13. DIRAC WMS
❍ Jobs are submitted to the DIRAC Central Task Queue
❍ VRC policies are applied here by prioritizing jobs in the Queue
❍ Pilot Jobs are submitted by specific Directors to various grids or computing clusters
❍ This allows various types of computing resources to be aggregated transparently for the users
❍ The Pilot Job gets the most appropriate user job
❍ Jobs run in a verified environment with high efficiency
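A toy Python sketch of the pilot-job matching described above: user jobs wait in a central prioritized task queue, and a pilot that starts on some resource pulls the highest-priority job it can run. The job fields and matching rule are illustrative assumptions, not the real DIRAC WMS code.

# Toy central task queue with priority-ordered matching of jobs to pilots.
import heapq

class TaskQueue:
    def __init__(self):
        self._heap = []              # (negative priority, insertion order, job)
        self._counter = 0

    def submit(self, job, priority):
        heapq.heappush(self._heap, (-priority, self._counter, job))
        self._counter += 1

    def match(self, pilot_capabilities):
        """Return the highest-priority job the pilot can run, if any."""
        kept, matched = [], None
        while self._heap:
            entry = heapq.heappop(self._heap)
            job = entry[2]
            if matched is None and job["platform"] in pilot_capabilities["platforms"]:
                matched = job
            else:
                kept.append(entry)
        for entry in kept:
            heapq.heappush(self._heap, entry)
        return matched

queue = TaskQueue()
queue.submit({"name": "user_analysis", "platform": "slc5"}, priority=1)
queue.submit({"name": "reconstruction", "platform": "slc5"}, priority=10)
pilot = {"platforms": ["slc5"]}        # what the pilot found on its worker node
print(queue.match(pilot)["name"])      # -> 'reconstruction' (higher priority first)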
15. Data replication
❍ Active data placement
❍ Split disk (for analysis) and archive
❏ No disk replica for RAW and FULL DST (read few times)
[Replication flow: RAW (CERN + 1 Tier1) -> Brunel (reconstruction) -> FULL DST (1 Tier1) -> DaVinci (stripping and streaming) -> unmerged DST1 (1 Tier1, scratch) -> Merge -> DST1 at CERN + 1 Tier1 and replicated at CERN + 3 Tier1s]
16. Datasets
❍ Granularity at the file level
❏ Data Management operations (replicate, remove replica, delete file)
❏ Workload Management: input/output files of jobs
❍ LHCbDirac perspective
❏ DMS and WMS use Logical File Names to reference files
❏ LFN namespace refers to the origin of the file
✰ Constructed by the jobs (uses production and job number); see the sketch after this list
✰ Hierarchical namespace for convenience
✰ Used to define file class (tape-sets) for RAW, FULL.DST, DST
✰ GUID used for internal navigation between files (Gaudi)
❍ User perspective
❏ File is part of a dataset (consistent for physics analysis)
❏ Dataset: specific conditions of data, processing version and processing level
✰ Files in a dataset should be exclusive and consistent in quality and content
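As a sketch of the hierarchical LFN namespace, the Python snippet below rebuilds an LFN from a production and job number, following the layout of the example LFNs that appear on the event-indexing slide (/lhcb/LHCb/Collision11/DIMUON.DST/00013016/0000/...); the exact convention shown here is an assumption for illustration.

# Sketch of building a hierarchical LFN from the production and job numbers.
def make_lfn(origin, file_type, production, job, step=1):
    prod = f"{production:08d}"
    jobn = f"{job:08d}"
    return (f"/lhcb/LHCb/{origin}/{file_type}/{prod}/{jobn[:4]}/"
            f"{prod}_{jobn}_{step}.{file_type.lower()}")

print(make_lfn("Collision11", "DIMUON.DST", production=13016, job=37))
# -> /lhcb/LHCb/Collision11/DIMUON.DST/00013016/0000/00013016_00000037_1.dimuon.dst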
17. Replica Catalog (1)
❍ Logical namespace
❏ Reflects somewhat the origin of the file (run number for RAW, production number for output files of jobs)
❏ File type also explicit in the directory tree
❍ Storage Elements
❏ Essential component in the DIRAC DMS
❏ Logical SEs: several DIRAC SEs can physically use the same hardware SE (same instance, same SRM space)
❏ Described in the DIRAC configuration
✰ Protocol, endpoint, port, SAPath, Web Service URL
✰ Allows autonomous construction of the SURL
✰ SURL = srm:<endPoint>:<port><WSUrl><SAPath><LFN> (see the sketch after this list)
❏ SRM spaces at Tier1s
✰ Used to have as many SRM spaces as DIRAC SEs, now only 3
✰ LHCb-Tape (T1D0) custodial storage
✰ LHCb-Disk (T0D1) fast disk access
✰ LHCb-User (T0D1) fast disk access for user data
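A minimal Python transcription of the SURL rule above, with the storage-element description taken from a configuration dictionary; the SE name, endpoint, port, WSUrl and SAPath values below are invented placeholders, not a real site configuration.

# Sketch: build a SURL from an SE description, following the rule quoted above.
SE_CONFIG = {
    "CERN-DST": {                                  # hypothetical DIRAC SE name
        "protocol": "srm",
        "endpoint": "srm-lhcb.example.ch",
        "port": 8443,
        "wsurl": "/srm/managerv2?SFN=",
        "sapath": "/castor/example.ch/grid",
    },
}

def surl(se_name, lfn):
    cfg = SE_CONFIG[se_name]
    return f"{cfg['protocol']}:{cfg['endpoint']}:{cfg['port']}{cfg['wsurl']}{cfg['sapath']}{lfn}"

print(surl("CERN-DST",
           "/lhcb/LHCb/Collision11/DIMUON.DST/00013016/0000/00013016_00000037_1.dimuon.dst"))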
18. Replica Catalog (2)
❍ Currently using the LFC
❏ Master write service at CERN
❏ Replication using Oracle streams to Tier1s
❏ Read-only instances at CERN and Tier1s
✰ Mostly for redundancy, no need for scaling
❍ LFC information:
❏ Metadata of the file
❏ Replicas
✰ Use “host name” field for the DIRAC SE name
✰ Store SURL of creation for convenience (not used)
❄ Allows lcg-util commands to work
❏ Quality flag
✰ One-character comment used to temporarily set a replica as unavailable
❍ Testing scalability of the DIRAC file catalog
❏ Built-in storage usage capabilities (per directory)
19. Bookkeeping Catalog (1)
❍ User selection criteria
❏ Origin of the data (real or MC, year of reference)
✰ LHCb/Collision12
❏ Conditions for data taking or simulation (energy, magnetic field, detector configuration…)
✰ Beam4000GeV-VeloClosed-MagDown
❏ Processing Pass is the level of processing (reconstruction, stripping…) including compatibility version
✰ Reco13/Stripping19
❏ Event Type is mostly useful for simulation; single value for real data
✰ 8 digit numeric code (12345678, 90000000)
❏ File Type defines which type of output files the user wants to get for a given processing pass (e.g. which stream)
✰ RAW, SDST, BHADRON.DST (for a streamed file)
❍ Bookkeeping search
❏ Using a path
✰ /<origin>/<conditions>/<processing pass>/<event type>/<file type>
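A one-line Python illustration of assembling such a bookkeeping path from the example values given on this slide:

# Sketch: build a bookkeeping query path from the selection criteria above.
def bk_path(origin, conditions, processing_pass, event_type, file_type):
    return f"/{origin}/{conditions}/{processing_pass}/{event_type}/{file_type}"

print(bk_path("LHCb/Collision12",
              "Beam4000GeV-VeloClosed-MagDown",
              "Reco13/Stripping19",
              "90000000",
              "BHADRON.DST"))
# -> /LHCb/Collision12/Beam4000GeV-VeloClosed-MagDown/Reco13/Stripping19/90000000/BHADRON.DST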
20. Bookkeeping Catalog (2)
❍ Much more than a dataset catalog!
❍ Full provenance of files and jobs
❏ Files are input of processing steps (“jobs”) that produce files
❏ All files ever created are recorded, and each processing step as well
✰ Full information on the “job” (location, CPU, wall clock time…)
❍ BK relational database
❏ Two main tables: “files” and “jobs” (see the sketch after this list)
❏ Jobs belong to a “production”
❏ “Productions” belong to a “processing pass”, with a given “origin” and “condition”
❏ Highly optimized search for files, as well as summaries
❍ Quality flags
❏ Files are immutable, but can have a mutable quality flag
❏ Files have a flag indicating whether they have a replica or not
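A minimal sqlite3 sketch of this files/jobs provenance model; the column names are illustrative simplifications, not the real BK schema.

# Toy provenance model: files are inputs of jobs, jobs produce files.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE jobs  (job_id INTEGER PRIMARY KEY, production INTEGER);
    CREATE TABLE files (file_id INTEGER PRIMARY KEY, lfn TEXT,
                        produced_by INTEGER REFERENCES jobs(job_id),
                        quality TEXT DEFAULT 'OK', has_replica INTEGER DEFAULT 1);
    CREATE TABLE job_inputs (job_id INTEGER REFERENCES jobs(job_id),
                             file_id INTEGER REFERENCES files(file_id));
""")
db.execute("INSERT INTO jobs VALUES (1, 13016)")                          # a reconstruction job
db.execute("INSERT INTO files VALUES (10, '/lhcb/example/file.raw', NULL, 'OK', 1)")
db.execute("INSERT INTO files VALUES (11, '/lhcb/example/file.full.dst', 1, 'OK', 1)")
db.execute("INSERT INTO job_inputs VALUES (1, 10)")

# Provenance query: which file(s) was a given output produced from?
parents = db.execute("""
    SELECT p.lfn FROM files f
    JOIN job_inputs ji ON ji.job_id = f.produced_by
    JOIN files p       ON p.file_id = ji.file_id
    WHERE f.lfn = ?""", ("/lhcb/example/file.full.dst",)).fetchall()
print(parents)   # -> [('/lhcb/example/file.raw',)]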
21. Bookkeeping browsing
❍ Allows saving datasets
❏ Filter, selection
❏ Plain list of files
❏ Gaudi configuration file
❍ Can return only files with a replica at a given location
22. Event indexing
❍ Book-keeping has no information about individual events
❍ But it can be beneficial to select events based on global criteria:
❏ Number of tracks, clusters etc.
❏ Trigger or Stripping lines fired
❏ Global Event Activity counters
❍ Prototype implemented by Andrey Ustyuzhanin
[Screenshot: advanced-search result for event 104248:539058326 (event time Oct. 27, 2011, 9:11 p.m.), listing the files containing the event (/lhcb/LHCb/Collision11/DIMUON.DST/00013016/0000/00013016_00000037_1.dimuon.dst and /lhcb/LHCb/Collision11/EW.DST/00013017/0000/00013017_00000033_1.ew.dst), the application tags (Brunel v41r1, DDDB head-20110914, DQFLAGS tt-20110126, LHCBCOND head-20111111, ONLINE HEAD), the stripping lines fired, and global event activity counters (nTracks 92, nPV 2, nVeloClusters 1389, nSPDhits 128, …).]
23. Staging: using files from tape
❍ If jobs use files that are not online (on disk)
❏ Before submitting the job
❏ Stage the file from tape, and pin it in the cache
❍ Stager agent
❏ Also performs cache management
❏ Throttles staging requests depending on the cache size and the amount of pinned data
❏ Requires fine tuning (pinning and cache size)
✰ Caching architecture is highly site dependent
✰ No publication of cache sizes (except Castor and StoRM)
❍ Jobs using staged files (see the sketch after this list)
❏ First check that the file is still staged
✰ If not, reschedule the job
❏ Copy the file locally onto the WN whenever possible
✰ Space is released faster
✰ More reliable access for very long jobs (reconstruction) or jobs using many files (merging)
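A Python sketch of the per-job logic just described: verify the input is still staged, otherwise signal that the job should be rescheduled, and copy the file to the worker node when possible. The helper names (is_staged, copy_to_worker_node) are hypothetical placeholders, not real DIRAC calls.

# Sketch of preparing a tape-resident input file on the worker node.
class NotStagedError(Exception):
    """Signal that the job should be rescheduled (file fell out of the cache)."""

def prepare_input(lfn, is_staged, copy_to_worker_node):
    if not is_staged(lfn):
        raise NotStagedError(lfn)          # -> reschedule the job
    try:
        return copy_to_worker_node(lfn)    # local copy: cache released sooner, more reliable
    except OSError:
        return lfn                         # fall back to remote access if the copy fails

# Example with stand-in implementations:
staged = {"/lhcb/example/file.full.dst"}
local = prepare_input("/lhcb/example/file.full.dst",
                      is_staged=lambda lfn: lfn in staged,
                      copy_to_worker_node=lambda lfn: "/tmp/" + lfn.rsplit("/", 1)[-1])
print(local)   # -> /tmp/file.full.dst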
24. Data Management Outlook
❍ Improvements on staging
❏ Improve the tuning of cache settings
✰ Depends on how caches are used by sites
❏ Pinning/unpinning
✰ Difficult if files are used by more than one job
❍ Popularity
❏ Record dataset usage
✰ Reported by jobs: number of files used in a given dataset
✰ Account number of files used per dataset per day/week
❏ Assess dataset popularity (see the sketch after this list)
✰ Relate usage to dataset size
❏ Take decisions on the number of online replicas
✰ Taking into account available space
✰ Taking into account expected need in the coming week
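A toy Python version of this popularity bookkeeping: jobs report how many files of a dataset they used per day, and relating that usage to the dataset size gives a score that can feed the decision on the number of disk replicas. The dataset name, normalization and numbers are invented for illustration.

# Toy dataset-popularity accounting based on per-day file-usage reports.
from collections import defaultdict

usage = defaultdict(int)                        # (dataset, day) -> files used

def report_usage(dataset, day, n_files):        # filled from the job reports
    usage[(dataset, day)] += n_files

def popularity(dataset, dataset_n_files, days):
    """Fraction of the dataset read per day, averaged over the period."""
    used = sum(usage[(dataset, d)] for d in days)
    return used / (dataset_n_files * len(days))

report_usage("BHADRON.DST/Stripping19", day=1, n_files=400)
report_usage("BHADRON.DST/Stripping19", day=2, n_files=100)
score = popularity("BHADRON.DST/Stripping19", dataset_n_files=1000, days=[1, 2])
print(f"popularity: {score:.2f}")               # -> 0.25; more replicas if it stays high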
25. Longer term challenges
❍ Computing model evolution
❏ Networking performance allows evolution from the static “Tier” model of data access to a much more dynamic model
❏ The current processing model of 1 sequential job per file per CPU breaks down in the multi-core era due to memory limitations
✰ parallelisation of event processing
❏ Whole node scheduling, virtualisation, clouds
❍ LHCb upgrade in 2018
❏ From computing point of view:
✰ x40 readout rate into HLT farm
✰ x4 event rate to storage (20kHz)
✰ x1.5 event size (100kB/RAW event)
❏ x10 increase in data rates implies
✰ scaling of Dirac WMS and DMS
✰ scaling of data selection catalogs
✰ new ideas for data mining?
❍ Data preservation and open access
❏ Preserve the ability to analyse old data many years in the future
❏ Make data available for analysis outside LHCb + to the general public