1. Enabling Grids for E-sciencE
Middleware Overview
Emidio Giorgio
INFN Catania
ISSGC’09, Nice-Sophia Antipolis, 10.07.2009
www.eu-egee.org
EGEE-III INFSO-RI-222667
Friday, 10 July 2009
2. Outline
• General overview
• Security System
– VOMS server
– LCAS LCMAPS
• Information Service
– Berkeley DB Information Index (BDII)
• Workload Management System
– WMS mechanism
– JDL
– Computing Element
– Logging and bookkeeping
• Questions
3. gLite Middleware overview
4. EGEE Project and gLite
• Enabling Grids for E-sciencE (EGEE) is a large multi-
disciplinary grid infrastructure
– Brings together more than 120 European organisations
– Consists of ~300 sites in 48 countries and more than 68,000 CPUs
– Is available to some 10,000 users 24 hours a day, 7 days a week
– Processes more than 150,000 jobs per day from different scientific
domains
• gLite is the middleware powering the EGEE infrastructure
and many other related projects
– Is an integrated set of components designed to enable resource
sharing among different institutions
– Pulls together contributions from many other projects, including LCG
and VDT
– Provides users with a large set of services
5. Additional Infrastructures: GILDA
• EGEE provides a training infrastructure: GILDA (Grid
INFN Laboratory for Dissemination Activities)
– Runs the entire gLite stack
– Used to demonstrate the grid technology of the EGEE project
– Supports beginner and expert training courses on gLite
• Adopted by several Grid projects worldwide
• Has its own Certification Authority
• Available to everyone, 365 days a year
• Used in the ISSGC school series
• Since 2007, middleware other than gLite has also been tested on GILDA
7. gLite in the Grid “ecosystem”
[Figure: gLite in the grid ecosystem. Projects shown include Condor, Globus, MyProxy, EDG, OSG, VDT, DataTAG, CrossGrid (interactive), LCG, SRM, GridCC, NextGrid, EGEE and DEISA, grouped under USA and EU, with the legend "Used in" and "Future grids"]
8. The Middleware structure
• Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
• Higher-Level Grid Services are meant to help users build their computing infrastructure, but should not be mandatory
• Foundation Grid Middleware is developed within EGEE
– Must be complete and robust
– Should allow interoperation with other major grid infrastructures
– Should not assume the use of Higher-Level Grid Services
9. gLite Services Decomposition
• Access: CLI, API
• Security Services: Authentication, Authorization, Auditing
• Information & Monitoring Services: Information & Monitoring, Service Discovery, Network Monitoring
• Data Services: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement
• Job Mgmt. Services: Job Provenance, Package Manager, Accounting, Computing Element, Workload Management
10. gLite infrastructure
Workload Management System (WMS)
Data Management
11. Security System
12. gLite Security
• Authentication is based on an X.509 public key infrastructure (PKI)
– Certificate Authorities (CAs) issue long-lived certificates identifying individuals (much like a passport)
– Trust between CAs and sites is established offline
– To reduce vulnerability, Grid users are identified by short-lived proxies of their certificates
• Proxies can
– Be delegated to a service such that it can act on the user’s
behalf
– Include additional attributes (like VO information via the VO
Membership Service VOMS)
– Be stored in an external proxy store (MyProxy)
– Be renewed (in case they are about to expire)
13. X.509 Proxy Certificate
• Proxy: GSI extension to X.509 Identity Certificates
– signed by the normal end entity cert (or by another proxy).
• It enables single sign-on.
• It supports some important features:
– Delegation, Mutual authentication
• It has a limited lifetime, which minimizes the risk from compromised credentials
• It is created by the voms-proxy-init command
– Options for voms-proxy-init:
-hours <lifetime of credential>
-bits <length of key>
14. GRID Security: Components
[Figure: Users, “Groups” and Sites arranged around the Grid]
Users
• Large and dynamic population
• Different accounts at different sites
• Personal and confidential data
• Heterogeneous privileges (roles)
• Desire Single Sign-On
“Groups”
• “Group” data
• Access Patterns
• Membership
Sites
• Heterogeneous Resources
• Access Patterns
• Local policies
• Membership
15. VOMS: concepts
Virtual Organization Membership Service (VOMS):
– Extends the proxy with information on VO membership, groups and roles
– Fully compatible with GSI
– Each VO has a database containing group membership, role and capability information for each user
– The user contacts the VOMS server to request their authorization info
– The server sends the authorization info to the client, which includes it in a proxy certificate
[Figure: the VOMS client sends an authentication request to the VOMS server; the server queries its Auth DB and returns a VOMS AC. Example proxy subject: /C=IT/O=INFN/L=CNAF/CN=Pinco Palla/CN=proxy]
16. FQAN and AC
• VOMS uses the Fully Qualified Attribute Name (FQAN) to express membership and other authorization info
• Group membership, roles and capabilities may be expressed in a format that binds them together
– <group>/Role=[<role>][/Capability=<capability>]
• FQANs are included in an Attribute Certificate (AC)
• Attribute Certificates are used to bind a set of attributes (such as membership, roles and authorization info) to an identity
• ACs are digitally signed
• VOMS uses an AC to include the attributes of a user in a proxy certificate
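The FQAN format above can be illustrated with a small parser. This is a minimal sketch, not VOMS code: the `FQAN` tuple, the `parse_fqan` function, the example group `/gilda/tutors` and the role `VO-Admin` are all hypothetical, and treating the literal `NULL` as "not set" is an assumption of the sketch.

```python
from typing import NamedTuple, Optional


class FQAN(NamedTuple):
    group: str
    role: Optional[str]
    capability: Optional[str]


def parse_fqan(fqan: str) -> FQAN:
    """Split an FQAN string into group, role and capability."""
    if not fqan.startswith("/"):
        raise ValueError(f"FQAN must start with '/': {fqan!r}")
    group_parts = []
    role = None
    capability = None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            role = part[len("Role="):] or None
        elif part.startswith("Capability="):
            capability = part[len("Capability="):] or None
        else:
            # Anything that is not a Role or Capability belongs to the group path.
            group_parts.append(part)
    # Assumption of this sketch: the literal "NULL" means "not set".
    if role == "NULL":
        role = None
    if capability == "NULL":
        capability = None
    return FQAN("/" + "/".join(group_parts), role, capability)


# Hypothetical FQAN used only for illustration.
example = parse_fqan("/gilda/tutors/Role=VO-Admin/Capability=NULL")
print(example)
```

Nested groups simply appear as extra path components in the `group` field.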
17. VOMS Certificate
• The AC is included by the client in a well-defined, non-critical extension, ensuring compatibility with GT-based mechanisms
asli@levrek:~$ voms-proxy-init --voms gilda
Your identity: /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Marco
Fargetta/Email=Marco.Fargetta@ct.infn.it
Enter GRID pass phrase:
Creating temporary proxy .................................... Done
Contacting voms.ct.infn.it:15001 [/C=IT/O=INFN/OU=Host/L=Catania/
CN=voms.ct.infn.it] "gilda" Done
Creating proxy .................................. Done
Your proxy is valid until Tue Jun 26 03:16
asli@levrek:~$
20. LCAS & LCMAPS
• At the resource level, authorization info is extracted from the proxy and processed by LCAS and LCMAPS
• Local Centre Authorization Service (LCAS)
– Checks if the user is authorized
– Checks if the user is banned at the site
• Local Credential Mapping Service (LCMAPS)
– Maps remote credentials to local credentials (e.g. a different UNIX uid/gid)
– Also maps VOMS groups and roles (full FQAN support), which enables privilege separation
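The two-step decision described above (authorize first, then map to a local account) could be sketched as follows. This is an illustration only and does not reflect the real LCAS or LCMAPS plug-in interfaces: the policy tables, subject names, account names and function names are all hypothetical.

```python
from typing import Optional

# Hypothetical site policy: banned subjects, VOs allowed at the site,
# and a mapping from (group, role) to a local UNIX account.
BANNED_SUBJECTS = {"/C=IT/O=GILDA/CN=Banned User"}
ALLOWED_VOS = {"gilda"}
ACCOUNT_MAP = {
    ("/gilda", "VO-Admin"): "gildaadm",
    ("/gilda", None): "gilda001",
}


def lcas_authorize(subject: str, vo: str) -> bool:
    """LCAS-like step: reject banned users, accept users from allowed VOs."""
    if subject in BANNED_SUBJECTS:
        return False
    return vo in ALLOWED_VOS


def lcmaps_map(group: str, role: Optional[str]) -> str:
    """LCMAPS-like step: pick a local account, preferring a role-specific one."""
    for key in ((group, role), (group, None)):
        if key in ACCOUNT_MAP:
            return ACCOUNT_MAP[key]
    raise LookupError(f"no local account for {group} with role {role}")


def admit(subject: str, vo: str, group: str, role: Optional[str] = None) -> Optional[str]:
    """Return the local account for an authorized user, or None if rejected."""
    if not lcas_authorize(subject, vo):
        return None
    return lcmaps_map(group, role)


print(admit("/C=IT/O=GILDA/CN=Some User", "gilda", "/gilda", "VO-Admin"))
```

Mapping on the pair (group, role) rather than on the user alone is what allows the same person to run with different local privileges depending on the role carried in the proxy.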
21. VOMS enabled Grid
• A user can be in multiple VOs
– Aggregate rights
• A VO can have groups
– Different rights for each
Different groups of experimentalists
…
– Nested groups
• A VO has roles
– Assigned to specific purposes
E.g. system admin
Applies when the user assumes this role
• The proxy certificate carries the additional attributes
22. Information Service
23. Information Service
• What?
– A system to collect information on the state of resources
• Why?
– To discover the resources of the grid and their nature
– To check the health status of resources
– To provide data in order to manage the workload more efficiently
• How?
– Monitoring and publishing fresh data on the state of resources
– Adopting a well-known data model
• Who?
– Users searching for specific resources for their activity
– Workload Management System
– Other monitoring systems
24. Information Service Systems
• The gLite data model is based on the Grid Laboratory Uniform Environment (GLUE) Schema
• The IS architecture used in gLite is the Berkeley DB Information Index (BDII)
– It has been adopted in the LCG middleware as the Information System provider
– It is an evolution of the Globus Monitoring and Discovery Service (MDS)
– It is based on Lightweight Directory Access Protocol (LDAP) servers
25. GLUE Schema
• Describes the Grid resource information stored in the IS
• Independent of the underlying technology
• The current release is mapped to
– LDAP
– XML
– ClassAd (Condor matchmaking language)
• The entities of the GLUE Schema are organised hierarchically
– They include the concepts of Site, Cluster, Computing Element, Storage Element, and an abstraction of service
26. GLUE Schema Structure
• Site: collection of resources owned by a single organisation. Contains info on the location, the administrator, web page and so on
• Service: description of the deployed service
• Cluster: set of heterogeneous resources. Contains info on the shared directory
• Sub-Cluster: set of homogeneous resources. Contains the size of the set
• Host: contains details of hardware (features and performance) and software
• Other entities shown: StorageElement, ComputingElement, VOview, Job, State, Info, Policy
[Figure: entity diagram with one-to-many (1, *) relationships between these entities]
27. GRISs, local BDII and BDII
Abbreviations:
BDII: Berkeley DataBase Information Index
GIIS: Grid Index Information Server
GRIS: Grid Resource Information Server
• Local GRISes run on CEs and SEs at each site and report dynamic and static information
• At each site, a *local* BDII collects the information given by the GRISes
• Each site can run a BDII. It collects the information given by the local BDIIs
28. The IS in gLite
[Figure: three BDIIs (BDII-A, BDII-B, BDII-C) collect from the Site BDIIs of Site 1, Site 2 and Site 3; within the sites, the CEs, SEs and an RB each run a local GRIS]
29. BDII
• Users and other Grid services (such as the WMS) can
interrogate BDIIs to get information about the Grid
status.
• Each BDII collects information from the site GIISes (or local BDIIs) defined in a configuration file, which it accesses through a web interface.
• Every two minutes a cron job runs a script that collects information (pull model) from all the GIISes (local BDIIs) listed in the configuration file
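The pull model described above can be pictured with a short sketch. It is not the BDII update script: `refresh`, the two example sources and the host name are hypothetical, and real BDIIs query LDAP servers rather than Python callables. The attribute names are borrowed from the GLUE attributes that appear in the JDL examples.

```python
from typing import Callable, Dict, List

# A "source" is any callable returning the entries a site currently publishes.
Source = Callable[[], List[dict]]


def refresh(sources: Dict[str, Source]) -> Dict[str, List[dict]]:
    """Pull entries from every configured source; skip sources that fail."""
    snapshot = {}
    for name, fetch in sources.items():
        try:
            snapshot[name] = fetch()
        except Exception:
            # An unreachable site simply drops out of this snapshot.
            continue
    return snapshot


def site1() -> List[dict]:
    # Hypothetical entry for illustration.
    return [{"GlueCEUniqueID": "ce.site1.example.org", "GlueCEStateFreeCPUs": 4}]


def site2() -> List[dict]:
    raise ConnectionError("site unreachable")


snapshot = refresh({"site1": site1, "site2": site2})
print(sorted(snapshot))
```

In a deployment like the one on the slide, something equivalent to `refresh` would be triggered by the cron job every two minutes, with the source list read from the configuration file.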
30. Summary
• The security system of gLite is based on X.509 certificates
– Users are identified by certificates
– The VOMS server links users to VOs, groups and roles by adding attributes to the proxy certificate
– LCAS and LCMAPS control local access to resources by checking the user certificates
• The Information System provided by gLite is the BDII
– The information is organised following the GLUE Schema
– The current implementation uses only the BDII to check the state of the resources
The user can contact the top BDII in the hierarchy to get information on all the resources
31. gLite Workload Management System
32. Outline
gLite Overview
Workload Management System
WMS Architecture
Job state machine
Job Description Language Overview
Security overview
33. gLite services
[Figure: interactions between the User Interface, Workload Management, Information System, Logging & Bookkeeping, File and Replica Catalogs, Authorization Service, and the Computing Element and Storage Element at Site X. Arrow labels: submit, query, retrieve, discover services, update credential, publish state]
34. WMS Objectives
• The Workload Management System (WMS) comprises a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources.
• The purpose of the Workload Manager (WM) is to accept and satisfy requests for job management coming from its clients
– A submission request passes responsibility for the job to the WM.
• The WM passes the job to an appropriate CE for execution
– taking into account the requirements and preferences expressed in the job description file
• The decision about which resource to use is the outcome of a matchmaking process.
35. WMS Architecture
[Figure: WMS architecture diagram; slides 36 to 40 annotate its components in turn]
36. WMS Architecture
Job management requests (submission, cancellation) are expressed via a Job Description Language (JDL)
37. WMS Architecture
Keeps submission requests. Requests are kept for a while if no resources are immediately available
38. WMS Architecture
Finds an appropriate CE for each submission request, taking into account job requests and preferences, Grid status, and utilization policies on resources
39. WMS Architecture
Repository of resource information available to the matchmaker. Updated via notifications and/or active polling on resources
40. WMS Architecture
Performs the actual job submission and monitoring
42. Job Description Language
43. Job Description Language
• In gLite, the Job Description Language (JDL) is used to describe jobs for execution on the Grid.
• The JDL adopted within the gLite middleware is based upon Condor’s CLASSified Advertisement language (ClassAd).
– A ClassAd is a record-like structure composed of a finite number of attributes separated by semicolons (;)
– A ClassAd is highly flexible and can be used to represent arbitrary services
• The JDL is used in gLite to specify the job’s characteristics and constraints, which are used during the matchmaking process to select the best resources that satisfy the job’s requirements.
44. Job Description Language (cont.)
The JDL syntax consists of statements of the form:
Attribute = value;
Comments must be preceded by a hash character (#) or follow the C++ syntax
WARNING: The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.
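Because trailing blanks after the semicolon are easy to miss by eye, a small check can be run on a JDL file before submission. The sketch below is a hypothetical helper, not part of gLite; `find_trailing_blanks` and the sample text are made up for illustration, and the check covers only the rule stated in the warning above.

```python
def find_trailing_blanks(jdl_text: str):
    """Return 1-based line numbers where blanks or tabs follow the final semicolon."""
    bad = []
    for number, line in enumerate(jdl_text.splitlines(), start=1):
        stripped = line.rstrip(" \t")
        # Flag only lines that end in ';' once the trailing blanks are removed.
        if stripped != line and stripped.endswith(";"):
            bad.append(number)
    return bad


# The second line ends with a space after the semicolon.
sample = 'Executable = "/bin/hostname";\nStdOutput = "std.out"; \n# a comment\n'
print(find_trailing_blanks(sample))
```

The function only reports line numbers; stripping the blanks automatically would be an equally reasonable design.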
55. Other relevant JDL attributes
• If your job needs a file stored somewhere, you can specify its LFN (see the DataRequirements attribute below)
• The file will not be copied; instead, your job will be scheduled to a CE near the SE holding that file
• This is crucial when dealing with large files
DataRequirements = {
[
InputData = {"lfn:/grid/gilda/emidio/test.txt"};
DataCatalogType = "DLI";
DataCatalog = "http://lfc-gilda.ct.infn.it:8085";
]
};
DataAccessProtocol = {"rfio","gsiftp"};
56. Other relevant JDL attributes
• Rank: overrides the UI’s default for the fitness function by which resources are ranked
Rank = ( other.GlueCEInfoTotalCPUs);
Rank = ( other.GlueCEStateWaitingJobs == 0 ?
other.GlueCEStateFreeCPUs : -other.GlueCEStateWaitingJobs);
• RetryCount: overrides the default number of times that a job will be resubmitted after the first failure
RetryCount = 7;
• Requirements: a wide set of attributes, as published by the BDII, can be required. Regular expressions can also be used, and conditions can be combined with logical operators ( ||, &&, ! )
Requirements = (RegExp("pd.infn.it",other.GlueCEUniqueID));
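To see how the Requirements and Rank expressions above interact, the following sketch applies them to a made-up list of CEs: Requirements filters the candidates, Rank orders the survivors, highest first. It is a plain Python imitation, not the WMS matchmaker; the CE identifiers and numbers are hypothetical, and only the GLUE attribute names come from the slide.

```python
import re

# Hypothetical snapshot of CE attributes; the values are made up.
ces = [
    {"GlueCEUniqueID": "ce01.pd.infn.it:2119/jobmanager-lcgpbs-short",
     "GlueCEStateFreeCPUs": 10, "GlueCEStateWaitingJobs": 0},
    {"GlueCEUniqueID": "ce02.pd.infn.it:2119/jobmanager-lcgpbs-long",
     "GlueCEStateFreeCPUs": 50, "GlueCEStateWaitingJobs": 3},
    {"GlueCEUniqueID": "ce.ct.infn.it:2119/jobmanager-lcgpbs-short",
     "GlueCEStateFreeCPUs": 99, "GlueCEStateWaitingJobs": 0},
]


def requirements(ce: dict) -> bool:
    # Mirrors: Requirements = (RegExp("pd.infn.it", other.GlueCEUniqueID));
    return re.search("pd.infn.it", ce["GlueCEUniqueID"]) is not None


def rank(ce: dict) -> int:
    # Mirrors: WaitingJobs == 0 ? FreeCPUs : -WaitingJobs
    waiting = ce["GlueCEStateWaitingJobs"]
    return ce["GlueCEStateFreeCPUs"] if waiting == 0 else -waiting


def matchmake(candidates: list) -> list:
    """Keep CEs that satisfy the requirements, best rank first."""
    matching = [ce for ce in candidates if requirements(ce)]
    return sorted(matching, key=rank, reverse=True)


for ce in matchmake(ces):
    print(rank(ce), ce["GlueCEUniqueID"])
```

Note that the CE with the most free CPUs is never considered here, because it fails the Requirements filter before ranking takes place.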
57. Workflows of jobs
• With a single request, multiple jobs can be generated and executed
• A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
[Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeD and nodeE]
• A Collection is a group of jobs with no dependencies
– basically a collection of JDLs
• A Parametric job is a job having one or more attributes in the JDL that vary their values according to parameters
• Using compound jobs it is possible to have one-shot submission of a (possibly very large, up to thousands) group of jobs
– Submission time reduction
• Single call to the WMProxy server / single Authentication and Authorization process
• Sharing of files between jobs
– Availability of both a single Job Id to manage the group as a whole and Job Ids for each single job in the group
59. DAG example
Single submission, single job id:
[ Type = "dag";
  InputSandbox = {"son.sh"};
  nodes = [
    son1 = [
      description = [
        JobType = "Normal";
        Executable = "/bin/sh";
        InputSandbox = {root.InputSandbox};
        Arguments = "son.sh 1";
        StdOutput = "son1.output";
        StdError = "son1.error";
        OutputSandbox = {"final1.input","son1.output","son1.error"};
      ];
    ];
    final = [
      description = [
        JobType = "Normal";
        Executable = "/bin/sh";
        InputSandbox = {"final.sh", root.nodes.son1.description.OutputSandbox[0]};
        Arguments = "final.sh";
        StdOutput = "dag.out";
        StdError = "dag.err";
        OutputSandbox = {"dag.out","dag.err"};
      ];
    ];
    dependencies = { {son1,final}};
  ];
]
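The `dependencies` attribute determines the order in which nodes may run. As a rough illustration (not how the WMS schedules DAGs internally), the sketch below derives a valid execution order from the same pair using Python's standard `graphlib` module, available from Python 3.9; reading each pair as (parent, child) follows the example, where `final` consumes output from `son1`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each pair is (parent, child), as in: dependencies = { {son1,final}};
dependencies = [("son1", "final")]

# TopologicalSorter expects a mapping: node -> set of nodes it depends on.
graph = {}
for parent, child in dependencies:
    graph.setdefault(parent, set())
    graph.setdefault(child, set()).add(parent)

order = list(TopologicalSorter(graph).static_order())
print(order)
```

With more pairs in `dependencies`, the same code yields one of the possibly many valid orders, and raises `graphlib.CycleError` if the graph is not acyclic.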
60. Enabling Grids for E-sciencE
[issgc59@issgc-ui ~]$ glite-wms-job-submit -d emidio -o jobid-file sfk-explorer.jdl
Connecting to the service https://gilda-wms-01.ct.infn.it:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://gilda-lb-01.ct.infn.it:9000/4OaQng0PdA1nZJZHMcilqA
The job identifier has been saved in the following file:
/home/issgc59/jid
=====================================================================
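The job identifier returned above is an HTTPS URL, so its parts can be separated with standard URL parsing. The sketch below is a hypothetical helper; interpreting the host as the Logging and Bookkeeping server is an inference from the host name `gilda-lb-01` and is not stated in the output itself.

```python
from urllib.parse import urlparse


def split_job_id(job_id: str):
    """Return (host, port, unique job string) from a job identifier URL."""
    parsed = urlparse(job_id)
    return parsed.hostname, parsed.port, parsed.path.lstrip("/")


print(split_job_id("https://gilda-lb-01.ct.infn.it:9000/4OaQng0PdA1nZJZHMcilqA"))
```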
62. Jobs State Machine (1/9)
Submitted: the job has been entered by the user at the User Interface but not yet transferred to the Network Server for processing.
63. Jobs State Machine (2/9)
Waiting: the job has been accepted by the WMS and is waiting for Workload Manager processing, or is being processed by WMHelper modules.
64. Jobs State Machine (3/9)
Ready: the job has been processed by the WM but not yet transferred to the CE (local batch system queue).
65. Jobs State Machine (4/9)
Scheduled: the job is waiting in the queue on the CE.
66. Jobs State Machine (5/9)
Running: the job is running.
67. Jobs State Machine (6/9)
Done: the job has exited or is considered to be in a terminal state by CondorC (e.g., submission to the CE has failed in an unrecoverable way).
68. Jobs State Machine (7/9)
Aborted: job processing was aborted by the WMS (waiting in the WM queue or CE for too long, expiration of user credentials).
69. Jobs State Machine (8/9)
Cancelled: the job has been successfully cancelled on user request.
70. Jobs State Machine (9/9)
Cleared: the output sandbox was transferred to the user or removed due to the timeout.
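The nine states can be collected into a small transition table. This is a sketch under stated assumptions: the linear path from Submitted to Cleared follows the order of the slides, while the edges into Aborted and Cancelled from every pre-Done state are a guess, since the state diagram itself did not survive in this transcript. The names `HAPPY_PATH`, `TRANSITIONS` and `advance` are hypothetical.

```python
# Order taken from the nine slides.
HAPPY_PATH = ["Submitted", "Waiting", "Ready", "Scheduled", "Running", "Done", "Cleared"]
# Assumption: a job can be aborted or cancelled in any state before Done.
ACTIVE = HAPPY_PATH[:5]

TRANSITIONS = {state: {nxt} for state, nxt in zip(HAPPY_PATH, HAPPY_PATH[1:])}
for state in ACTIVE:
    TRANSITIONS[state] |= {"Aborted", "Cancelled"}
for terminal in ("Cleared", "Aborted", "Cancelled"):
    TRANSITIONS.setdefault(terminal, set())


def advance(current: str, new: str) -> str:
    """Move to a new state, rejecting transitions the table does not allow."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new


state = "Submitted"
for nxt in HAPPY_PATH[1:]:
    state = advance(state, nxt)
print(state)
```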
72. Logging and Bookkeeping
• Every step of the job life cycle is logged on a service called Logging and Bookkeeping (LB)
• It is useful for users who want to know the status of their jobs
– When a job is submitted, the UI logs it on the LB
– As a result of submission, a job identifier is returned
https://gilda-lb-01.ct.infn.it:9000/fw4Ua8b_7Z8Vd8oJC74NCw
– The WMS logs each step of scheduling
– The CE logs when it receives a job (scheduled), when the job is running and when it is done
– Users can query the LB for the job status by providing the job id
• Asynchronous updates...
73. The Computing Element
• The CE is the front-end machine (master node) to a local batch system
– Supported batch systems are PBS (Torque/MAUI), LSF and Condor
• The WMS “pushes” job execution requests to the CE using Condor-G
– When a CE receives a job, it is placed in a queue
– The job is then executed on the first available of its Worker Nodes (where the batch system clients run)
– When execution is complete, output files are copied to the CE using scp
• If the job is successfully executed, output files are copied back to the WMS using globus-url-copy
• By querying the LB, users know when a job is done and can retrieve the output
74. Summary
• The WMS accepts users’ requests for job execution
• Requests are expressed through JDL
– JDL lets users specify the requirements that selected resources must meet
• The WMS processes the request and chooses (by matchmaking) a Computing Element for the actual execution
– The status of resources is known to the WMS through queries to the BDII
• The CE tries to execute the job and copies the output files back to the WMS
– The status of execution is logged on the LB
• Users query the LB, discover that their job is done and download the output files from the WMS