Introduction to Workflows, APIs and
             Semantics

            Session 37. July 13th, 2009


               Oscar Corcho
     (Universidad Politécnica de Madrid)

Based on slides from all the presenters in the following two days

                                Work distributed under the license Creative Commons
                                     Attribution-Noncommercial-Share Alike 3.0
Themes of the Second Week

Date         Theme                                                        Technology
Mon 13 July  How to solve my problem?
Tue 14 July  Higher level APIs: OGSA-DAI, SAGA and metadata management   SAGA, OGSA-DAI, GridSAM
Wed 15 July  Workflows                                                    P-GRADE, Semantic Metadata
Thu 16 July  Integrating Practical                                        All
Fri 17 July  Cloud Computing (lecture)
[Figure: course map. Top row: Principles of job submission and execution management; Principles of high-throughput computing; Principles of service-oriented architecture; Principles of distributed data management. Bottom row: Principles of using distributed and high performance systems; Higher level APIs: OGSA-DAI, SAGA and metadata management; Workflows.]
Motivation
•   Grids are:
     – Dynamic:
        • Versions, updates, new resources...
     – Heterogeneous:
        • Operating systems, libraries, software stack
        • Middleware service versions and semantics
        • Administrative policies – access, usage, upgrade
     – Complex:
        • Providing a production-level service with high QoS is non-trivial
        • The complexity derives from the above, and is also inherent


•   As described by Steven this morning, operating Grids is still an
    effort-consuming task, and it is still somewhat difficult to develop,
    program & deploy Grid applications using the existing Grid
    middleware
•   But as you have also seen last week (and in Morris’
    presentation today), there are many commonalities among
    heterogeneous middleware
In general…

•   There is a need for:
     – Programmatic approaches that provide common grid functionality at the
       right level of abstraction for applications
     – The ability to hide the underlying complexity of the infrastructure, varying semantics,
       heterogeneity and changes from the application developer
     – Improved data access and integration mechanisms
     – Traceable, repeatable analyses of e-Science experiments
     – Graphical modelling languages that ease Grid application
       development
e-Science Approach Interoperability
    •   Increasing complexity of e-Science applications that embrace
        multiple physical models (i.e. multi-physics) at a larger scale
          – Creating a steadily growing demand for compute power
          – Demand for a ‘United Federation of world-wide Grids’

[Figure: approaches to e-Science applications on top of Grid middleware: I. Simple Scripts & Control; II. Scientific Application plug-ins; III. Complex Workflows; IV. Interactive Access; V. Interoperability (with other Grid types).]

[2] Morris Riedel et al., ‘Classification of Different Approaches for e-Science Applications in Next Generation Infrastructures’, Int. Conference on e-Science 2008, Indianapolis, Indiana
Balatonfüred, Hungary, 6th-18th July 2008
SAGA one-slide summary
•   Simple API for Grid Applications – SAGA
     – Provide a simple and usable programmatic interface that can be widely adopted for
       the development of applications for the grid
     – Simplicity (80:20 restricted scope)
         • easy to use, install, administer and maintain
     – Uniformity
         • provides support for different application programming languages as well as
            consistent semantics and style for different Grid functionality
     – Scalability
         • Contains mechanisms for the same application (source) code to run on a
            variety of systems ranging from laptops to HPC resources
     – Genericity
         • adds support for different grid middleware
     – Modularity
         • provides a framework that is easily extendable
•   SAGA is not…
     – Middleware
     – A service management interface
     – A way to hide resources: remote files and jobs remain visible (only their details are hidden)
Example: SAGA Job submission
Example: Copy a File (Globus)
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include "globus_gass_copy.h"

/* Assumes the caller has activated the Globus GASS copy module and defines
   my_callback, the completion callback of the asynchronous copy (neither is
   shown on the slide). */
int copy_file (char const* source_URL, char const* target)
{
  globus_url_t                        source_url;
  globus_io_handle_t                  dest_io_handle;
  globus_ftp_client_operationattr_t   source_ftp_attr;
  globus_result_t                     result;
  globus_gass_transfer_requestattr_t  source_gass_attr;
  globus_gass_copy_attr_t             source_gass_copy_attr;
  globus_gass_copy_handle_t           gass_copy_handle;
  globus_gass_copy_handleattr_t       gass_copy_handleattr;
  globus_ftp_client_handleattr_t      ftp_handleattr;
  globus_io_attr_t                    io_attr;
  int                                 output_file = -1;

  if ( globus_url_parse (source_URL, &source_url) != GLOBUS_SUCCESS ) {
    printf ("can not parse source_URL \"%s\"\n", source_URL);
    return (-1);
  }

  /* only (GSI)FTP and HTTP(S) sources are supported */
  if ( source_url.scheme_type != GLOBUS_URL_SCHEME_GSIFTP &&
       source_url.scheme_type != GLOBUS_URL_SCHEME_FTP    &&
       source_url.scheme_type != GLOBUS_URL_SCHEME_HTTP   &&
       source_url.scheme_type != GLOBUS_URL_SCHEME_HTTPS  ) {
    printf ("can not copy from %s - wrong protocol\n", source_URL);
    return (-1);
  }

  globus_gass_copy_handleattr_init  (&gass_copy_handleattr);
  globus_gass_copy_attr_init        (&source_gass_copy_attr);
  globus_ftp_client_handleattr_init (&ftp_handleattr);
  /* the copy handle is used below, so it must be initialised first */
  globus_gass_copy_handle_init      (&gass_copy_handle, &gass_copy_handleattr);

  /* protocol-specific transfer attributes */
  if ( source_url.scheme_type == GLOBUS_URL_SCHEME_GSIFTP ||
       source_url.scheme_type == GLOBUS_URL_SCHEME_FTP    ) {
    globus_ftp_client_operationattr_init (&source_ftp_attr);
    globus_gass_copy_attr_set_ftp (&source_gass_copy_attr,
                                   &source_ftp_attr);
  }
  else {
    globus_gass_transfer_requestattr_init (&source_gass_attr,
                                           source_url.scheme);
    globus_gass_copy_attr_set_gass (&source_gass_copy_attr,
                                    &source_gass_attr);
  }

  output_file = globus_libc_open ((char*) target,
                                  O_WRONLY | O_TRUNC | O_CREAT,
                                  S_IRUSR | S_IWUSR | S_IRGRP |
                                  S_IWGRP);
  if ( output_file == -1 ) {
    printf ("could not open the file \"%s\"\n", target);
    return (-1);
  }

  /* wrap the POSIX file descriptor in a globus_io_handle */
  if ( globus_io_file_posix_convert (output_file, 0,
                                     &dest_io_handle)
       != GLOBUS_SUCCESS ) {
    printf ("Error converting the file handle\n");
    return (-1);
  }

  /* start the asynchronous transfer; completion is reported to my_callback */
  result = globus_gass_copy_register_url_to_handle (
             &gass_copy_handle, (char*)source_URL,
             &source_gass_copy_attr, &dest_io_handle,
             my_callback, NULL);
  if ( result != GLOBUS_SUCCESS ) {
    printf ("error: %s\n", globus_object_printable_to_string
              (globus_error_get (result)));
    return (-1);
  }

  globus_url_destroy (&source_url);
  return (0);
}
Example: Copy a File (SAGA)
#include <iostream>
#include <string>
#include <saga/saga.hpp>

void copy_file(std::string source_url, std::string target_url)
{
  try {
    saga::file f(source_url);
    f.copy(target_url);
  }
  catch (saga::exception const &e) {
    std::cerr << e.what() << std::endl;
  }
}

The interface is simple, and the function calls remain the same regardless of the middleware that serves the URLs.
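To illustrate how one call can stay the same across middleware, here is a minimal C++ sketch of scheme-based adaptor dispatch. It is our own illustration of the idea, not SAGA's implementation or API: `FileAdaptor`, `AdaptorRegistry` and the two toy adaptors are hypothetical, and the adaptors only describe the transfer instead of performing it.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical adaptor interface: one implementation per middleware or protocol.
struct FileAdaptor {
    virtual ~FileAdaptor() = default;
    virtual std::string copy(const std::string& src, const std::string& dst) = 0;
};

// Toy adaptors: they return a description of the transfer instead of moving data.
struct GridFtpAdaptor : FileAdaptor {
    std::string copy(const std::string& src, const std::string& dst) override {
        return "gridftp transfer " + src + " -> " + dst;
    }
};

struct LocalFileAdaptor : FileAdaptor {
    std::string copy(const std::string& src, const std::string& dst) override {
        return "local copy " + src + " -> " + dst;
    }
};

// Selects an adaptor from the URL scheme (the text before "://").
class AdaptorRegistry {
public:
    void add(const std::string& scheme, std::unique_ptr<FileAdaptor> adaptor) {
        adaptors_[scheme] = std::move(adaptor);
    }
    FileAdaptor& lookup(const std::string& url) const {
        auto pos = url.find("://");
        if (pos == std::string::npos) throw std::invalid_argument("no scheme in " + url);
        auto it = adaptors_.find(url.substr(0, pos));
        if (it == adaptors_.end()) throw std::runtime_error("no adaptor for " + url);
        return *it->second;
    }
private:
    std::map<std::string, std::unique_ptr<FileAdaptor>> adaptors_;
};

// Application-level call: identical whatever middleware serves the source URL.
std::string copy_file(const AdaptorRegistry& registry,
                      const std::string& src, const std::string& dst) {
    return registry.lookup(src).copy(src, dst);
}
```

Supporting another middleware means registering one more adaptor; the application code that calls `copy_file` does not change.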
Workflow one-slide summary
•    Build distributed applications through orchestration of multiple
     services
       – Allows larger applications to be composed from individual application
         components
       – The components can be independent or connected by control-flow or
         data-flow dependencies
       – Scaled-up execution over several computational resources
•    Integration of multiple teams involved (collaborative work)
•    Unit of reuse: e-Science requires traceable, repeatable analysis
       – Provide automation: reproducibility of scientific analyses and processes
         is at the core of the scientific method
       – Support easy analysis modifications
       – Sharing workflows is an essential element of education and of the
         acceleration of knowledge dissemination
       – Allows capture and generation of provenance information
•    Ease the use of grids: graphical representation
       – Capture individual data transformation and analysis steps


NSF Workshop on the Challenges of Scientific Workflows, 2006, www.isi.edu/nsf-workflows06
Y. Gil, E. Deelman et al, Examining the Challenges of Scientific Workflows. IEEE Computer, 12/2007
Workflow
•   The automation of a business process, in whole or part, during
    which documents, information or tasks are passed from one
    participant to another for action, according to a set of procedural
    rules to achieve, or contribute to, an overall business goal.
                                  Workflow Reference Model, 19/11/1998


                                                 www.wfmc.org

•   A workflow management system (WFMS) is the software that carries out this automation
What does a typical Grid WFMS provide?
•   A level of abstraction above grid processes
     – gridftp, lcg-cr, lfc-mkdir, ...
     – condor-submit, globus-job-run, glite-wms-job-submit, ...
     – lcg-infosites, ...
•   A level of abstraction above “legacy processes”
     – SQL read/write
     – HTTP file transfer, …
•   Mapping and execution of tasks on grid resources
     –   Submission of jobs
     –   Invocation of (Web) services
     –   Manage data
     –   Catalog intermediate and final data products
•   Improve the reliability of application execution
•   Improve application performance
•   Provide provenance tracking capabilities



                                                           http://www.gridworkflow.org/
What does a typical Grid WFMS provide?

Abstract Workflow                                     Executable Workflow
Describes your workflow at a logical level            Describes your workflow in terms of physical files and paths
Site independent                                      Site specific
Captures just the computation that the user           Has additional jobs for data movement etc.
wants to do

Source: Jia Yu and Rajkumar Buyya: A Taxonomy of Workflow Management Systems for Grid Computing,
Journal of Grid Computing, Volume 3, Numbers 3-4 / September, 2005
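The step from an abstract to an executable workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of any particular WFMS: each task is bound to a site given in a caller-supplied map, inputs found in a toy replica catalogue get a stage-in job, and every output gets a stage-out job to a storage URL. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Site-independent description of one computation (abstract workflow node).
struct AbstractTask {
    std::string name;
    std::vector<std::string> inputs;   // logical file names
    std::vector<std::string> outputs;  // logical file names
};

// Site-specific job in the executable workflow.
struct ConcreteJob {
    std::string kind;    // "stage-in", "compute" or "stage-out"
    std::string site;
    std::string detail;
};

// Hypothetical mapper. site_of binds task names to sites (throws if a task is
// missing); replicas maps logical file names to physical URLs. Inputs not in
// the catalogue are assumed to be present at the site already (for example,
// produced by an earlier task mapped there).
std::vector<ConcreteJob> map_to_executable(const std::vector<AbstractTask>& tasks,
                                           const std::map<std::string, std::string>& site_of,
                                           const std::map<std::string, std::string>& replicas,
                                           const std::string& storage_url) {
    std::vector<ConcreteJob> jobs;
    for (const auto& t : tasks) {
        const std::string& site = site_of.at(t.name);
        for (const auto& in : t.inputs) {
            auto r = replicas.find(in);
            if (r != replicas.end())
                jobs.push_back({"stage-in", site, r->second + " -> " + site + ":" + in});
        }
        jobs.push_back({"compute", site, t.name});
        for (const auto& out : t.outputs)
            jobs.push_back({"stage-out", site, site + ":" + out + " -> " + storage_url + "/" + out});
    }
    return jobs;
}
```

The abstract tasks mention only logical names; sites, physical URLs and the extra data-movement jobs appear only in the mapped result, mirroring the two columns of the table.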
What does a typical workflow consist of?

               •   Dataflow graph
               •   Activities
                   – Definition of Jobs
                   – Specification of services
               •   Data channels
                   – Data transfer
                   – Coordination
               •   Acyclic (DAG) / cyclic
               •   Conditional statements
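For the acyclic case (a DAG), an engine can derive a valid execution order with a topological sort. The sketch below uses Kahn's algorithm and returns no order when the graph contains a cycle, since cyclic workflows need explicit loop semantics instead. The function and type names are our own.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;  // (producer, consumer)

// Kahn's algorithm: returns an order in which every activity runs after the
// activities it depends on, or std::nullopt if the graph has a cycle.
std::optional<std::vector<std::string>> execution_order(const std::set<std::string>& activities,
                                                        const std::vector<Edge>& edges) {
    std::map<std::string, int> in_degree;
    std::map<std::string, std::vector<std::string>> successors;
    for (const auto& a : activities) in_degree[a] = 0;
    for (const auto& [from, to] : edges) {
        successors[from].push_back(to);
        ++in_degree[to];
    }
    std::queue<std::string> ready;  // activities whose dependencies are all satisfied
    for (const auto& [a, d] : in_degree)
        if (d == 0) ready.push(a);
    std::vector<std::string> order;
    while (!ready.empty()) {
        std::string a = ready.front();
        ready.pop();
        order.push_back(a);
        for (const auto& s : successors[a])
            if (--in_degree[s] == 0) ready.push(s);
    }
    // Activities on a cycle never reach in-degree 0, so they are missing from the order.
    if (order.size() != in_degree.size()) return std::nullopt;
    return order;
}
```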
Workflow Lifecycle

[Figure: the workflow lifecycle. Workflow Creation draws on Workflow and Component Libraries and yields a Workflow Template; populating it with data (using Data and Metadata Catalogs) yields a Workflow Instance; mapping it to available resources (using Resource and Application Component Descriptions) yields an Executable Workflow; Scheduling/Execution runs it on distributed compute, storage and network resources, producing Data Products and Data, Metadata and Provenance Information, which support Reuse and the Adapt/Modify step back to the Workflow Template.]
Data lifecycle in workflows

[Figure: Data Lifecycle in a Workflow Environment. Stages around the cycle: Data Discovery, Data Analysis Setup, Data Processing, and Derived Data and Provenance Archival. Associated services and resources: Metadata Catalogs, Component Libraries, Workflow Template Libraries, Data Replica Catalogs, Software Catalogs, Data Movement Services and Provenance Catalogs. Related workflow activities: Workflow Creation, Workflow Reuse, and Workflow Mapping and Execution.]
Workflow Execution
P-GRADE one-slide summary
•   P-GRADE portal desiderata
     – Hide the complexity of the underlying grid middleware
     – Provide a high-level graphical user interface that is easy to use for
       e-scientists
     – Support many different grid programming approaches:
         • Simple Scripts & Control (sequential and MPI job execution)
         • Scientific Application Plug-ins (based on GEMLCA)
         • Complex Workflows
         • Parameter sweep applications: both on job and workflow level
         • Interoperability: transparent access to grids based on different
           middleware technology
     – Support three levels of parallelism
•   History
     – Started in the Hungarian SuperComputing Grid project in 2003
     – http://portal.p-grade.hu/
     – https://sourceforge.net/projects/pgportal/
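At its core, a parameter sweep enumerates every combination of parameter values and runs one job (or workflow) instance per combination. The following is a minimal sketch of that enumeration, written by us rather than taken from P-GRADE; the parameter names in the usage are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Params = std::map<std::string, std::string>;
using ParamSpace = std::map<std::string, std::vector<std::string>>;

// Expands a parameter space (name -> candidate values) into one parameter set
// per combination, i.e. the Cartesian product of all value lists.
std::vector<Params> expand_sweep(const ParamSpace& space) {
    std::vector<Params> result{Params{}};  // start from a single empty combination
    for (const auto& [name, values] : space) {
        std::vector<Params> next;
        for (const auto& partial : result)
            for (const auto& value : values) {
                Params p = partial;
                p[name] = value;
                next.push_back(std::move(p));
            }
        result = std::move(next);
    }
    return result;
}
```

With two temperatures and three seeds, the sweep yields six parameter sets, and each one would become a separate job submission.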
Workflow sharing: myExperiment




http://www.myexperiment.org/
Data access and integration
Researcher wants to obtain
specified data from multiple
distributed data sources and
to supply the result to a
process and then view its
output.




1 Researcher formulates query
2 Researcher submits query
3 Query system transforms and distributes query
4 Data services send back local results
5 Query system combines these to form requested data
6 Query system sends data to process
7 Process system sends derived data to researcher
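Steps 3 to 5 can be sketched as below, under simplifying assumptions: every data service holds rows of the same shape, the query is a predicate that each service evaluates locally (pushing selectivity towards the sources), and combining means concatenating the partial results. Real integration would also need schema mapping and joins; `Row`, `DataService` and `federated_query` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct Row {
    std::string sample_id;
    double value;
};

using Predicate = std::function<bool(const Row&)>;

// Hypothetical data service: holds local rows and evaluates queries locally.
struct DataService {
    std::string name;
    std::vector<Row> rows;
    std::vector<Row> evaluate(const Predicate& where) const {
        std::vector<Row> local;
        for (const auto& r : rows)
            if (where(r)) local.push_back(r);
        return local;  // step 4: local results sent back
    }
};

// Step 3: send the same query to every service; step 5: combine partial results.
std::vector<Row> federated_query(const std::vector<DataService>& services,
                                 const Predicate& where) {
    std::vector<Row> combined;
    for (const auto& s : services) {
        auto part = s.evaluate(where);
        combined.insert(combined.end(), part.begin(), part.end());
    }
    return combined;
}
```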
OGSA-DAI one-slide summary
•   Enable the sharing of data resources to support:
     – Data access - access to structured data in distributed heterogeneous
       data resources.
     – Data transformation e.g. expose data in schema X to users as data in
       schema Y.
     – Data integration e.g. expose multiple databases to users as a single
       virtual database
     – Data delivery - delivering data to where it's needed by the most
       appropriate means e.g. web service, e-mail, HTTP, FTP, GridFTP


•   History
     – Started in February 2002 as part of the UK e-Science Grid Core
       Program
     – Part of OMII-UK, a partnership between:
        • OMII, The University of Southampton
        • myGrid, The University of Manchester
        • OGSA-DAI, The University of Edinburgh
Motivation for Streaming
•   Data movement is expensive
     – Bandwidth on and off chip may be the scarcest resource
•   Streaming can avoid data movement
     – Eliminating transfers to and from temporary stores
     – Pushing selectivity and derivation towards data sources
     – Earlier computation termination decisions
•   Streaming can reduce elapsed time
     – Pipelines of transformations overlap computation time
     – When co-located can pass on data via caches
•   Streaming is scalable
     – Avoids locally assembling complete data sets
     – Sometimes this cannot be avoided
•   Some data sources and consumers are inherently streamed
•   Permits light-weight composition and requires optimisation
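The early-termination and no-full-assembly points can be seen in a small pull-based pipeline. This is a generic sketch, not OGSA-DAI code: `counting_source`, `filter` and `take` are hypothetical stages, and the source counts how many items it produced so that we can observe that only the items actually needed were generated.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// A pull-based stream: each call yields the next value, or std::nullopt at the end.
using Stream = std::function<std::optional<int>()>;

// Source of 0..n-1 that records how many items it has produced.
Stream counting_source(int n, std::size_t& produced) {
    int next = 0;
    return [=, &produced]() mutable -> std::optional<int> {
        if (next >= n) return std::nullopt;
        ++produced;
        return next++;
    };
}

// Filter stage: pulls from upstream until a value passes the predicate.
Stream filter(Stream upstream, std::function<bool(int)> keep) {
    return [upstream, keep]() mutable -> std::optional<int> {
        while (auto v = upstream())
            if (keep(*v)) return v;
        return std::nullopt;
    };
}

// Sink that stops pulling once it has `limit` results; nothing is buffered upstream.
std::vector<int> take(Stream s, std::size_t limit) {
    std::vector<int> out;
    while (out.size() < limit) {
        auto v = s();
        if (!v) break;
        out.push_back(*v);
    }
    return out;
}
```

Asking for the first three even numbers from a million-element source makes the source produce only five items.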
OGSA-DAI Generic web services
•   Manipulate data using OGSA-DAI’s generic web services
•   Clients see the data in its ‘raw’ format, e.g.
     – Tables, columns, rows for relational data
     – Collections, elements etc. for XML data
•   Clients can obtain the schema of the data
•   Clients send queries in the appropriate query language,
    e.g. SQL, XPath

[Figure: a client sends a request to OGSA-DAI, which accesses relational databases, XML databases and indexed files and returns the data.]
OGSA-DAI Workflows




•   Pipeline, Sequence, Parallel workflows
•   Composed of activities
•   Reduces data transfers and web service calls
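A minimal sketch of why running a pipeline of activities inside the service reduces web service calls: the client sends the input and the whole activity list in one request, and intermediate results never travel back. `PipelineExecutor`, `select_containing` and `join_csv` are hypothetical and do not reflect OGSA-DAI's real activity API; sequence and parallel composition are not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Rows = std::vector<std::string>;
// Hypothetical activity: transforms the rows produced by the previous activity.
using Activity = std::function<Rows(const Rows&)>;

// Hypothetical selection activity: keeps rows containing `needle`.
Activity select_containing(std::string needle) {
    return [needle](const Rows& in) {
        Rows out;
        for (const auto& r : in)
            if (r.find(needle) != std::string::npos) out.push_back(r);
        return out;
    };
}

// Hypothetical delivery-format activity: joins all rows into one CSV line.
Activity join_csv() {
    return [](const Rows& in) {
        std::string line;
        for (std::size_t i = 0; i < in.size(); ++i) {
            if (i > 0) line += ",";
            line += in[i];
        }
        return Rows{line};
    };
}

// Runs a whole pipeline for one request; only the final result is returned.
struct PipelineExecutor {
    int requests_served = 0;
    Rows execute(const Rows& input, const std::vector<Activity>& pipeline) {
        ++requests_served;
        Rows data = input;
        for (const auto& activity : pipeline) data = activity(data);
        return data;
    }
};
```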
Metadata Management: A Satellite Scenario

[Figure: a satellite scenario with a Space Segment and a Ground Segment. SATELLITE FILES: DMOP files, Product files.]
A Sample File in the Satellite Domain

[Figure: a sample satellite file, made up of a METADATA part followed by a DATA part.]
Metadata can be present in file names…
 File name (Product):
RA2_MW__1PNPDK20060201_120535_000000062044_00424_20518_0349.N1
   Corresponds to:

[Figure: breakdown of the file name into its metadata fields.]
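As an example of extracting metadata from the name itself, the sketch below splits the product file name into fixed-width fields. The field layout is our assumption, based on the ENVISAT product naming convention, and is not given on the slide; it is consistent with the phase and cycle values in the DMOP header on the next slide (phase 2, cycle 44).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Fields assumed to sit at fixed positions in an ENVISAT-style product name.
// The layout is an assumption, not taken from the slide.
struct ProductName {
    std::string product_id;   // e.g. "RA2_MW__1P"
    char processing_stage;
    std::string originator;
    std::string start_date;   // YYYYMMDD
    std::string start_time;   // HHMMSS
    std::string duration;     // zero-padded duration field
    int phase;
    int cycle;
    int relative_orbit;
    int absolute_orbit;
    std::string counter;
    std::string mission;      // file extension
};

ProductName parse_product_name(const std::string& n) {
    // Reject names whose separators are not where the assumed layout expects them.
    if (n.size() != 62 || n[22] != '_' || n[29] != '_' || n[42] != '_' ||
        n[48] != '_' || n[54] != '_' || n[59] != '.')
        throw std::invalid_argument("unexpected file name layout: " + n);
    ProductName p;
    p.product_id = n.substr(0, 10);
    p.processing_stage = n[10];
    p.originator = n.substr(11, 3);
    p.start_date = n.substr(14, 8);
    p.start_time = n.substr(23, 6);
    p.duration = n.substr(30, 8);
    p.phase = n[38] - '0';
    p.cycle = std::stoi(n.substr(39, 3));
    p.relative_orbit = std::stoi(n.substr(43, 5));
    p.absolute_orbit = std::stoi(n.substr(49, 5));
    p.counter = n.substr(55, 4);
    p.mission = n.substr(60, 2);
    return p;
}
```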
…and in file headers
FILE      ; DMOP (generated by FOS Mission Planning System)
   RECORD fhr
      FILENAME="DMOP_SOF__VFOS20060124_103709_00000000_00001215_20060131_014048_20060202_035846.N1"
      DESTINATION="PDCC"
      PHASE_START=2
      CYCLE_START=44
      REL_START_ORBIT=404
      ABS_START_ORBIT=20498
   ENDRECORD fhr
................................
 RECORD dmop_er
         RECORD dmop_er_gen_part
            RECORD gen_event_params
               EVENT_TYPE=RA2_MEA
               EVENT_ID="RA2_MEA_00000000002063"
               NB_EVENT_PR1=1
               NB_EVENT_PR3=0
               ORBIT_NUMBER=20521
               ELAPSED_TIME=623635
               DURATION=41627862
            ENDRECORD gen_event_params
      ENDRECORD dmop_er
ENDLIST all_dmop_er
ENDFILE

Slide annotations: RECORD ID (the record name, e.g. fhr); RECORD parameters (the KEY=value lines); RECORD parameters corresponding to other RECORD structure (the nested records).
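A minimal sketch of how such a header could be turned into queryable key/value metadata. It assumes one statement per line, `RECORD name` / `ENDRECORD name` nesting, and `KEY=value` parameters with optional double quotes; other lines (FILE, LIST, comments) are ignored. The function names and the dotted key format are our own.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

std::string trim(const std::string& s) {
    auto b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    auto e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// "dmop_er.gen_event_params." style prefix built from the open records.
std::string join_path(const std::vector<std::string>& path) {
    std::string out;
    for (const auto& p : path) out += p + ".";
    return out;
}

// Parses KEY=value lines, qualifying each key with its enclosing record names.
std::map<std::string, std::string> parse_header(const std::string& text) {
    std::map<std::string, std::string> fields;
    std::vector<std::string> path;  // names of the currently open records
    std::istringstream in(text);
    std::string raw;
    while (std::getline(in, raw)) {
        std::string line = trim(raw);
        if (line.rfind("ENDRECORD", 0) == 0) {
            if (!path.empty()) path.pop_back();
        } else if (line.rfind("RECORD ", 0) == 0) {
            path.push_back(trim(line.substr(7)));
        } else if (auto eq = line.find('='); eq != std::string::npos && !path.empty()) {
            std::string value = trim(line.substr(eq + 1));
            if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
                value = value.substr(1, value.size() - 2);  // strip surrounding quotes
            fields[join_path(path) + trim(line.substr(0, eq))] = value;
        }
    }
    return fields;
}
```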
Metadata can be exposed
•    Metadata deserves better treatment
      – In most cases it appears together with files or other resources
      – It is hard to access and query on its own
      – What about querying for all the files that deal with instrument X
        and whose information was taken between times T1 and T2?




    Our goal:
       Let’s make metadata a FIRST-CLASS CITIZEN in our systems
       And let’s make it FLEXIBLE to changes
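Once metadata is extracted into records, the instrument and time question becomes a simple filter. A minimal sketch, assuming hypothetical records with an instrument name and start/stop times stored as `YYYYMMDD_HHMMSS` strings (which compare chronologically as text); a file matches when its interval overlaps [T1, T2].

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical metadata record extracted from a file name or header.
struct FileMetadata {
    std::string file;
    std::string instrument;
    std::string start;  // YYYYMMDD_HHMMSS
    std::string stop;   // YYYYMMDD_HHMMSS
};

// Files about `instrument` whose acquisition interval overlaps [t1, t2].
std::vector<std::string> files_for(const std::vector<FileMetadata>& catalogue,
                                   const std::string& instrument,
                                   const std::string& t1, const std::string& t2) {
    std::vector<std::string> hits;
    for (const auto& m : catalogue)
        if (m.instrument == instrument && m.start <= t2 && m.stop >= t1)
            hits.push_back(m.file);
    return hits;
}
```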
Metadata in Workflows

ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE     116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT     374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
Workflow Lifecycle

[Figure: the workflow lifecycle diagram, repeated from earlier in the session.]
Metadata and workflows




•   Metadata for describing workflow entities
     – What is the added value of a given workflow?
     – What task does a given service perform?
     – Which services can be associated with a
       processor?
•   Metadata for describing workflow provenance
     – How did the execution of a given workflow go?
     – What are the semantics of a data product?
     – How many invocations of a given service failed?
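Provenance questions such as the first and last above become simple queries once the engine records each invocation. A minimal sketch, assuming a hypothetical record holding the workflow run, the service name and a success flag; real provenance models record much more (inputs, outputs, timings).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical provenance record written by a workflow engine per service call.
struct Invocation {
    std::string workflow_run;
    std::string service;
    bool succeeded;
};

// "How many invocations of a given service failed?"
long failed_invocations(const std::vector<Invocation>& records, const std::string& service) {
    return std::count_if(records.begin(), records.end(), [&](const Invocation& i) {
        return i.service == service && !i.succeeded;
    });
}

// "How did the execution of a given workflow go?" (here: did every call succeed?)
bool run_succeeded(const std::vector<Invocation>& records, const std::string& run) {
    return std::all_of(records.begin(), records.end(), [&](const Invocation& i) {
        return i.workflow_run != run || i.succeeded;
    });
}
```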
Some metadata about a workflow

[Figure: a scientific workflow with its metadata content: RDF annotations, social tag annotations and free-text annotations, linked to Reference Ontology1, Reference Ontology2 and a reference controlled vocabulary.]
What can we do with metadata?
Metadata is everywhere
•   We can attach metadata to almost anything
    –   Events, notifications, logs
    –   Services and resources
    –   Schemas and catalogue entries
    –   People, meetings, discussions, conference talks
    –   Scientific publications, recommendations, quality comments
    –   Models, codes, builds, workflows
    –   Data files and data streams
    –   Sensors and sensor data


•   But what do we mean by metadata?
What is the metadata of this HTML fragment?
Based on Dublin Core
The contributor and creator is the flight booking service “www.flightbookings.com”.
The date would be January 1st, 2003, if the HTML page was generated on that date.
The description would be something like “flight details for a trip between Madrid and Seattle via
Chicago on February 8th, 2004”.
The document format is “HTML”.
The document language is “en”, which stands for English.

Based on thesauri
Madrid is a reference to the term with ID 7010413 in the thesaurus, which refers to the city of Madrid in Spain.
Spain is a reference to the term with ID 1000095, which refers to the kingdom of Spain in Europe.
Chicago is a reference to the term with ID 7013596, which refers to the city of Chicago in Illinois, US.
United States of America is a reference to the term “United States” with ID 7012149, which refers to the US nation.
Seattle is a reference to the term with ID 7014494, which refers to the city of Seattle in Washington, US.

Based on ontologies
Concept instances relate a part of the document to one or several concepts in an ontology. For example, “Flight details” may
represent an instance of the concept Flight, and can be named AA7615_Feb08_2003, although concept instances do not
necessarily have a name.
Attribute values relate a concept instance to the part of the document that holds the value of one of its attributes. For example,
“American Airlines” can be the value of the attribute companyName.
Relation instances relate two concept instances by some domain-specific relation. For example, the flight
AA7615_Feb08_2003 and the location Madrid can be connected by the relation departurePlace.
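The ontology-based annotations above can be written as subject, predicate, object statements, which is the shape RDF uses. A minimal in-memory sketch: the `Triple` struct and `objects` lookup are hypothetical, and the predicate name `type` stands in for RDF's `rdf:type`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Subject-predicate-object statement, the shape used by RDF annotations.
struct Triple {
    std::string subject, predicate, object;
};

// All objects o such that (subject, predicate, o) is stated in the graph.
std::vector<std::string> objects(const std::vector<Triple>& graph,
                                 const std::string& subject,
                                 const std::string& predicate) {
    std::vector<std::string> result;
    for (const auto& t : graph)
        if (t.subject == subject && t.predicate == predicate) result.push_back(t.object);
    return result;
}
```

The concept instance, the attribute value and the relation instance each become one triple, so the same lookup answers "which company operates this flight?" and "where does it depart from?".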
Need to Add “Semantics”
•   External agreement on meaning of annotations
     – E.g., Dublin Core for annotation of library/bibliographic information


•   Use Ontologies to specify meaning of annotations
     – Ontologies provide a vocabulary of terms, plus
     – a set of explicit assumptions regarding the intended meaning of the
       vocabulary.
         • Almost always including concepts and their classification
         • Almost always including properties between concepts
         • Similar to an object oriented model
     – Meaning (semantics) of terms is formally specified
     – Can also specify relationships between terms in multiple ontologies


•   Thus, an ontology is a formal specification of a certain
    domain:
     – Shared understanding of a domain of interest
     – Formal and machine manipulable model of a domain of interest
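Concept classification is what lets software draw conclusions from annotations, for example that an instance of Flight is also an instance of any broader concept above Flight. A minimal sketch of a transitive is-a check over a concept hierarchy; the concepts Trip, Event, City and Location are hypothetical, and ontology languages such as OWL support far richer reasoning.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Concept taxonomy: each concept maps to its direct superconcepts.
using Taxonomy = std::map<std::string, std::vector<std::string>>;

namespace detail {
bool is_a(const Taxonomy& tax, const std::string& sub, const std::string& super,
          std::set<std::string>& visited) {
    if (sub == super) return true;
    if (!visited.insert(sub).second) return false;  // already explored; guards against cycles
    auto it = tax.find(sub);
    if (it == tax.end()) return false;
    for (const auto& parent : it->second)
        if (is_a(tax, parent, super, visited)) return true;
    return false;
}
}  // namespace detail

// True if `sub` is `super` or a direct or indirect subconcept of it.
bool is_a(const Taxonomy& tax, const std::string& sub, const std::string& super) {
    std::set<std::string> visited;
    return detail::is_a(tax, sub, super, visited);
}
```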
S-OGSA Model
Summary
•   From the lower level of abstraction…
     – Difficulty of developing, programming & deploying Grid applications using the
       existing Grid middleware


•   To a higher level of abstraction:
     – High-level APIs and metadata management
         • Programmatic approaches that provide common grid functionality at
           the right level of abstraction for applications
         • Ability to hide the underlying complexity of infrastructure, varying
           semantics, heterogeneity and changes from the application
           developer
     – Improved data access and integration mechanisms
     – Workflow management
         • Traceable, repeatable analyses of e-Science experiments
         • Graphical modelling languages that ease Grid application
           development

Session 37 - Intro to Workflows, API's and semantics

  • 1. Introduction to Workflows, APIs and Semantics Session 37. July 13th, 2009 Oscar Corcho (Universidad Politécnica de Madrid) Based on slides from all the presenters in the following two days Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0
  • 2. Themes of the Second Week Date Theme Technology Mon 13 July How to solve my problem? Tue 14 July Higher level APIs: OGSA-DAI, SAGA and SAGA, metadata management OGSA-DAI, Grid SAM Wed 15 July Workflows P-GRADE, Semantic Metadata Thu 16 July Integrating Practical All Fri 17 July Cloud Computing (lecture)
  • 3. Principles of job Principles of high- Principles of Principles of submission and throughput service-oriented distributed data execution computing architecture management management Principles of using Higher level APIs: Workflows distributed and OGSA-DAI, SAGA high performance and metadata systems management
  • 4. Motivation • Grids are: – Dynamic: • Version, updates, new resources... – Heterogenous: • Operating Systems, Libraries, software stack • Middleware service versions and semantics • Administrative policies – access, usage, upgrade – Complex: • Production level service with high QoS non-trivial • Derived from above as well as inherently • As described by Steven this morning, operating Grids is still an effort-consuming task, and it is still somehow difficult to develop, program & deploy Grid applications using the existing Grid middleware • But as you have also seen during last week (and in Morris’ presentation today), there are many commonalities among heterogeneous middleware
  • 5. In general… • As described by Steven this morning, operating Grids is still an effort-consuming task, and it is still somehow difficult to develop, program & deploy Grid applications using the existing Grid middleware • But as you have also seen during last week (and in Morris’ presentation today), there are many commonalities among heterogeneous middleware • There is a need of: – Programmatic approaches that provide common grid functionality at a correct level of abstraction for applications – Ability to hide underlying complexity of infrastructure, varying semantics, heterogeneity and changes from the application-developer – Improved data access and integration mechanisms – Traceable, repeatable analyses of e-Science experiments – Graphical modelling languages for the ease of Grid application development
  • 6. e-Science Approach Interoperability • Increasing complexity of e-science applications that embrace multiple physical models (i.e. multi-physics) & larger scale – Creating a steadily growing demand of compute power – Demand for a ‘United Federation of world-wide Grids’ III. Complex IV. Interactive Workflows Access II. Scientific Application plug-ins V. Interoperability Grid Middleware I. Simple Scripts & Grid Control other Grid type [2] Morris Riedel et al., ‘Classification of Different Approaches for Balatonfüred, Hungary, 6th-18th July 2008 e-Science Applications in Next Generation Infrastructures, Int. Conference on e-Science 2008, Indianapolis, Indiana
  • 7. SAGA one-slide summary • Simple API for Grid Application – SAGA – Provide simple and usable programmatic interface that can be widely-adopted for the development of applications for the grid – Simplicity (80:20 restricted scope) • easy to use, install, administer and maintain – Uniformity • provides support for different application programming languages as well as consistent semantics and style for different Grid functionality – Scalability • Contains mechanisms for the same application (source) code to run on a variety of systems ranging from laptops to HPC resources – Genericity • adds support for different grid middleware – Modularity • provides a framework that is easily extendable • SAGA is not… – Middleware – Service management interface – Does not hide the resources - remote files, job (but the details)
  • 8. Example: SAGA Job submission Text
  • 9. Example: Copy a File (Globus) int copy_file (char const* source, char const* target) if (source_url.scheme_type == GLOBUS_URL_SCHEME_GSIFTP || { source_url.scheme_type == GLOBUS_URL_SCHEME_FTP ) { globus_url_t source_url; globus_ftp_client_operationattr_init (&source_ftp_attr); globus_io_handle_t dest_io_handle; globus_gass_copy_attr_set_ftp (&source_gass_copy_attr, globus_ftp_client_operationattr_t source_ftp_attr; &source_ftp_attr); globus_result_t result; } globus_gass_transfer_requestattr_t source_gass_attr; else { globus_gass_copy_attr_t source_gass_copy_attr; globus_gass_transfer_requestattr_init (&source_gass_attr, globus_gass_copy_handle_t gass_copy_handle; source_url.scheme); globus_gass_copy_handleattr_t gass_copy_handleattr; globus_gass_copy_attr_set_gass(&source_gass_copy_attr, &source_gass_attr); globus_ftp_client_handleattr_t ftp_handleattr; } globus_io_attr_t io_attr; output_file = globus_libc_open ((char*) target, O_WRONLY | O_TRUNC | O_CREAT, int output_file = -1; S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP); if ( output_file == -1 ) { if ( globus_url_parse (source_URL, &source_url) != GLOBUS_SUCCESS ) printf ("could not open the file "%s"n", target); { return (-1); printf ("can not parse source_URL "%s"n", source_URL); } /* convert stdout to be a globus_io_handle */ return (-1); if ( globus_io_file_posix_convert (output_file, 0, &dest_io_handle) } != GLOBUS_SUCCESS) { printf ("Error converting the file handlen"); return (-1); if ( source_url.scheme_type != GLOBUS_URL_SCHEME_GSIFTP && } source_url.scheme_type != GLOBUS_URL_SCHEME_FTP && result = globus_gass_copy_register_url_to_handle ( &gass_copy_handle, (char*)source_URL, source_url.scheme_type != GLOBUS_URL_SCHEME_HTTP && &source_gass_copy_attr, &dest_io_handle, my_callback, NULL); source_url.scheme_type != GLOBUS_URL_SCHEME_HTTPS ) { if ( result != GLOBUS_SUCCESS ) { printf ("error: %sn", globus_object_printable_to_string printf ("can not copy from %s - wrong protn", source_URL); 
(globus_error_get (result))); return (-1); return (-1); } } globus_url_destroy (&source_url); return (0); globus_gass_copy_handleattr_init (&gass_copy_handleattr); } globus_gass_copy_attr_init (&source_gass_copy_attr); globus_ftp_client_handleattr_init (&ftp_handleattr);
  • 10. Example: Copy a File (SAGA) #include <string> #include <saga/saga.hpp> void copy_file(std::string source_url, std::string target_url) { try { saga::file f(source_url); f.copy(target_url); } Text catch (saga::exception const &e) { std::cerr << e.what() << std::endl; } } The interface is simple and the actual function calls remain the same
  • 11. Workflow one-slide summary • Build distributed applications through orchestration of multiple services – Allows to compose larger applications from individual application components – The components can be independent or connected by some control flow/ data flow dependencies. – Scaled up execution over several computational resources • Integration of multiple teams involved (collaborative work) • Unit of reusage: e-science requires traceable, repetable analysis – Provide automation: Reproducibility of scientific analyses and processes is at the core of the scientific method – Support easy analysis modifications – Sharing workflows is an essential element of education, and acceleration of knowledge dissemination.” – Allows capture and generation of provenance information • Ease the use of grids: graphical representation – Capture individual data transformation and analysis steps NSF Workshop on the Challenges of Scientific Workflows, 2006, www.isi.edu/nsf-workflows06 Y. Gil, E. Deelman et al, Examining the Challenges of Scientific Workflows. IEEE Computer, 12/2007
  • 12. Workflow • The automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules to achieve, or contribute to, an overall business goal. Workflow Reference Model, 19/11/1998 www.wfmc.org • Workflow management system (WFMS) is the software that does it
  • 13. What does a typical Grid WFMS provide? • A level of abstraction above grid processes – gridftp, lcg-cr, lfc-mkdir, ... – condor-submit, globus-job-run, glite-wms-job-submit, ... – lcg-infosites, ... • A level of abstraction above “legacy processes” – SQL read/write – HTTP file transfer, … • Mapping and execution of tasks grid resources – Submission of jobs – Invocation of (Web) services – Manage data – Catalog intermediate and final data products • Improve successful application execution • Improve application performance • Provide provenance tracking capabilities http://www.gridworkflow.org/
  • 14. What does a typical Grid WFMS provide? Abstract Workflow Executable Workflow Describes your workflow at a Describes your workflow in logical level terms of physical files and paths Site Independent Site Specific Captures just the computation Has additional jobs for data that the user wants to do movement etc. Source: Jia Yu and Rajkumar Buyya: A Taxonomy of Workflow Management Systems for Grid Computing, Journal of Grid Computing, Volume 3, Numbers 3-4 / September, 2005
  • 15. What does a typical workflow consist of? • Dataflow graph • Activities – Definition of Jobs – Specification of services • Data channels – Data transfer – Coordination • Cyclic (DAG) /acyclic • Conditional statements
  • 16. Workflow Lifecycle Workflow Reuse Creation and Component Libraries Data, Data Products Metadata Catalogs Populate Adapt, Workflow with data Modify Template Workflow Data, Metadata, Instance Provenance Information Executable Map to Scheduling/ Execute Workflow available Resource, Execution resources Application Component Compute, Descriptions Storage Distributed and Mapping Network Resources
  • 17. Data lifecycle in workflows Metadata Catalogs Workflow Creation Data Discovery Workflow Reuse Component Libraries al d D anc ata an chiv ata A n e Ar Pro rived D Data Lifecycle alys is S in a Workflow Environment v en Provenance Catalogs De etup Workflow Template Libraries Workflow Mapping and Data Processing Execution Data Movement Services Data Replica Catalogs Software Catalogs Workflow Execution
  • 18. P-GRADE one-slide summary • P-GRADE portal desiderata – Hide the complexity of the underlying grid middlewares – Provide a high-level graphical user interface that is easy-to-use for e- scientists – Support many different grid programming approaches: • Simple Scripts & Control (sequential and MPI job execution) • Scientific Application Plug-ins (based on GEMLCA) • Complex Workflows • Parameter sweep applications: both on job and workflow level • Interoperability: transparent access to grids based on different middleware technology – Support three levels of parallelism • History – Started in the Hungarian SuperComputing Grid project in 2003 – http://portal.p-grade.hu/ – https://sourceforge.net/projects/pgportal/
  • 20. Data access and integration Researcher wants to obtain specified data from multiple distributed data sources and to supply the result to a process and then view its output. 1 Researcher formulates query 2 Researcher submits query 3 Query system transforms and distributes query 4 Data services send back local results 5 Query system combines these to form requested data 6 Query system sends data to process 7 Process system sends derived data to researcher
  • 21. OGSA-DAI one-slide summary • Enable the sharing of data resources to support: – Data access - access to structured data in distributed heterogeneous data resources. – Data transformation e.g. expose data in schema X to users as data in schema Y. – Data integration e.g. expose multiple databases to users as a single virtual database – Data delivery - delivering data to where it's needed by the most appropriate means e.g. web service, e-mail, HTTP, FTP, GridFTP • History – Started in February 2002 as part of the UK e-Science Grid Core Program – Part of OMII-UK, a partnership between: • OMII, The University of Southampton • myGrid, The University of Manchester • OGSA-DAI, The University of Edinburgh
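The data transformation item (exposing data stored in schema X as schema Y) can be illustrated with a plain relational view. This is a SQLite sketch with hypothetical table and column names, not the mechanism OGSA-DAI uses internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema X: how the data is physically stored (hypothetical).
conn.execute("CREATE TABLE person_x (fname TEXT, lname TEXT, born INTEGER)")
conn.executemany("INSERT INTO person_x VALUES (?, ?, ?)",
                 [("Ada", "Lovelace", 1815), ("Alan", "Turing", 1912)])

# Schema Y: how users are allowed to see it, derived on the fly from schema X.
conn.execute("""
    CREATE VIEW person_y AS
    SELECT fname || ' ' || lname AS full_name, born AS birth_year
    FROM person_x
""")

rows = conn.execute("SELECT full_name, birth_year FROM person_y ORDER BY birth_year").fetchall()
print(rows)  # [('Ada Lovelace', 1815), ('Alan Turing', 1912)]
```

Because the view is computed at query time, users of schema Y never need to know that schema X exists, and the stored data is not duplicated.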
  • 22. Motivation for Streaming • Data movement is expensive – Bandwidth on and off chip may be the scarcest resource • Streaming can avoid data movement – Eliminating transfers to and from temporary stores – Pushing selectivity and derivation towards data sources – Enabling earlier decisions to terminate computation • Streaming can reduce elapsed time – Pipelines of transformations overlap their computation – When co-located, stages can pass data on via caches • Streaming is scalable – Avoids locally assembling complete data sets – Sometimes this cannot be avoided • Some data sources and consumers are inherently streamed • Permits light-weight composition and requires optimisation
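Python generators give a small, local analogy for several of these points: records flow through selection and derivation stages one at a time, no complete data set is ever assembled, and the consumer can stop early. The stage names are hypothetical and this is not how OGSA-DAI implements streaming.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def source(n: int, stats: dict[str, int]) -> Iterator[int]:
    # Stand-in for a streamed data source; counts how many records are pulled.
    for i in range(n):
        stats["pulled"] += 1
        yield i


def select(records: Iterable[int], keep: Callable[[int], bool]) -> Iterator[int]:
    # Selectivity pushed close to the source: discard records as they arrive.
    return (r for r in records if keep(r))


def derive(records: Iterable[int], fn: Callable[[int], int]) -> Iterator[int]:
    # Derivation stage; runs interleaved with the stages around it.
    return (fn(r) for r in records)


stats = {"pulled": 0}
stream = derive(select(source(10**12, stats), lambda r: r % 7 == 0), lambda r: r * r)
first_three = list(islice(stream, 3))  # early termination after three results
print(first_three, stats["pulled"])    # [0, 49, 196] 15
```

Although the source nominally holds 10**12 records, only 15 are ever read, because each stage pulls from the previous one on demand and the consumer stops after three results.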
  • 23. OGSA-DAI Generic web services • Manipulate data using OGSA-DAI’s generic web services • Clients see the data in its ‘raw’ format, e.g. – Tables, columns, rows for relational data – Collections, elements etc. for XML data • Clients can obtain the schema of the data • Clients send queries in the appropriate query language, e.g. SQL, XPath [diagram: a client sends a request to OGSA-DAI and receives data; OGSA-DAI fronts Relational Databases, XML Databases and Indexed Files]
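A small Python sketch of the "raw format" idea: a relational resource is described by its table schema and queried with SQL, while an XML resource is queried with a path expression. SQLite and the limited XPath subset supported by `xml.etree.ElementTree` stand in for real data resources; the table and element names are hypothetical, and none of this is OGSA-DAI's client API.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational resource: clients see tables, columns and rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (name TEXT, instrument TEXT)")
db.executemany("INSERT INTO files VALUES (?, ?)", [("a.N1", "RA2"), ("b.N1", "MERIS")])
# Schema discovery: PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
schema = [(col[1], col[2]) for col in db.execute("PRAGMA table_info(files)")]
sql_rows = db.execute("SELECT name FROM files WHERE instrument = ?", ("RA2",)).fetchall()

# XML resource: clients see collections and elements, queried with a path expression.
doc = ET.fromstring(
    "<files><file instrument='RA2'>a.N1</file><file instrument='MERIS'>b.N1</file></files>"
)
xml_names = [e.text for e in doc.findall(".//file[@instrument='RA2']")]

print(schema)     # [('name', 'TEXT'), ('instrument', 'TEXT')]
print(sql_rows)   # [('a.N1',)]
print(xml_names)  # ['a.N1']
```

The same question gets asked in two different query languages, each matching the native structure of its resource, which is the point of the slide: the generic services pass queries through rather than imposing one data model.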
  • 24. OGSA-DAI Workflows • Pipeline, Sequence, Parallel workflows • Composed of activities • Reduces data transfers and web service calls
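One way to picture the three composition styles is with plain Python callables in which each activity consumes and produces a stream. The `pipeline`, `sequence` and `parallel` helpers below are hypothetical and only mirror the idea; OGSA-DAI has its own activity and workflow classes, which are not reproduced here. The property relevant to the slide is that a client can hand over one composed workflow rather than making a separate call per activity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator

Workflow = Callable[[], list]


def pipeline(source: Callable[[], Iterator], *activities: Callable[[Iterable], Iterator]) -> Workflow:
    # Each activity consumes the stream produced by the previous one.
    def run() -> list:
        data = source()
        for activity in activities:
            data = activity(data)
        return list(data)
    return run


def sequence(*workflows: Workflow) -> Workflow:
    # Sub-workflows run one after another, each completing before the next starts.
    def run() -> list:
        return [wf() for wf in workflows]
    return run


def parallel(*workflows: Workflow) -> Workflow:
    # Sub-workflows run concurrently; results come back in declaration order.
    def run() -> list:
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(wf) for wf in workflows]
            return [f.result() for f in futures]
    return run


def read() -> Iterator[int]:
    return iter(range(10))  # hypothetical data source


def evens(xs: Iterable[int]) -> Iterator[int]:
    return (x for x in xs if x % 2 == 0)  # filtering activity


def square(xs: Iterable[int]) -> Iterator[int]:
    return (x * x for x in xs)  # transformation activity


p1 = pipeline(read, evens, square)
p2 = pipeline(read, square)
print(p1())  # [0, 4, 16, 36, 64]
```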
  • 25. Metadata Management: A Satellite Scenario [diagram: Space Segment and Ground Segment; satellite files: DMOP files, Product files]
  • 26. A Sample File in the Satellite Domain [figure: a sample file with its METADATA and DATA parts labelled]
  • 27. Metadata can be present in file names… File name (Product): "RA2_MW__1PNPDK20060201_120535_000000062044_00424_20518_0349.N1" Corresponds to:
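The fields packed into such a name can be recovered with a regular expression. The sketch below assumes the Envisat product file-name convention as we understand it (product ID, processing stage, originator, start date and time, duration, phase, cycle, relative and absolute orbit, counter, satellite ID); the field names are ours, and the widths should be checked against ESA's product specification before relying on them.

```python
import re

# Assumed Envisat product-name layout (62 characters), field by field.
PRODUCT_NAME = re.compile(
    r"(?P<product_id>.{10})"       # e.g. RA2_MW__1P
    r"(?P<processing_stage>.)"     # processing stage flag
    r"(?P<originator>.{3})"        # originator ID
    r"(?P<start_date>\d{8})_"      # YYYYMMDD
    r"(?P<start_time>\d{6})_"      # hhmmss
    r"(?P<duration>\d{8})"         # duration in seconds
    r"(?P<phase>.)"                # mission phase
    r"(?P<cycle>\d{3})_"           # cycle number
    r"(?P<relative_orbit>\d{5})_"
    r"(?P<absolute_orbit>\d{5})_"
    r"(?P<counter>\d{4})\."        # file counter
    r"(?P<satellite>.{2})"         # satellite ID, e.g. N1
)


def parse_product_name(name: str) -> dict[str, str]:
    match = PRODUCT_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected product name: {name!r}")
    return match.groupdict()


fields = parse_product_name("RA2_MW__1PNPDK20060201_120535_000000062044_00424_20518_0349.N1")
print(fields["start_date"], fields["absolute_orbit"])  # 20060201 20518
```

Once extracted like this, values such as the orbit or cycle can be stored as queryable metadata instead of staying implicit in the file name.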
  • 28. …and in file headers FILE ; DMOP (generated by FOS Mission Planning System) RECORD fhr RECORD ID FILENAME="DMOP_SOF__VFOS20060124_103709_00000000_00001215_20060131_014048_20060202_035846.N1" DESTINATION="PDCC" PHASE_START=2 CYCLE_START=44 REL_START_ORBIT=404 RECORD parameters ABS_START_ORBIT=20498 ENDRECORD fhr ................................ RECORD dmop_er RECORD dmop_er_gen_part RECORD gen_event_params RECORD parameters EVENT_TYPE=RA2_MEA EVENT_ID="RA2_MEA_00000000002063" NB_EVENT_PR1=1 NB_EVENT_PR3=0 ORBIT_NUMBER=20521 ELAPSED_TIME=623635 DURATION=41627862 ENDRECORD gen_event_params ENDRECORD dmop_er ENDLIST all_dmop_er ENDFILE [slide callout: "corresponding to other RECORD structure."]
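The KEY=VALUE fields in such a header are easy to lift out programmatically. A minimal Python sketch, assuming values are either double-quoted strings or single bare tokens; `parse_header` is hypothetical and ignores the RECORD nesting, and the sample is an excerpt of the header above reformatted one field per line.

```python
import re

# KEY=VALUE pairs where VALUE is a "quoted string" or a bare token.
PAIR = re.compile(r'([A-Z][A-Z0-9_]*)=("[^"]*"|\S+)')


def parse_header(text: str) -> dict[str, object]:
    """Collect KEY=VALUE fields; later duplicates overwrite earlier ones."""
    fields: dict[str, object] = {}
    for key, raw in PAIR.findall(text):
        if raw.startswith('"'):
            fields[key] = raw.strip('"')  # quoted string
        elif raw.isdigit():
            fields[key] = int(raw)        # numeric value
        else:
            fields[key] = raw             # bare token such as RA2_MEA
    return fields


HEADER = '''RECORD fhr
FILENAME="DMOP_SOF__VFOS20060124_103709_00000000_00001215_20060131_014048_20060202_035846.N1"
DESTINATION="PDCC"
PHASE_START=2
CYCLE_START=44
REL_START_ORBIT=404
ABS_START_ORBIT=20498
ENDRECORD fhr
RECORD gen_event_params
EVENT_TYPE=RA2_MEA
EVENT_ID="RA2_MEA_00000000002063"
ORBIT_NUMBER=20521
ENDRECORD gen_event_params'''

header = parse_header(HEADER)
print(header["CYCLE_START"], header["EVENT_TYPE"])  # 44 RA2_MEA
```

Flattening into one dictionary is enough for simple lookups; keeping the RECORD structure would require tracking RECORD/ENDRECORD nesting as well.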
  • 29. Metadata can be exposed • Metadata deserves better treatment – In most cases it appears together with files or other resources – It is difficult to deal with – How could we query for all the files that concern instrument X and whose information was acquired between times T1 and T2? Our goal: Let’s make metadata a FIRST-CLASS CITIZEN in our systems And let’s make it FLEXIBLE enough to accommodate changes
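Once metadata such as instrument and acquisition time is held in a catalogue instead of being buried in file names and headers, the query on this slide becomes a simple filter. The `FileMetadata` record, the file names and the choice of "overlaps [T1, T2]" as the meaning of "from T1 to T2" are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileMetadata:
    name: str
    instrument: str
    start: datetime  # acquisition start
    stop: datetime   # acquisition stop


def files_for(catalog: list[FileMetadata], instrument: str,
              t1: datetime, t2: datetime) -> list[str]:
    """Names of files from `instrument` whose acquisition interval overlaps [t1, t2]."""
    return [f.name for f in catalog
            if f.instrument == instrument and f.start <= t2 and f.stop >= t1]


catalog = [
    FileMetadata("a.N1", "RA2", datetime(2006, 2, 1, 12, 5, 35), datetime(2006, 2, 1, 12, 5, 41)),
    FileMetadata("b.N1", "MERIS", datetime(2006, 2, 1, 12, 0), datetime(2006, 2, 1, 12, 10)),
    FileMetadata("c.N1", "RA2", datetime(2006, 3, 1, 8, 0), datetime(2006, 3, 1, 8, 1)),
]
print(files_for(catalog, "RA2", datetime(2006, 2, 1), datetime(2006, 2, 2)))  # ['a.N1']
```

If "from T1 to T2" should instead mean "entirely within", the condition becomes `t1 <= f.start and f.stop <= t2`.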
  • 30. Metadata in Workflows ID MURA_BACSU STANDARD; PRT; 429 AA. DE PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE DE (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE DE ENOLPYRUVYL TRANSFERASE) (EPT). GN MURA OR MURZ. OS BACILLUS SUBTILIS. OC BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE; OC BACILLUS. KW PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE. FT ACT_SITE 116 116 BINDS PEP (BY SIMILARITY). FT CONFLICT 374 374 S -> A (IN REF. 3). SQ SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32; MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
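The entry above is a Swiss-Prot flat-file record whose two-letter line codes (ID, DE, OS, KW, SQ, ...) already carry metadata. Assuming the usual flat-file layout (line code in columns 1 and 2, data from column 6, sequence lines starting with blanks, `//` ending the entry), a short Python sketch can group the lines into fields that a workflow step could use. `parse_entry` and `keywords` are hypothetical helpers, not part of any bioinformatics library, and the sample is an abbreviated excerpt of the entry.

```python
from collections import defaultdict


def parse_entry(text: str) -> dict[str, list[str]]:
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end-of-entry marker
        code, value = line[:2], line[5:].strip()
        if code.strip() == "":
            # Sequence data lines start with blanks; drop the grouping spaces.
            fields["sequence"].append(value.replace(" ", ""))
        else:
            fields[code].append(value)
    return dict(fields)


def keywords(fields: dict[str, list[str]]) -> list[str]:
    # KW lines hold a ';'-separated list terminated by '.'.
    text = " ".join(fields.get("KW", [])).rstrip(".")
    return [k.strip() for k in text.split(";") if k.strip()]


ENTRY = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
OS   BACILLUS SUBTILIS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI
//
"""

fields = parse_entry(ENTRY)
print(fields["OS"])      # ['BACILLUS SUBTILIS.']
print(keywords(fields))  # ['PEPTIDOGLYCAN SYNTHESIS', 'CELL WALL', 'TRANSFERASE']
```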
  • 31. Workflow Lifecycle [diagram: Workflow Template → (populate with data) → Workflow Instance → (map to available resources) → Executable Workflow → (execute) → Data Products, Metadata and Provenance Information; supporting elements: Workflow Reuse and Component Libraries (adapt, modify), Data and Metadata Catalogs, Resource and Application Component Descriptions, Compute, Storage and Network Resources]
  • 32. Metadata and workflows • Metadata for describing workflow entities – What is the added value of a given workflow? – What task does a given service perform? – Which services can be associated with a processor? • Metadata for describing workflow provenance – How did the execution of a given workflow go? – What are the semantics of a data product? – How many invocations of a given service failed?
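Two of the provenance questions above become direct queries once each service invocation is recorded. A minimal Python sketch, assuming a hypothetical `Invocation` record with an "ok"/"failed" status vocabulary; real provenance systems use richer models, typically stored in RDF or a database.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """One provenance record: which run called which service, and the outcome."""
    run_id: str
    service: str
    status: str  # "ok" or "failed" (assumed vocabulary)


def run_outcome(log: list[Invocation], run_id: str) -> str:
    # "How did the execution of a given workflow go?"
    statuses = [inv.status for inv in log if inv.run_id == run_id]
    if not statuses:
        return "unknown"
    return "failed" if "failed" in statuses else "ok"


def failures_per_service(log: list[Invocation]) -> Counter:
    # "How many invocations of a given service failed?"
    return Counter(inv.service for inv in log if inv.status == "failed")


log = [
    Invocation("run1", "blast", "ok"),
    Invocation("run1", "align", "failed"),
    Invocation("run2", "blast", "ok"),
    Invocation("run2", "align", "ok"),
    Invocation("run3", "align", "failed"),
]
print(run_outcome(log, "run1"), failures_per_service(log)["align"])  # failed 2
```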
  • 33. Some metadata about a workflow [diagram: a scientific workflow and its metadata content: RDF annotations referring to Reference Ontology1 and Reference Ontology2, social tag annotations referring to a reference controlled vocabulary, and free-text annotations]
  • 34. What can we do with metadata?
  • 35. Metadata is everywhere • We can attach metadata to almost anything – Events, notifications, logs – Services and resources – Schemas and catalogue entries – People, meetings, discussions, conference talks – Scientific publications, recommendations, quality comments – Models, codes, builds, workflows – Data files and data streams – Sensors and sensor data • But... what do we mean by metadata?
  • 36. What is the metadata of this HTML fragment? Based on Dublin Core: The contributor and creator is the flight booking service “www.flightbookings.com”. The date would be January 1st, 2003, if the HTML page was generated on that date. The description would be something like “flight details for a trip between Madrid and Seattle via Chicago on February 8th, 2004”. The document format is “HTML”. The document language is “en”, which stands for English. Based on thesauri: Madrid is a reference to the term with ID 7010413 in the thesaurus, which refers to the city of Madrid in Spain. Spain is a reference to the term with ID 1000095, which refers to the kingdom of Spain in Europe. Chicago is a reference to the term with ID 7013596, which refers to the city of Chicago in Illinois, US. United States of America is a reference to the term “United States” with ID 7012149, which refers to the US nation. Seattle is a reference to the term with ID 7014494, which refers to the city of Seattle in Washington, US. Based on ontologies: Concept instances relate part of the document to one or more concepts in an ontology. For example, “Flight details” may represent an instance of the concept Flight, and can be named AA7615_Feb08_2003, although concept instances do not necessarily have a name. Attribute values relate a concept instance to the part of the document that is the value of one of its attributes. For example, “American Airlines” can be the value of the attribute companyName. Relation instances relate two concept instances through a domain-specific relation. For example, the flight AA7615_Feb08_2003 and the location Madrid can be connected by the relation departurePlace.
  • 37. Need to Add “Semantics” • External agreement on the meaning of annotations – E.g., Dublin Core for annotating library/bibliographic information • Use ontologies to specify the meaning of annotations – Ontologies provide a vocabulary of terms, plus – a set of explicit assumptions about the intended meaning of that vocabulary. • Almost always including concepts and their classification • Almost always including properties between concepts • Similar to an object-oriented model – The meaning (semantics) of terms is formally specified – Ontologies can also specify relationships between terms in multiple ontologies • Thus, an ontology provides a formal specification of a certain domain: – A shared understanding of a domain of interest – A formal, machine-manipulable model of a domain of interest
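To see why formally specified concepts and properties matter, here is a toy taxonomy in Python: because subclass links and property domains and ranges are explicit, a program can conclude facts that no annotation states directly (for example, that the flight instance is also a Trip). The concepts Flight, Location and departurePlace come from the HTML example; ScheduledFlight, Trip and City are hypothetical. The taxonomy is assumed to be acyclic with single inheritance, and a real system would use an ontology language such as RDFS or OWL with a proper reasoner.

```python
# Toy ontology (partly hypothetical): subclass links and property signatures.
SUBCLASS_OF = {"ScheduledFlight": "Flight", "Flight": "Trip", "City": "Location"}
PROPERTIES = {"departurePlace": ("Flight", "Location")}  # property -> (domain, range)
INSTANCE_OF = {"AA7615_Feb08_2003": "ScheduledFlight", "Madrid": "City"}


def superclasses(concept: str) -> list[str]:
    """All ancestors of a concept, following subclass links transitively."""
    ancestors = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        ancestors.append(concept)
    return ancestors


def is_a(instance: str, concept: str) -> bool:
    # An instance belongs to its direct concept and to every ancestor of it.
    direct = INSTANCE_OF[instance]
    return concept == direct or concept in superclasses(direct)


def well_typed(subject: str, prop: str, obj: str) -> bool:
    """Check a relation instance against the property's domain and range."""
    domain, range_ = PROPERTIES[prop]
    return is_a(subject, domain) and is_a(obj, range_)


print(is_a("AA7615_Feb08_2003", "Trip"))                            # True
print(well_typed("AA7615_Feb08_2003", "departurePlace", "Madrid"))  # True
print(well_typed("Madrid", "departurePlace", "AA7615_Feb08_2003"))  # False
```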
  • 39. Summary • From the lower level of abstraction… – Difficulties in developing, programming and deploying Grid applications using the existing Grid middleware • To a higher level of abstraction: – High-level APIs and metadata management • Programmatic approaches that provide common grid functionality at the right level of abstraction for applications • Ability to hide the underlying complexity of the infrastructure, varying semantics, heterogeneity and changes from the application developer – Improved data access and integration mechanisms – Workflow management • Traceable, repeatable analyses of e-Science experiments • Graphical modelling languages that ease Grid application development