An Integrated Framework for Parameter-based Optimization of Scientific Workflows

Vijay S Kumar, G. Mehta, K. Vahi, V. Ratnakar, P. Sadayappan,
Jihie Kim, Ewa Deelman, Yolanda Gil, Mary Hall, Tahsin Kurc, Joel Saltz

15 June 2009 · HPDC 2009
Motivations
• Performance of data analysis applications is
  influenced by parameters
  – optimization → search for optimal values in a
    multi-dimensional parameter space

• A systematic approach to:
  – enable the tuning of performance parameters
    (i.e., select optimal parameter values given an
    application execution context)
  – support optimizations arising from
    performance-quality trade-offs
Contributions of this paper
• No auto-tuning yet (work in progress)

• Core framework that can
  – support workflow execution (with application-level QoS)
    in distributed heterogeneous environments
  – enable manual tuning of parameters simultaneously
  – allow application developers and users to express
    applications semantically
  – leverage semantic descriptions to achieve performance
    optimizations
      • customized data-driven scheduling within Condor

Application characteristics
• Workflows: Directed Acyclic Graphs with well-defined data flow dependencies
  – mix of sequential, pleasingly parallelizable and complex
    parallel components
  – flexible execution in distributed environments

• Multidimensional data analysis
  – data partitioned into chunks for analysis
  – dataset elements bear spatial relationships, constraints
  – data has an inherent notion of quality → applications
    can trade accuracy of analysis output for performance

• End-user queries supplemented with application-level QoS requirements
Application scenario 1: No quality trade-offs
• Minimize makespan while preserving highest output quality
• Scale execution to handle terabyte-sized image data
Application scenario 2: Trade quality for performance
• Support queries with application-level QoS requirements
   – “Minimize time to classify image regions with 60% accuracy”
   – “Maximize classification accuracy of overall image within 30 minutes”
Performance optimization decisions

• What algorithm to use for this component?
• Where to map each workflow component?
• What data-chunking strategy to adopt?
• Which components to merge into meta-components?
• What is the quality of input data to this component?
• Which components need to perform at lower accuracy levels?
• What is the processing order of the chunks?

View each decision as a parameter that can be tuned.
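As a concrete illustration, the tuning space these decisions induce might be written down as follows; this is a hypothetical sketch, and all names and values are invented, not taken from the paper:

```python
# Illustrative sketch only: the decisions above expressed as one
# multi-dimensional parameter space. All names and values are
# hypothetical, not taken from the paper.
parameter_space = {
    "algorithm_variant":       ["variant_a", "variant_b"],
    "placement_site":          ["RII-MEMORY", "RII-COMPUTE"],
    "chunk_size":              [(512, 512), (1024, 1024), (2048, 2048)],
    "meta_component_grouping": ["none", "fuse_adjacent"],
    "input_resolution":        [0.25, 0.5, 1.0],  # fraction of full resolution
    "chunk_order":             ["fifo", "priority_queue"],
}

# One workflow configuration is a single point in this space; the number
# of points grows multiplicatively, which is why systematic tuning
# (rather than exhaustive search) matters.
n_points = 1
for values in parameter_space.values():
    n_points *= len(values)
print(n_points, "candidate configurations")  # 2*2*3*2*3*2 = 144
```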
Conventional Approach

Workflow design yields an application workflow, which is then executed over datasets.

Workflow Description (semantic representation):
• component discovery
• workflow composition
• workflow validation

Workflow Execution:
• clusters, the Grid, or SOA
• task-based / services-based
• batch mode / interactive
Proposed approach: extensions

Workflow design is now driven by analysis requests and queries with QoS, e.g. “Maximize accuracy within t time units”. The application workflow, its metadata, and the datasets feed three modules:

Description module (semantic representation):
• search for components
• workflow composition
• workflow validation
• performance parameters

Trade-off module:
• map high-level queries to low-level execution strategies
• select appropriate values for performance parameters

Execution module (hierarchical execution):
• map workflow components onto Grid sites
• fine-grain dataflow execution of components on clusters
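As a rough sketch of what the trade-off module's query-to-parameters mapping could look like, assuming an invented rule set (the paper defers automated parameter selection to future work):

```python
# Hypothetical sketch of the trade-off module's query-to-parameters
# mapping. The rules below are invented placeholders; the paper leaves
# automated selection of parameter values as future work.
from dataclasses import dataclass
from typing import Optional

@dataclass
class QoSQuery:
    objective: str                        # "max_accuracy" or "min_time"
    time_budget_s: Optional[float] = None
    accuracy_floor: Optional[float] = None

def select_parameters(q: QoSQuery) -> dict:
    params = {"chunk_order": "fifo", "input_resolution": 1.0}
    if q.objective == "max_accuracy" and q.time_budget_s is not None:
        # Deadline-bound: process promising chunks first and start at a
        # low resolution so partial results arrive within the budget.
        params["chunk_order"] = "priority_queue"
        params["input_resolution"] = 0.25
    elif q.objective == "min_time" and q.accuracy_floor is not None:
        # Accuracy-bound: pick the cheapest resolution believed to still
        # meet the floor (the 0.6 threshold is a made-up placeholder).
        params["input_resolution"] = 0.5 if q.accuracy_floor <= 0.6 else 1.0
    return params

# e.g. “Maximize accuracy within 30 minutes”:
print(select_parameters(QoSQuery("max_accuracy", time_budget_s=1800)))
```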
An instance of our proposed framework

• Description module: WINGS (Workflow INstance Generation and Selection)
• Execution module: Pegasus WMS, DataCutter, Condor, DAGMan
• Trade-off module: interacts with the description and execution modules

Performance parameters cut across all three modules.
Description Module: WINGS (Workflow INstance Generation and Selection)

Layered workflow refinement:
(1) Conceptual workflow sketch → Workflow Template
    – abstract description; dataset-independent; resource-independent
(2) Workflow Template → Compact Workflow Instance
    – contains mappings to actual datasets; resource-independent
(3) Compact Workflow Instance → Expanded Workflow Instance, handed to the Execution module
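A minimal sketch of this layered refinement, assuming invented dict-based structures (WINGS itself uses semantic OWL/RDF workflow representations):

```python
# Minimal sketch of WINGS-style layered refinement. The dict-based
# structures are invented for illustration; WINGS itself uses semantic
# (OWL/RDF) workflow representations.
template = {  # layer (2): dataset- and resource-independent
    "components": ["project", "normalize", "stitch", "classify"],
}

def bind_datasets(template, dataset_ids):
    """(2) -> (3): the compact instance maps the template onto actual
    datasets while staying resource-independent."""
    return {**template, "inputs": list(dataset_ids)}

def expand(instance):
    """Compact -> expanded instance: one concrete task per component per
    input chunk, ready for mapping by the execution module."""
    return [{"component": comp, "input": chunk}
            for chunk in instance["inputs"]
            for comp in instance["components"]]

tasks = expand(bind_datasets(template, ["image1.chunk0", "image1.chunk1"]))
print(len(tasks), "tasks")  # 4 components x 2 chunks = 8 tasks
```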
Extensions to WINGS data ontology

“Core” data ontology:
• File: hasCreationMetadata, hasFormatMetadata, hasDescriptionFile, hasContentTemplate
• Collection: hasFileType, hasN_items, hasFiles
• CollOfCollections

Extensions for multidimensional data analysis:
• ChunkFile: hasNXtiles, hasNYtiles, hasChunksizeX, hasChunksizeY, hasChunkIDX, hasChunkIDY, hasChunkIDZ, hasOverlap
• StackFile: hasStartZ, hasEndZ
• SliceFile: hasSliceIDZ, hasNXChunks, hasNYChunks

Application-specific:
• Chunk, ProjectedChunk, NormalizedChunk, StitchedChunk
• Stack, Slice, ProjectedSlice, NormalizedSlice

• Relations between entities, constraints on metadata
• Automatic description, naming of intermediate data products
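The automatic naming of intermediate data products follows from metadata propagation: because every chunk carries its ID metadata, a deterministic name can be derived at each stage. A hedged sketch, where the property names come from the ontology above but the filename scheme is invented:

```python
# Sketch of metadata-driven naming of intermediate data products.
# Ontology property names come from the slide above; the filename
# scheme itself is an invented illustration.
def chunk_name(stage: str, meta: dict) -> str:
    """Derive a deterministic name for a chunk produced by `stage`.

    Because every chunk carries hasChunkIDX/Y/Z metadata, downstream
    tools (e.g. a scheduler) can recover spatial coordinates from the
    name alone.
    """
    return (f"{stage}_x{meta['hasChunkIDX']}"
            f"_y{meta['hasChunkIDY']}"
            f"_z{meta['hasChunkIDZ']}.chunk")

meta = {"hasChunkIDX": 3, "hasChunkIDY": 7, "hasChunkIDZ": 0}
print(chunk_name("NormalizedChunk", meta))  # NormalizedChunk_x3_y7_z0.chunk
```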
Execution Module
Pegasus WMS (http://pegasus.isi.edu)

• Coarse-grain mapping of workflow tasks onto Grid sites
• Submits sub-workflows to DAG schedulers at each site
• Automatic data transfer between sites (via GridFTP)

DataCutter (http://datacutter.osu.edu)

• Fine-grain mapping of components onto clusters
• Filter-stream model, asynchronous delivery
• Each filter executes as a thread (C++/Java/Python)
• Pipelined dataflow execution: combined task- and data-parallelism
• MPI-based version (http://bmi.osu.edu/~rutt/dcmpi)

Condor (www.cs.wisc.edu/condor)
can now execute DataCutter jobs within its “parallel universe”
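DataCutter itself is C++ middleware; purely as an illustration of the filter-stream model, here is a toy Python sketch in which filters run as threads connected by buffered streams and chunks flow through a two-stage pipeline:

```python
# Toy illustration of the filter-stream idea: filters run as threads,
# exchange data over buffered streams, and process chunks as they arrive
# (pipelined task- and data-parallelism). This is not the DataCutter API.
import threading, queue

END = object()  # end-of-stream marker

def filter_thread(work, in_q, out_q):
    while (item := in_q.get()) is not END:
        out_q.put(work(item))
    out_q.put(END)  # propagate end-of-stream downstream

normalize_q, classify_q, result_q = queue.Queue(8), queue.Queue(8), queue.Queue()
threads = [
    threading.Thread(target=filter_thread,
                     args=(lambda c: c * 0.5, normalize_q, classify_q)),
    threading.Thread(target=filter_thread,
                     args=(lambda c: ("label", c), classify_q, result_q)),
]
for t in threads:
    t.start()
for chunk in range(4):        # stream chunks into the pipeline
    normalize_q.put(chunk)
normalize_q.put(END)
for t in threads:
    t.join()
while (r := result_q.get()) is not END:
    print(r)
```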
Quality-preserving parameters

Data chunking strategy [W, H] (sketched below): Image1 is partitioned into chunks C1, C2, C3, …, Cn, and a component A runs as one task per chunk: A1, A2, A3, …, An.

Other quality-preserving parameters:
• algorithmic variant of a component
• component placement
• grouping components into meta-components
• task-parallelism and data streaming within a meta-component
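A minimal sketch of the [W, H] chunking strategy, assuming a NumPy image (illustrative, not the paper's code):

```python
# Illustrative [W, H] chunking: partition a 2-D image into chunks of
# width W and height H (edge chunks may be smaller).
import numpy as np

def partition(image: np.ndarray, W: int, H: int):
    """Yield ((idx, idy), chunk) pairs, matching hasChunkIDX/hasChunkIDY."""
    ny, nx = image.shape[:2]
    for idy, y in enumerate(range(0, ny, H)):
        for idx, x in enumerate(range(0, nx, W)):
            yield (idx, idy), image[y:y + H, x:x + W]

image = np.zeros((2048, 4096), dtype=np.uint8)
chunks = dict(partition(image, W=1024, H=1024))
print(len(chunks))  # 8 chunks for a 4096x2048 image with 1024x1024 chunks
```

Smaller chunks expose more parallelism but add per-task overhead, which is what makes [W, H] a parameter worth tuning.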
Quality-trading Parameters
• Data approximation
  – e.g. spatial resolution of chunk
  – higher resolutions    greater execution
    times, but does not imply higher accuracy of
    output

• Processing order of chunks
  – the order in which data chunks are operated
    upon by a component collection
  – can process “favorable” chunks ahead of other
    chunks
Processing order

A component collection A executes as tasks A1, A2, A3, …, An, one per chunk, in some processing order.

• Tasks within a component collection are treated as a batch
  – Condor executes them in FIFO order
• Implemented a priority-queue based heuristic for reordering task execution within a component collection (sketched below)
  – “favorable” chunks are processed ahead of other chunks
  – different QoS requirements → different insertion schemes
• Can the execution of the bag-of-tasks be reordered dynamically?
  – condor_prio alone is not suitable
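A hedged sketch of such a priority-queue heuristic; the scoring rules are invented, since the slides only state that “favorable” chunks go first and that the insertion scheme depends on the QoS requirement:

```python
# Sketch of priority-queue task reordering for a component collection.
# The favorability scores below are invented for illustration.
import heapq

def order_tasks(chunks, qos):
    """Return chunk task IDs in execution order for the given QoS type.

    chunks: list of dicts with 'id', an estimated 'confidence', and 'size'.
    qos: "max_confidence" processes high-expected-confidence chunks first;
         "max_throughput" processes cheap (small) chunks first.
    """
    pq = []
    for i, c in enumerate(chunks):
        if qos == "max_confidence":
            key = -c["confidence"]   # highest expected confidence first
        else:
            key = c.get("size", 0)   # smallest chunks first
        heapq.heappush(pq, (key, i, c["id"]))
    return [heapq.heappop(pq)[2] for _ in range(len(pq))]

tasks = [{"id": "A1", "confidence": 0.2, "size": 9},
         {"id": "A2", "confidence": 0.7, "size": 1},
         {"id": "A3", "confidence": 0.4, "size": 4}]
print(order_tasks(tasks, "max_confidence"))  # ['A2', 'A3', 'A1']
```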
Customized scheduling in Condor

A priority queue (PQ) reorders the tasks A1, A2, A3, …, An of a component collection.

• Customized job scheduling within Condor to support performance-quality trade-offs for application-level quality-of-service (QoS)
  – implements the priority-queue scheme (overrides the FIFO scheme)
  – executes within Condor’s “scheduler” universe
• Associates tasks with the spatial coordinates of the respective chunks being processed
  – uses the automated naming of data products (metadata propagation) brought about by semantic descriptions
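Combining the naming sketch from the ontology slide with the reordering heuristic, one way a scheduler could recover spatial coordinates from task input names (the regex, names, and region-of-interest priority are illustrative assumptions, not the authors' implementation):

```python
# Sketch: recover a chunk's spatial coordinates from its metadata-derived
# filename, so a custom scheduler can prioritize spatially "favorable"
# tasks. The naming pattern matches the earlier sketch.
import re

NAME = re.compile(r"_x(?P<x>\d+)_y(?P<y>\d+)_z(?P<z>\d+)\.chunk$")

def coords_of(task_input: str):
    m = NAME.search(task_input)
    return (int(m["x"]), int(m["y"]), int(m["z"])) if m else None

def prioritize(task_inputs, roi=(0, 0, 0)):
    """Order queued tasks by squared distance from a region of interest,
    e.g. chunks near a region flagged as likely tumor tissue."""
    def dist(name):
        x, y, z = coords_of(name)
        return (x - roi[0]) ** 2 + (y - roi[1]) ** 2 + (z - roi[2]) ** 2
    return sorted(task_inputs, key=dist)

jobs = ["NormalizedChunk_x3_y7_z0.chunk", "NormalizedChunk_x0_y1_z0.chunk"]
print(prioritize(jobs, roi=(0, 0, 0)))  # nearest chunk first
```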
Experimental setup: Test bed
• RII-MEMORY
   – 64-node Linux cluster
   – dual-processor 2.4 GHz Opteron nodes
   – 8 GB RAM, 437 GB local RAID0 volume
   – Gigabit Ethernet
• RII-COMPUTE
   – 32-node Linux cluster
   – 3.6 GHz Intel Xeon processors
   – 2 GB RAM, 10 GB local disk
   – Gigabit Ethernet and InfiniBand
• Wide-area 10 Gbps connection
Performance Evaluation
• Focus on performance-quality trade-offs

• Neuroblastoma Classification workflow:
   – “Maximize overall confidence of classification within
     t time units”
   – “Maximize number of data chunks processed within t
     time units”


• How to tune quality-trading parameters to
  achieve high performance?
   – Data resolution
   – Processing order of chunks
Parameters: resolution, processing order

[Figure: custom scheduling in Condor trades quality for performance better than default scheduling for QoS requirement type 1]

• 32 nodes, 21 GB image, confidence threshold = 0.25
• “Maximize overall classification confidence within t time units”
Parameters: resolution, processing order (continued)

[Figure: custom scheduling in Condor trades quality for performance better than default scheduling for QoS requirement type 1, and can improve throughput for QoS requirement type 2]

• 32 nodes, 21 GB image, confidence threshold = 0.25
• “Maximize data chunks processed within t time units”
Conclusions
• Performance optimization for workflows: search
  for values in a multidimensional parameter space
• Instance of our proposed framework allows
  users to manually express values for many
  performance parameters (simultaneously):
  – quality-preserving & quality-trading
• Semantic representations of domain data and
  performance parameters can be leveraged
  – Data chunking strategy and data approximation can
    help restructure workflow for a given resource
    configuration
  – Customized job scheduling within Condor can scalably
    support application-level QoS
Current and Future work
• Use semantic representations to map high-level queries onto low-level execution strategies
• Techniques to efficiently navigate the
  parameter space
  – Assume high data cardinality → uniformity of
    application context over time
  – Use information from sample runs to build
    statistical models


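As an illustration of that last point, a simple statistical model could be fit over sample runs; the features, data, and model choice below are invented, as the slides only propose the general approach:

```python
# Invented illustration of building a statistical performance model from
# sample runs; the slides propose the approach but not this model.
import numpy as np

# Hypothetical sample runs: (chunk_width, resolution_fraction) -> runtime_s
X = np.array([[512, 0.25], [512, 1.0], [1024, 0.25],
              [1024, 1.0], [2048, 0.5], [2048, 1.0]], dtype=float)
y = np.array([410.0, 980.0, 305.0, 760.0, 390.0, 640.0])

# Ordinary least squares with an intercept term.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_runtime(chunk_width, resolution):
    return float(np.array([chunk_width, resolution, 1.0]) @ coef)

# Use the model to pre-screen parameter points before running them.
print(round(predict_runtime(1024, 0.5)), "seconds (predicted)")
```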
An Integrated Framework for Parameter-based Optimization of Scientific Workflows

Thanks!

Vijay S Kumar (vijayskumar@bmi.osu.edu), P. Sadayappan
Tahsin Kurc, Joel Saltz (www.cci.emory.edu)
Ewa Deelman, Yolanda Gil (www.isi.edu)
