DIET_BLAST: Software architecture and small research problems
Frédéric Desprez, LIP ENS Lyon / INRIA Grenoble Rhône-Alpes, EPI GRAAL/Avalon, 14/06/11
Agenda
Introduction
One target application: BLAST over the grid
… A T C A A G T C …
  |   | |   | | |
… A C C A - G T C …
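The alignment fragment on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch of how the match markers relate to two sequences that are already aligned; it is not BLAST itself, and the helper name `match_line` is invented for this example.

```python
def match_line(top: str, bottom: str, gap: str = "-") -> str:
    """Return a marker string with '|' wherever two aligned sequences agree."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    # A gap character never counts as a match, even if both rows contain one.
    return "".join(
        "|" if a == b and a != gap else " "
        for a, b in zip(top, bottom)
    )


# The two rows shown on the slide, with the display spacing removed.
top = "ATCAAGTC"
bottom = "ACCA-GTC"
markers = match_line(top, bottom)

print(top)
print(markers)
print(bottom)
```

The six markers correspond to the six positions where the two rows carry the same base; the mismatch (T/C) and the gap are left blank.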
Parallelization and distribution of bioinformatics requests over the Grid
Where to replicate the databases? How to distribute the requests?
Related work: Job scheduling & data replication
Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications, Ranganathan, K., Foster, I., HPDC '02, Washington, DC, USA, IEEE, 2002.
Related work: Integration of Scheduling & Replication
Integration of Scheduling and Replication in Data Grids, Chakrabarti, A., Dheepak, R.A., Sengupta, S., HiPC 2004, Lecture Notes in Computer Science, 2004.
Parallelization and distribution of bioinformatics requests over the Grid
Parallelization and distribution of bioinformatics requests over the Grid (credit: A. Vernois)
Scheduling and Replication Algorithm (credit: A. Vernois)
Using MCT (credit: A. Vernois)
Using SRA (credit: A. Vernois)
SRA with frequency variation detection (credit: G. Le Mahec)
Implementation within a middleware - DIET
http://graal.ens-lyon.fr/DIET/
Data/replica management
DAGDA (credit: G. Le Mahec)
1: The client sends a request for a service.
2: DIET selects some SeDs using a scheduling heuristic.
3: The client sends its request to the SeD.
4: The SeD downloads the data from the client and/or from other DIET servers.
5: The SeD performs the call.
6: The persistent data are updated.
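The six steps above can be mimicked in a small simulation. The sketch below does not use the DIET or DAGDA API; the `SeD` class, the least-loaded selection rule and the way persistent data are kept (a copy stored on the chosen SeD) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SeD:
    """Toy server daemon: a name, a load figure, and locally stored data."""
    name: str
    load: float
    store: dict = field(default_factory=dict)


def select_sed(seds):
    """Step 2 (assumed heuristic): pick the least-loaded SeD."""
    return min(seds, key=lambda s: s.load)


def fetch(data_id, client_data, seds, target):
    """Step 4: take data from the target itself, another SeD, or the client."""
    if data_id in target.store:
        return target.store[data_id], "local"
    for sed in seds:
        if sed is not target and data_id in sed.store:
            return sed.store[data_id], sed.name
    return client_data[data_id], "client"


def call(service, data_ids, client_data, seds):
    target = select_sed(seds)                      # steps 1-2: request, selection
    trace, args = [], []
    for data_id in data_ids:                       # steps 3-4: request sent, data fetched
        value, source = fetch(data_id, client_data, seds, target)
        trace.append((data_id, source))
        args.append(value)
    result = service(*args)                        # step 5: the SeD performs the call
    for data_id, value in zip(data_ids, args):     # step 6: keep persistent copies
        target.store[data_id] = value
    return target.name, result, trace


# Hypothetical setup: the database lives on a busy SeD, the query on the client.
seds = [SeD("sed-a", 0.9, {"db": "ACGTACGT"}), SeD("sed-b", 0.1)]
client_data = {"query": "CGT"}
name, result, trace = call(lambda db, q: q in db, ["db", "query"], client_data, seds)
print(name, result, trace)
```

In this run the idle SeD is selected, pulls the database from the other server and the query from the client, and keeps both afterwards, so it now holds a replica that later requests can use.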
Putting everything together (credit: G. Le Mahec)
Putting everything together, using Dynamic-SRA (credit: G. Le Mahec)
Conclusion and future work about DIET_BLAST
A collaboration built around three founding partners
http://www.decrypthon.fr/
With the participation of other partners
Deployment example with universities
SeD = Server Daemon, installed on any server running LoadLeveler. Note that rescue SeDs can be defined.
MA = Master Agent, coordinates jobs. Rescue or multiple Master Agents can be defined.
WN = worker node
http://www.decrypthon.fr/
[Diagram labels: Orsay, Bordeaux, Lille, Jussieu, CRIHAN, Lyon; SeD / LoadLeveler (four instances); Project Users; Web Interface; Orsay Decrypthon1; Orsay Decrypthon2; DB2; Master Agent DIET Décrypthon; BD AFM Cliniques; IBM WII; Data manager Interface]
Philosophy of the Décrypthon grid
Data management
Credits: H. N’Guyen, O. Poch, IGBMC
Décrypthon Grid - Grid Resources Dedicated to Neuromuscular Disorders, Bard, N., Bolze, R., Caron, E., Desprez, F., Heymann, M., Friedrich, A., Moulinier, L., Nguyen, N.-H., Poch, O. and Toursel, T., 8th HealthGrid conference, Paris, France, June 2010.
Data management, cont.
SM2PH: a pilot project and a success story
Credits: H. N’Guyen, O. Poch, IGBMC
SM2PH-db (figure labels: structural sampling, eukaryote filter; automatic update every 2 months, current version: 9)
Complex interconnected programs
Credits: H. N’Guyen, O. Poch, IGBMC
What’s next?
SysFera
http://www.sysfera.fr/
Questions?


Editor's notes

  1. By comparing two sequences, one of which has a known role, we hope that, if they are similar, we can find the role of the new sequence in a protein or in a gene. The databases used are simply flat files containing sequences together with their descriptions.
  2. If the database is not "installed" in DAGDA, the user passes it as a parameter; otherwise we use DAGDA's shared identifier system (aliases on the data, which avoid relying on a UUID that is impractical to share or exchange). Note: splitting the input file and merging the results have a negligible cost compared with the cost of running the BLASTs.
  3. When a job is sent to a node that does not hold the data, the node downloads it and places it in a cache, on which LRU is used to select the data to delete when space is needed. Until the data is deleted, it is available to the other nodes (so it is a replica).
  4. Three times, the best result is obtained by the scheduling policy that runs the job where the data is present, regardless of how the data was replicated (random or least loaded)… Note that when there is no replication, the best response time is obtained by running the job where it was submitted…
  5. 5 databases ranging in size from 150 MB to 5 GB. 5 algorithms: blastn, blastp, blastx, tblastn and tblastx (DNA vs DNA, protein vs protein, DNA vs protein, etc.). The slowest is tblastx (about 20 times slower than blastn). The peak at the start of the SRA graph: the submitted job sets are small, and the replications finish after the last job has been submitted => the average job time is high because all jobs are scheduled on the same machines. The waiting time decreases as the replications complete, with the last submitted jobs benefiting fully from them. MCT sends each job to the node that will execute it fastest at the time of its submission: it will often copy the databases to the fastest site, even if that requires deleting a very large database that will then take a long time to retransmit. It will do this anyway for the longest jobs (tblastx). For the fastest jobs (blastn), it favors the nodes that already hold the database, even if they are slow and even if there are very many of them…
  6. OptorSim
  7. Explicit: the user explicitly decides to replicate the data. Implicit: calls to the services trigger the data replications. Unlike DTM, data is replicated, not moved. Direct access to stored data + direct addition of data into DIET. Automatic data management: when data must be installed on a node that no longer has enough space, a data item is deleted using an algorithm chosen in the node's configuration. Transfer optimization: the "best" source for a data item is chosen based on statistics gathered during previous transfers. Storage usage management: one can choose how much memory and how much disk space are reserved for the data managed by DAGDA. Data backup/restoration: the current state of the data distribution can be saved and restored when DIET restarts. (For example, a reservation is ending and we want to continue an experiment later. On restart, the data is put back as it was before the shutdown.)
  8. Unlike DTM, it is the SeD that downloads the data, rather than the client pushing it unilaterally. Only the data descriptions (type, size, etc.) are sent with the requests. If a maximum size has been configured for the messages sent by DAGDA, data that is too large is sent in several parts. This also limits the amount of memory needed for transfers. DTM loads everything into memory before sending the data.
  9. The "core" of DAGDA handles data identification and lookup, as well as the choice of sources and destinations for transfers. The extended components of DAGDA handle the resource limits set by users and data backup/restoration. The API gives direct access to data in the platform, allows data to be added directly, and can launch replications.
  10. A request is a set of sequences to "BLAST" against a given database. A sub-request is a subset of these sequences to BLAST against the same database.
  11. Use of plugin schedulers
  12. Maximum division: if the initial request file contains n sequences, n request files are created, each containing a single sequence. Division into n sub-requests: with n nodes available, n sub-requests of identical size are created. Each node has only one request to process. With Random, MCT and Round-Robin, multiplying the requests causes overhead that is not offset by the scheduling. The best option remains to split the requests into as many parts as there are available nodes. With SRA, the more requests there are, the more reliable the frequencies, and so the more efficient the algorithm. The overhead is offset by the scheduling. Overall, Dynamic-SRA is better, even when splitting the request into n parts, provided there are enough nodes (here 300 SeDs): with 300 files, the frequencies obtained are reasonably accurate. With fewer nodes, and therefore fewer requests, the frequencies become more and more approximate, and since the platform's throughput is what is being optimized, Dynamic-SRA gets worse and worse.
  13. The algorithms have different complexities: BLASTN is the fastest (DNA => 4-letter alphabet vs DNA). BLASTP: protein => 20-letter alphabet vs proteins. BLASTX: DNA translated into protein vs proteins (DNA translation + BLASTP). TBLASTX: the slowest => DNA translated into protein vs a DNA database translated into proteins (translation of all the sequences, then BLASTP). Overall, changing the frequencies has little influence on MCT.
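
The two request-splitting strategies described in notes 10 and 12 can be illustrated with a short Python sketch. It is not taken from DIET_BLAST: the function names (`parse_fasta`, `split_maximum`, `split_by_nodes`) are hypothetical, the request file is assumed to be in FASTA format, and "identical size" is approximated by an equal count of sequences per sub-request rather than an equal number of residues.

```python
from typing import List


def parse_fasta(text: str) -> List[str]:
    """Return each FASTA record (header line plus sequence lines) as one string."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A new header closes the record being built, if any.
            if current:
                records.append("\n".join(current))
            current = [line]
        elif line.strip() and current:
            current.append(line.strip())
    if current:
        records.append("\n".join(current))
    return records


def split_maximum(records: List[str]) -> List[List[str]]:
    """Maximum division: one sub-request per sequence."""
    return [[record] for record in records]


def split_by_nodes(records: List[str], n_nodes: int) -> List[List[str]]:
    """Split into at most n_nodes sub-requests holding near-equal numbers of sequences."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    parts = min(n_nodes, len(records))
    if parts == 0:
        return []
    base, extra = divmod(len(records), parts)
    chunks: List[List[str]] = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional sequence.
        size = base + (1 if i < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    query = ">s1\nATCA\nAGTC\n>s2\nACCA\n>s3\nGGTT\n>s4\nTTAA\n>s5\nCCGG\n"
    sequences = parse_fasta(query)
    print(len(split_maximum(sequences)), "sub-requests with maximum division")
    print([len(chunk) for chunk in split_by_nodes(sequences, 2)], "sequences per node")
```

Each sub-request would then be submitted as an independent BLAST job against the same database, and the results merged afterwards; as note 2 points out, the splitting and merging steps are cheap compared with the BLAST runs themselves.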