Data Migration in Distributed Repositories for Collaborative Science
Mehmet Balman, Ismail Akturk, Tevfik Kosar
Department of Computer Science, and Center for Computation & Technology, Louisiana State University
STORK:  A Scheduler for Data Placement Activities in Distributed System for Large‐Scale Applications
Stork: Data Placement Scheduler
Ex: submit file
[
  dest_url = "gsiftp://eric1.loni.org/scratch/user/";
  arguments = "-p 4 -dbg -vb";
  src_url = "file:///home/user/test/";
  dap_type = "transfer";
  verify_checksum = true;
  verify_filesize = true;
  set_permission = "755";
  recursive_copy = true;
  network_check = true;
  checkpoint_transfer = true;
  output = "user.out";
  err = "user.err";
  log = "userjob.log";
]
PetaShare Architecture
Dynamic Adaptation in Data Transfers
Setting the Parallelism Level inside the Data Transfer Module
Instant Throughput
A very simple adaptive approach to adjust the level of parallelism on the fly while the data transfer is in progress.
• No external measurement or use of historical data is needed to come up with a good estimate of the parallelism level.
• Reflects the best possible current settings given the dynamic characteristics of the distributed environment.
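The poster does not spell out the adaptation algorithm, so the following is only a minimal sketch of one way the adaptive approach to the parallelism level described above could work: keep doubling the number of streams while the instant throughput still improves. The function `adapt_parallelism`, the callback `measure_throughput`, the `min_gain` threshold, and the toy throughput model are all assumptions made for illustration and are not part of Stork.

```python
from typing import Callable


def adapt_parallelism(
    measure_throughput: Callable[[int], float],
    max_streams: int = 32,
    min_gain: float = 0.05,
) -> int:
    """Pick a stream count by simple hill climbing on observed throughput.

    measure_throughput(n) is a hypothetical callback that moves the next
    chunk of data with n parallel streams and returns the throughput seen.
    """
    streams = 1
    best = measure_throughput(streams)
    while streams * 2 <= max_streams:
        candidate = streams * 2
        observed = measure_throughput(candidate)
        # Stop as soon as doubling no longer improves the instant
        # throughput by at least min_gain (relative gain).
        if observed < best * (1.0 + min_gain):
            break
        streams, best = candidate, observed
    return streams


def toy_throughput(n: int) -> float:
    """Toy model for demonstration only: throughput saturates at 8 streams."""
    return 100.0 * min(n, 8)


chosen_streams = adapt_parallelism(toy_throughput)
print(chosen_streams)
```

Because the decision uses only throughput observed during the transfer itself, the sketch needs neither external probing nor historical data, which matches the two points listed above.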
Data Migration
Aggregation of Data Placement Jobs
Data placement jobs are combined (i.e., grouped by their source or destination addresses) and processed as a single transfer job.
We have seen a vast performance improvement, especially with small data files.
Experiments on LONI (Louisiana Optical Network Initiative)
Test set: 1024 transfer jobs from Ducky to Queenbee (average RTT 5.129 ms), one 5 MB data file per job
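As a rough illustration of the job aggregation described above, the sketch below groups transfer requests that share the same source and destination host and splits each group into batches no larger than a maximum aggregation count. The function name `aggregate_jobs`, the grouping key, and the example file names are assumptions chosen for this example; the poster does not describe how Stork implements aggregation internally.

```python
from collections import defaultdict
from urllib.parse import urlparse


def aggregate_jobs(jobs, max_aggregation_count):
    """Group (src_url, dest_url) requests by host pair, then split each
    group into batches of at most max_aggregation_count requests."""
    groups = defaultdict(list)
    for src, dest in jobs:
        # Requests with the same source and destination hosts can share
        # one transfer operation (an assumption of this sketch).
        key = (urlparse(src).netloc, urlparse(dest).netloc)
        groups[key].append((src, dest))

    batches = []
    for same_endpoints in groups.values():
        for start in range(0, len(same_endpoints), max_aggregation_count):
            batches.append(same_endpoints[start:start + max_aggregation_count])
    return batches


# Five hypothetical small-file requests between the same two endpoints.
example_jobs = [
    (f"file:///home/user/test/f{i}", f"gsiftp://eric1.loni.org/scratch/user/f{i}")
    for i in range(5)
]
example_batches = aggregate_jobs(example_jobs, max_aggregation_count=2)
print([len(batch) for batch in example_batches])
```

With a maximum aggregation count of 2, the five requests become three transfer operations instead of five, which is where the saving for many small files would come from.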
Fig: Effects of parameters on the total transfer time of the test set (y-axis: total time in seconds)
(a) without job aggregation: number of parallel jobs (x-axis: 1 to 32) vs. number of multiple streams (series: single stream, 2, 4, 8, 16, 32 streams)
(b) transfer over a single data stream: max aggregation count (x-axis) vs. number of parallel jobs (series: single job at a time, 2, 4, 8, 16, 32 parallel jobs)
(c) transfer over 32 streams: max aggregation count (x-axis) vs. number of parallel jobs (series: single job at a time, 2, 4, 8, 16, 32 parallel jobs)
Dynamically Setting the Number of Parallel Streams
Error Detection and Recovery
stork.globus-url-copy:
In case of a retry after a failure, the scheduler informs the transfer module to recover and restart the transfer using the information from a rescue file created by the transfer module.
stork.globus-url-copy features
-ckp | -checkpoint
    use a rescue file for checkpointing
-ckpdebug | -checkpoint-debug
-ckpfile <filename> | -checkpoint-file <filename>
    checkpoint filename. Default is "<pid>.rescue"
-cksm | -checksum
    checksum control after each transfer
-pchck | -port-check
    check network connectivity and availability of the protocol
Protocols:
file:/ -> local file
gsiftp:// -> GridFTP
irods:// -> iRODS
Petashare:// -> PetaShare
Performance measurement / parameters:
aggregation count: maximum number of requests combined into a single transfer operation
multiple streams: number of parallel streams used for a single transfer operation
parallel jobs: number of simultaneous jobs running at the same time
Acknowledgement:
This project is in part sponsored by the National Science Foundation, the Department of Energy, and the Louisiana Board of Regents.
www.storkproject.org
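The rescue-file recovery described for stork.globus-url-copy above can be illustrated with a small sketch. The poster does not document the rescue file format, so this example assumes the simplest possible one: one completed file name per line. The names `load_rescue`, `transfer_with_checkpoint`, `flaky_copy`, and `demo.rescue` are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def load_rescue(rescue_path):
    """Return the set of file names already recorded as transferred."""
    if not os.path.exists(rescue_path):
        return set()
    with open(rescue_path) as rescue:
        return {line.strip() for line in rescue if line.strip()}


def transfer_with_checkpoint(files, copy_one, rescue_path):
    """Copy each file once, recording every success in the rescue file."""
    done = load_rescue(rescue_path)
    for name in files:
        if name in done:
            continue  # finished in an earlier attempt, skip on retry
        copy_one(name)  # may raise; earlier progress stays in the rescue file
        with open(rescue_path, "a") as rescue:
            rescue.write(name + "\n")


def demo():
    """Simulate one failure followed by a retry that resumes the transfer."""
    copied = []
    state = {"fail_on": "c.dat"}

    def flaky_copy(name):
        if name == state["fail_on"]:
            state["fail_on"] = None  # fail only on the first attempt
            raise IOError("simulated network failure")
        copied.append(name)

    files = ["a.dat", "b.dat", "c.dat", "d.dat"]
    with tempfile.TemporaryDirectory() as tmp:
        rescue_path = os.path.join(tmp, "demo.rescue")
        try:
            transfer_with_checkpoint(files, flaky_copy, rescue_path)
        except IOError:
            pass  # a scheduler would schedule the retry at this point
        transfer_with_checkpoint(files, flaky_copy, rescue_path)
    return copied


print(demo())
```

In the demo the retry skips the two files copied before the simulated failure, so every file is transferred exactly once.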

Cybertools stork-2009-cybertools allhandmeeting-poster

  • 1. Data Migration in Distributed Repositories for Collaborative Science
Mehmet Balman, Ismail Akturk, Tevfik Kosar
Department of Computer Science, and Center for Computation & Technology, Louisiana State University

STORK: A Scheduler for Data Placement Activities in Distributed System for Large-Scale Applications

Stork: Data Placement Scheduler
Ex: submit file
[
  dest_url = "gsiftp://eric1.loni.org/scratch/user/";
  arguments = "-p 4 -dbg -vb";
  src_url = "file:///home/user/test/";
  dap_type = "transfer";
  verify_checksum = true;
  verify_filesize = true;
  set_permission = "755";
  recursive_copy = true;
  network_check = true;
  checkpoint_transfer = true;
  output = "user.out";
  err = "user.err";
  log = "userjob.log";
]

PetaShare Architecture

Dynamic Adaptation in Data Transfers
Setting the Parallelism Level inside the Data Transfer Module
A very simple adaptive approach to adjust the level of parallelism on the fly while the data transfer is in progress.
• No external measurement or use of historical data is needed to come up with a good estimate for the parallelism level
• Reflects the best possible current settings, given the dynamic characteristics of the distributed environment
Dynamically Setting the number of Parallel Streams
Instant Throughput

Data Migration
Aggregation of Data Placement Jobs
Data placement jobs are combined and processed as a single transfer job (i.e. based on their source or destination addresses). We have seen vast performance improvement, especially with small data files.
Experiments on LONI (Louisiana Optical Network Initiative)
Test-set: 1024 transfer jobs from Ducky to Queenbee (rtt avg 5.129 ms), 5MB data file per job

Performance measurement/parameters:
• aggregation count: maximum number of requests combined into a single transfer operation
• multiple streams: number of parallel streams used for a single transfer operation
• parallel jobs: number of simultaneous jobs running at the same time

Fig: Effects of parameters over total transfer time (sec) of the test-set
(a) without job aggregation: number of parallel jobs vs number of multiple streams
(b) transfer over single data stream: aggregation count vs number of parallel jobs
(c) transfer over 32 streams: aggregation count vs number of parallel jobs

Error Detection and Recovery
stork.globus-url-copy: In case of a retry from a failure, the scheduler informs the transfer module to recover and restart the transfer using the information from a rescue file created by the transfer module.

Stork.globus-url-copy features
-ckp | -checkpoint    use a rescue file for checkpointing
-ckpdebug | -checkpoint-debug
-ckpfile <filename> | -checkpoint-file <filename>    checkpoint filename. Default is "<pid>.rescue"
-cksm | -checksum    checksum control after each transfer
-pchck | -port-check    check network connectivity and availability of the protocol

Protocols:
file:/ -> local file
gsiftp:// -> GridFTP
irods:// -> iRODS
Petashare:// -> PetaShare

Acknowledgement: This project is in part sponsored by National Science Foundation, Department of Energy, and Louisiana Board of Regents.

www.storkproject.org
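
The job aggregation described on the poster (requests combined by source or destination address, up to a maximum aggregation count) can be illustrated with a minimal Python sketch. This is not Stork's implementation: the function name `aggregate_jobs`, the choice of the (source host, destination host) pair as the grouping key, and the `example.org` host names are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def aggregate_jobs(jobs, max_aggregation_count):
    """Group (src_url, dest_url) requests by endpoint hosts, then split each
    group into batches of at most max_aggregation_count requests."""
    if max_aggregation_count < 1:
        raise ValueError("max_aggregation_count must be at least 1")

    groups = defaultdict(list)
    for src_url, dest_url in jobs:
        # Assumption: requests sharing the same source and destination host
        # are candidates for one combined transfer operation.
        key = (urlparse(src_url).netloc, urlparse(dest_url).netloc)
        groups[key].append((src_url, dest_url))

    batches = []
    for key, members in groups.items():
        # Each slice would be handed to the transfer module as a single job.
        for start in range(0, len(members), max_aggregation_count):
            batches.append((key, members[start:start + max_aggregation_count]))
    return batches


# Hypothetical hosts; ten small-file requests between the same two endpoints.
jobs = [
    ("gsiftp://src.example.org/data/f%d" % i,
     "gsiftp://dst.example.org/scratch/f%d" % i)
    for i in range(10)
]
batches = aggregate_jobs(jobs, max_aggregation_count=4)
print([len(members) for _, members in batches])  # prints [4, 4, 2]
```

With an aggregation count of 4, the ten requests become three transfer operations instead of ten, which is the kind of per-job overhead reduction the poster reports for small files.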