Next-generation sequencing is producing vast amounts of data. Providing storage and compute is only half the battle: to stay productive, researchers and IT staff also need to be able to manage that data.
Talk given at BIO-IT World, Europe 2010.
20. Currently using LSF to manage workflow. (Diagram: LSF, fast scratch disk, archival/warehouse disk, network.)
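As a rough illustration of what managing a workflow with LSF can look like, the sketch below builds (but does not submit) a chain of `bsub` command lines in which each job waits for the previous one to finish successfully. It uses LSF's `-J` (job name), `-q` (queue), `-o` (output file, where `%J` expands to the job ID) and `-w "done(...)"` (dependency) options. The queue name `normal` and the scripts `run_alignment.sh` and `archive_results.sh` are hypothetical placeholders, not taken from the talk.

```python
import shlex


def bsub_command(job_name, command, queue="normal", depends_on=None, log_dir="logs"):
    """Build (but do not run) an LSF bsub command line as an argument list.

    The default queue name "normal" is an assumption; queue names are site-specific.
    """
    args = ["bsub", "-J", job_name, "-q", queue, "-o", f"{log_dir}/{job_name}.%J.out"]
    if depends_on:
        # Only start once the named job has completed successfully.
        args += ["-w", f"done({depends_on})"]
    args.append(command)
    return args


def chain(stages, queue="normal"):
    """Turn an ordered list of (job_name, command) pairs into dependent bsub calls."""
    commands = []
    previous = None
    for name, command in stages:
        commands.append(bsub_command(name, command, queue=queue, depends_on=previous))
        previous = name
    return commands


# Hypothetical two-step workflow: align a lane, then copy the result to archive disk.
stages = [
    ("align_lane1", "run_alignment.sh lane1.fastq"),
    ("archive_lane1", "archive_results.sh lane1.bam"),
]

for args in chain(stages):
    # shlex.join quotes each argument so the line could be pasted into a shell.
    print(shlex.join(args))
```

Encoding the dependency in the scheduler, rather than in a wrapper script that polls for finished jobs, leaves LSF responsible for ordering; the trade-off is that the workflow only knows about jobs, not about where the data ends up.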
25. Sequencing data flow: automated processing and data management. (Diagram: sequencer, analysis/alignment, internal repository, EGA/SRA (EBI), compute farm, high-performance storage; data movement between stages is manual.)
35. Sequencing data flow: automated processing and data management. (Diagram: sequencer, analysis/alignment, internal repository, EGA/SRA (EBI), compute farm, high-performance storage; labels "Manual" and "Managed data movement".)
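The step from manual to managed data movement can be sketched as follows: instead of someone copying files by hand, the system copies each file, verifies its checksum at both ends, removes the source only after verification succeeds, and records the move. This is a minimal sketch under stated assumptions (MD5 checksums, a plain list standing in for a provenance log, hypothetical file and directory names), not the pipeline described in the talk.

```python
import hashlib
import shutil
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large sequencing files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def managed_move(src, dest_dir, log):
    """Copy src into dest_dir, verify the copy, delete the source, and log the move.

    `log` is a hypothetical stand-in for a provenance record (a list of dicts here).
    """
    src = Path(src)
    dest = Path(dest_dir) / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)

    before = md5sum(src)
    shutil.copy2(src, dest)  # copy2 also preserves timestamps
    after = md5sum(dest)
    if before != after:
        dest.unlink()  # never keep a corrupt copy
        raise OSError(f"checksum mismatch moving {src} -> {dest}")

    src.unlink()  # only remove the original once the copy is verified
    log.append({"file": src.name, "from": str(src.parent), "to": str(dest.parent), "md5": after})
    return dest
```

The ordering (copy, verify, then delete) is the point of the sketch: a failure at any step leaves the original file in place.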
37. iRODS architecture. (Diagram: ICAT catalogue database; rule engine, which implements policies; user interfaces: WebDAV, icommands, FUSE; three iRODS servers, holding data on disk, data in a database, and data in S3.)
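To make the catalogue/rule-engine split concrete, here is a toy Python model: a catalogue that maps logical paths to replicas and metadata, and a rule engine that runs registered policies after each upload. The class names, the `post_put` event and the replicate-BAMs-to-archive policy are all hypothetical. In iRODS itself, policies are written as rules attached to policy enforcement points (for example `acPostProcForPut`); this sketch does not use the iRODS API.

```python
from collections import defaultdict


class Catalogue:
    """Toy stand-in for a metadata catalogue such as ICAT."""

    def __init__(self):
        self.replicas = defaultdict(list)  # logical path -> [storage resource names]
        self.metadata = defaultdict(dict)  # logical path -> {attribute: value}


class RuleEngine:
    """Toy policy engine: functions registered for an event run after that operation."""

    def __init__(self):
        self.rules = defaultdict(list)

    def on(self, event):
        def register(fn):
            self.rules[event].append(fn)
            return fn
        return register

    def fire(self, event, catalogue, path):
        for rule in self.rules[event]:
            rule(catalogue, path)


class DataGrid:
    """Clients only call put/query; placement policy lives in the rule engine."""

    def __init__(self, engine):
        self.catalogue = Catalogue()
        self.engine = engine

    def put(self, path, resource, **metadata):
        self.catalogue.replicas[path].append(resource)
        self.catalogue.metadata[path].update(metadata)
        self.engine.fire("post_put", self.catalogue, path)

    def query(self, **conditions):
        """Return logical paths whose metadata matches every condition."""
        return sorted(p for p, meta in self.catalogue.metadata.items()
                      if all(meta.get(k) == v for k, v in conditions.items()))


engine = RuleEngine()


@engine.on("post_put")
def replicate_bams_to_archive(catalogue, path):
    # Hypothetical policy: every BAM file also gets a replica on the "archive" resource.
    if path.endswith(".bam") and "archive" not in catalogue.replicas[path]:
        catalogue.replicas[path].append("archive")


grid = DataGrid(engine)
grid.put("/seq/run42/lane1.bam", "fast-disk", study="study1", lane="1")
grid.put("/seq/run42/lane1.fastq", "fast-disk", study="study1", lane="1")
```

The idea being illustrated is that policy sits on the server side, so every client path (WebDAV, icommands, FUSE) would trigger the same rules.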
50. Wishlist: HPC integration. (Diagram, two configurations: first, data is staged in/out between an archive/metadata system and fast storage (a POSIX filesystem) used by the compute farm; second, fast storage (a POSIX filesystem) plus a metadata system serving the compute farm directly.) The system can do rule/metadata-based operations and standard POSIX operations too.
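The staging step in the first configuration can be sketched like this: select inputs by metadata, copy them into scratch space, let the job use ordinary POSIX file I/O, then copy the outputs back and register metadata for them. The dict-based catalogue, the `stage_and_run` function and the `merge_job` example are hypothetical; the wishlist is, in effect, for a system where this explicit copy in/out step is no longer needed.

```python
import shutil
import tempfile
from pathlib import Path


def stage_and_run(catalogue, archive_dir, query, job, out_metadata):
    """Stage matching files to scratch, run `job` there, then stage outputs back.

    `catalogue` is a hypothetical dict: file name -> {attribute: value}.
    `job(workdir, inputs)` must return a list of output Paths inside workdir.
    """
    archive_dir = Path(archive_dir)
    inputs = [name for name, meta in catalogue.items()
              if all(meta.get(k) == v for k, v in query.items())]
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp)
        for name in inputs:  # stage in
            shutil.copy2(archive_dir / name, scratch / name)
        outputs = job(scratch, [scratch / name for name in inputs])  # plain POSIX I/O
        for out in outputs:  # stage out, then register metadata for the result
            shutil.copy2(out, archive_dir / out.name)
            catalogue[out.name] = dict(out_metadata)
    return sorted(inputs)


def merge_job(workdir, inputs):
    """Hypothetical job: concatenate the staged input files into one output."""
    merged = workdir / "merged.txt"
    with open(merged, "wb") as out:
        for path in sorted(inputs):
            out.write(path.read_bytes())
    return [merged]
```

Usage: `stage_and_run(cat, "/archive", {"study": "s1"}, merge_job, {"study": "s1", "type": "merged"})` would stage every file tagged `study=s1`, merge them on scratch, and register `merged.txt` in the catalogue.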