The DFC project aims to federate data grids to enable collaboration. It uses iRODS to build a federated data grid that supports reproducible science by treating workflows as first-class data objects and capturing their provenance. The project focuses on interoperability, allowing iRODS grids to interface with other systems such as DataONE. It also addresses what its users want: data discovery, access from workflows, manipulation, transformation, subsetting, and visualization. Current work involves client-side tools for ingestion, access control, and integrated access to analysis tools. The project also works on standards, policies, and repository management tools to support trustworthy and sustainable data curation practices.
3. What does DFC do?
• Federate to enable collaboration
– Federation of iRODS-based data grids
– Interoperability for federation with other systems
• Enable reproducible science
– Workflows as first-class data objects; provenance
• Build on a policy-based data system (iRODS)
– Best practices in curation, archiving
– Automated data grid administration functions
7. SW Quality & User Community
Version 4.0 release March 31
Sustainability
8. SW Quality & User Community
Sustainability
iRODS User Meeting
June 18-19, 2014, Boston
9. Federating across systems
Interoperability
A DataONE member node looks like another iRODS grid to an iRODS user
This capability has been demonstrated
[Figure: DataONE Member Node; iRODS Federated Grid; iRODS Data Grids; Interface (via APIs) to DataONE; Cloud Storage]
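The interoperability idea on this slide, that a DataONE member node appears to an iRODS user as just another iRODS grid, is essentially an adapter pattern. The Python sketch below illustrates it under stated assumptions: `DataGrid`, `IRODSGrid`, `DataONEMemberNodeAdapter`, and `FederatedView` are hypothetical names, the grids are in-memory stand-ins, and the `fetch` callable stands in for whichever DataONE API call retrieves an object by identifier. It is not the DFC implementation.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List


class DataGrid(ABC):
    """Hypothetical common interface that every federation member presents."""

    zone: str

    @abstractmethod
    def list_objects(self) -> List[str]:
        """Return logical paths of the form /<zone>/<name>."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the bytes stored at a logical path."""


def _name_of(path: str) -> str:
    # "/zone/name/with/slashes" -> "name/with/slashes"
    return path.split("/", 2)[2]


class IRODSGrid(DataGrid):
    """Stand-in for a native iRODS zone, backed by an in-memory dict."""

    def __init__(self, zone: str, objects: Dict[str, bytes]):
        self.zone = zone
        self._objects = dict(objects)

    def list_objects(self) -> List[str]:
        return sorted(f"/{self.zone}/{name}" for name in self._objects)

    def read(self, path: str) -> bytes:
        return self._objects[_name_of(path)]


class DataONEMemberNodeAdapter(DataGrid):
    """Hypothetical adapter that makes a DataONE member node look like an iRODS zone.

    `fetch` is an assumption: any callable that retrieves an object by identifier.
    """

    def __init__(self, identifiers: Iterable[str], fetch: Callable[[str], bytes],
                 zone: str = "dataone"):
        self.zone = zone
        self._identifiers = list(identifiers)
        self._fetch = fetch

    def list_objects(self) -> List[str]:
        return sorted(f"/{self.zone}/{pid}" for pid in self._identifiers)

    def read(self, path: str) -> bytes:
        return self._fetch(_name_of(path))


class FederatedView:
    """What the user sees: one namespace, dispatching on the zone in the path."""

    def __init__(self, members: Iterable[DataGrid]):
        self._members = {m.zone: m for m in members}

    def list_all(self) -> List[str]:
        return [p for m in self._members.values() for p in m.list_objects()]

    def read(self, path: str) -> bytes:
        zone = path.split("/")[1]
        return self._members[zone].read(path)


# Usage with made-up data: the user reads both members the same way.
local = IRODSGrid("tempZone", {"run1.csv": b"a,b\n1,2\n"})
remote = DataONEMemberNodeAdapter(["pid-123"], lambda pid: b"remote:" + pid.encode())
fed = FederatedView([local, remote])
```

The user code only ever calls `fed.read(path)`; whether the path resolves to an iRODS zone or to the DataONE adapter is decided by the zone component, which is the sense in which the member node "looks like another grid".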
12. What our users want
• Data discovery
• Data access from a workflow
• Data manipulation (parsing of a data format)
• Data transformation
– converting to a new coordinate system
– creating new physical variables by combining other variables
– converting to new physical units
• Data subsetting (extracting a sub-region)
• Data registration (GIS co-registration)
• Data visualization
• Creation of derived data products
Data Users
13. Current work: client-side tools
• Ingest: MediaWiki, iDropWeb
– Metadata templating, bulk uploads
– Database and indexing: plug-in in v4.0
• Access control
– Access for a user-defined “group” (my team)
• Integrated access to analysis tools
• Interfaces: Jargon, message-passing interface framework
Data Producers & Users
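Access for a user-defined group can be pictured as resolving a user's effective permission from their own entry plus the entries of every group they belong to. The sketch below is a hypothetical, in-memory model of that check. Its levels mirror iRODS's read/write/own permissions, but `AccessControl`, `Permission`, the group name `my_team`, and the zone path are illustrative assumptions, not the iRODS or iDropWeb API.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, Iterable, Set


class Permission(IntEnum):
    # Ordered so that a higher level implies every lower one.
    NONE = 0
    READ = 1
    WRITE = 2
    OWN = 3


@dataclass
class AccessControl:
    """Hypothetical in-memory ACL with user-defined groups."""

    groups: Dict[str, Set[str]] = field(default_factory=dict)
    # path -> principal (user or group name) -> granted level
    acl: Dict[str, Dict[str, Permission]] = field(default_factory=dict)

    def create_group(self, name: str, members: Iterable[str]) -> None:
        self.groups[name] = set(members)

    def grant(self, path: str, principal: str, level: Permission) -> None:
        self.acl.setdefault(path, {})[principal] = level

    def effective(self, user: str, path: str) -> Permission:
        # The user's own entry plus the entries of every group containing the user;
        # the strongest grant wins.
        entries = self.acl.get(path, {})
        principals = {user} | {g for g, members in self.groups.items() if user in members}
        return max(entries.get(p, Permission.NONE) for p in principals)

    def can(self, user: str, path: str, needed: Permission) -> bool:
        return self.effective(user, path) >= needed


# Usage: Alice owns a file and shares read access with her team.
ac = AccessControl()
ac.create_group("my_team", ["alice", "bob"])
path = "/exampleZone/home/alice/results.nc"
ac.grant(path, "alice", Permission.OWN)
ac.grant(path, "my_team", Permission.READ)
```

Granting once to the group, rather than to each member, is what lets a team owner add or remove collaborators without touching every file's permissions.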
14. Standards and Policies
Curators, Archivists
• Community practices and policies
– Unwritten, non-existent
• Developing international standards
• Implementation in iRODS server
• Future: tools to make writing rules easier
15. Repository management tools
• Best practices embodied in iRODS rules and policies
– Trustworthy repository
• Automatic execution
– Copy, backup, checksum
– Triggers: time, event
• Tools for grid administrators
Data Center Managers
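The "automatic execution" bullets describe policies that fire on a trigger and then copy, back up, and checksum data. In iRODS itself these are written as rules attached to policy enforcement points (for example, a post-ingest hook such as `acPostProcForPut`). The Python sketch below only models the behavior, assuming a local archive directory, a local backup directory, and SHA-256 checksums; `RepositoryPolicy`, `on_put`, and `periodic_audit` are hypothetical names.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


class RepositoryPolicy:
    """Hypothetical policy engine with one event trigger and one time trigger."""

    def __init__(self, archive: Path, backup: Path):
        self.archive = archive
        self.backup = backup
        self.catalog: Dict[str, str] = {}  # object name -> checksum recorded at ingest

    def on_put(self, name: str, data: bytes) -> None:
        """Event trigger: runs after each ingest (store, checksum, back up)."""
        target = self.archive / name
        target.write_bytes(data)
        self.catalog[name] = sha256_of(target)
        shutil.copy2(target, self.backup / name)

    def periodic_audit(self) -> List[str]:
        """Time trigger: intended to run on a schedule.

        Returns the names whose archived copy no longer matches its recorded
        checksum, and restores each one from the backup if the backup still matches.
        """
        flagged = []
        for name, expected in self.catalog.items():
            primary = self.archive / name
            if primary.exists() and sha256_of(primary) == expected:
                continue
            flagged.append(name)
            replica = self.backup / name
            if replica.exists() and sha256_of(replica) == expected:
                shutil.copy2(replica, primary)
        return flagged


# Usage in a scratch directory: ingest, simulate corruption, then audit twice.
root = Path(tempfile.mkdtemp())
(root / "archive").mkdir()
(root / "backup").mkdir()
policy = RepositoryPolicy(root / "archive", root / "backup")
policy.on_put("obs.csv", b"t,temp\n0,12.5\n")
(root / "archive" / "obs.csv").write_bytes(b"corrupted")
first_audit = policy.periodic_audit()
second_audit = policy.periodic_audit()
```

The event trigger keeps a checksum and a replica in step with each ingest; the time trigger (run from cron or any scheduler, an assumption here) catches later corruption and repairs from the replica only when the replica still matches the recorded checksum.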
Two underlying capabilities… If DFC is about federation, then it must have interoperability between iRODS grids and with other systems. If DFC is to be part of the long-term national cyberinfrastructure for data management, it must be sustainable. Let me update you on that last one first.
Version 4.0 release March 31
• Merge of iRODS 3.3.1 (developed by DICE/DFC) with the “enterprise” iRODS developed by RENCI
• Newer SW engineering best practices
• One-click install
• Rigorous testing
• Server + plug-ins model
• New features added as plug-ins
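The "server + plug-ins model" in these notes means the core server stays fixed while capabilities such as new storage back ends arrive as plug-ins. The sketch below shows that general pattern in Python under stated assumptions: `Server`, `StoragePlugin`, `MemoryStorage`, and `CompressedStorage` are hypothetical, and this is not the actual iRODS 4.0 plug-in interface.

```python
import zlib
from typing import Callable, Dict, Protocol


class StoragePlugin(Protocol):
    """Interface a hypothetical storage plug-in must provide."""

    def write(self, key: str, data: bytes) -> None: ...
    def read(self, key: str) -> bytes: ...


class Server:
    """Core server: knows no concrete back end; capabilities arrive as plug-ins."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], StoragePlugin]] = {}
        self._resources: Dict[str, StoragePlugin] = {}

    def register_plugin(self, kind: str, factory: Callable[[], StoragePlugin]) -> None:
        self._factories[kind] = factory

    def create_resource(self, name: str, kind: str) -> None:
        if kind not in self._factories:
            raise KeyError(f"no plug-in registered for resource type {kind!r}")
        self._resources[name] = self._factories[kind]()

    def put(self, resource: str, key: str, data: bytes) -> None:
        self._resources[resource].write(key, data)

    def get(self, resource: str, key: str) -> bytes:
        return self._resources[resource].read(key)


class MemoryStorage:
    """Plug-in: keeps objects in a dict."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def read(self, key: str) -> bytes:
        return self._blobs[key]


class CompressedStorage(MemoryStorage):
    """A later plug-in: adds compression without any change to Server."""

    def write(self, key: str, data: bytes) -> None:
        super().write(key, zlib.compress(data))

    def read(self, key: str) -> bytes:
        return zlib.decompress(super().read(key))


# Usage: register two plug-ins, then create one resource from each.
server = Server()
server.register_plugin("memory", MemoryStorage)
server.register_plugin("compressed", CompressedStorage)
server.create_resource("scratchRes", "memory")
server.create_resource("archiveRes", "compressed")
server.put("archiveRes", "log.txt", b"x" * 1000)
```

Adding `CompressedStorage` required no edit to `Server`, which is the point of shipping new features as plug-ins rather than as changes to the core.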
RodsWiki is a MediaWiki extension that stores MediaWiki file uploads in iRODS and lets wiki users download those files and view and manipulate their metadata. Large scientific datasets thus gain the benefits of iRODS storage while users keep working through the standard MediaWiki interfaces.
Puzzled when you ask about this…
Can iRODS:
• Automate distributed data sharing across different data management systems?
• Maintain control of different data sets across different storage systems?
• Allow and automate a range of data services without involving system administrators?
YES.