A 5-minute presentation given during the SC13 Birds of a Feather session on the relationship between the Research Data Alliance and High Performance Computing.
SC13 BoF: RDA and HPC
1. Research Data Alliance (RDA) for HPC
SC13
Birds of a Feather session
November 20, 2013
17:30-19:00 MST
Colorado Convention Center
Denver, Colorado
Contribution of
John W. Cobb
Oak Ridge National Lab.
DataONE Project
2. Why am I here? From what perspectives do I speak?
• Discipline scientist
• HPC application evangelist
• Cyberinfrastructure leverage for experimental facilities
• Cyberinfrastructure/HPC center operations
• Cyberinfrastructure efforts for data-intensive science
Without data there is no science
2 Presentation name
3. HPC centers and archives have different service objectives
Cycles not used are lost
Data management involves a long-term commitment of resources
4. Comparing HPC centers and data archives
Simulations
• Generate data at will
• Can programmatically control data quality
• Can be reproduced more easily
• ==> Can be copious
• Weaker tradition of metadata and data quality
Experiment/Observation
• Collect data from physical events
• Data quality may be limited by collection methods
• May be difficult, expensive, or impossible to reproduce
• ==> May be more limited
• Long-term focus on metadata and data quality
5. Consequently different challenges
• HPC centers excel at:
– Volume and velocity
– Analysis at scale
• Archives excel at:
– Variety
– Metadata capture
– Data quality
7. eBird pilot project exploration and visualization
Diverse bird observations and environmental data from 300,00 locations in the US integrated and analyzed using High Performance Computing Resources
Input data:
• Land Cover
• Meteorology
• MODIS – Remote sensing data
Model results: Occurrence of Indigo Bunting (2008)
[Figure: map panels for Jan, Apr, Jun, Sep, Dec]
Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration
• Examine patterns of migration
• Infer how climate change may affect bird migration
9. Exploration, Visualization, and Analysis
Benchmark Observations
Workflows for hypothesis development, testing, and exploration
Interactive maps and plots for multidimensional data exploration and analysis
Terrestrial Biosphere Model Output
Model Structure Information
Provenance Framework
10. DataONE experience
• CI created: interoperable data service functional interfaces
• 4 reference interface implementations completed
• 8 client-side “investigator toolkit” tools released, 4 more in development
• 16 collaborating Member Node repositories (internationally)
• > 100,000 data objects published
• Conducted 81 workshops on data management
• Published 65 data management “best practices”
• Completed several baseline and follow-up surveys on the state of data management with scientists, libraries, librarians, …
11. DataONE experience (cont.)
About half the effort has been on education, training, and outreach about data management practices
12. “Data = Human”
- Genevieve Bell SC13 Keynote