Rescue of Long-Tail Data from the Ocean Bottom to the Moon
1. Rescue of Long-Tail Data from the Ocean Bottom to the Moon
Leslie Hsu, Kerstin Lehnert, Suzanne Carbotte, Vicki Ferrini, John Delano¹, James B. Gill², Maurice Tivey³
Lamont-Doherty Earth Observatory, Columbia University
¹University of Albany, ²University of California, Santa Cruz, ³Woods Hole Oceanographic Institution
IN12A. Data Curation, Credibility, Preservation Implementation, and Data Rescue to Enable Multi-source Science
Fall AGU 2013
IEDA
iedadata.org
2. Data at Risk
¤ "Data at Risk" is scientific data that are not in formats that permit full electronic access to the information they contain.
¤ Data at Risk may be
  ¤ non-digital (e.g., handwritten or photographic),
  ¤ on near-obsolete digital media (such as floppy disks),
  ¤ or insufficiently described (lacking metadata).
¤ Some born-digital data are considered "at risk" if they cannot be ingested into managed databases because they lack adequate formatting or metadata.
Definition from the ICSU CODATA Data at Risk Task Group (DARTG)
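The criteria in this definition can be read as a simple checklist. The following is a minimal sketch of such a checklist, not a DARTG or IEDA tool: the `DatasetRecord` fields, the `NEAR_OBSOLETE_MEDIA` set, and the reason labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical list of media treated as near-obsolete; a real inventory
# would maintain its own list.
NEAR_OBSOLETE_MEDIA = {"floppy disk", "zip disk", "magnetic tape"}


@dataclass
class DatasetRecord:
    """Illustrative inventory entry; every field name is an assumption."""
    name: str
    is_digital: bool
    medium: str              # e.g. "paper", "floppy disk", "hard drive"
    has_metadata: bool
    ingestible_format: bool  # can a managed database ingest it as-is?


def risk_reasons(rec: DatasetRecord) -> List[str]:
    """Return the criteria from the definition that this record meets."""
    reasons = []
    if not rec.is_digital:
        reasons.append("non-digital")
    if rec.medium.lower() in NEAR_OBSOLETE_MEDIA:
        reasons.append("near-obsolete media")
    if not rec.has_metadata:
        reasons.append("insufficiently described")
    # Born-digital data can still be at risk if it cannot be ingested.
    if rec.is_digital and not rec.ingestible_format:
        reasons.append("not ingestible")
    return reasons


def is_at_risk(rec: DatasetRecord) -> bool:
    """A record is at risk if it meets at least one criterion."""
    return bool(risk_reasons(rec))
```

For example, an undocumented paper notebook would be flagged as both "non-digital" and "insufficiently described", while a documented file in an ingestible format would return no reasons.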
3. Data Rescue
¤ A “Data Rescue Mission” is any effort to preserve data at risk. Rescue missions can come in the form of digitization, format migration, treating damaged materials (e.g., water or mold damage), adding metadata, or any other action taken to make data accessible in the long term.
M. Tivey
Definition from ICSU CODATA Data at Risk Task Group (DARTG)
4. Long Tail Data are often Data at Risk
[Figure: long-tail distribution of data. Credits: http://juliegood.wordpress.com/tag/long-tail/, L. Wyborn]
The Head: Astronomy, Climate, High Energy Physics, Genomics
Long Tail: Environmental and Earth sciences
Long Tail Characteristics
¤ More specialised
¤ Low volume
¤ On C drives
¤ Hard to find
¤ Heterogeneous
¤ Collected by many people
¤ Citizen science
¤ Etc.
5. IEDA Data Rescue Mini-Awards
¤ Established to preserve valuable legacy data sets that are endangered by impending retirement or degradation
¤ Evaluated for highest impact on future research, based on quality, size, rarity, unique location, or data type
¤ Made accessible to the community for re-use by inclusion in the IEDA data collections (EarthChem, MGDS, SESAR)
¤ $7000 award to support proper compilation, documentation, and transfer
¤ 3 awardees chosen from 11 entries covering a wide range of geochemical and geophysical data
6. 1: Geologic samples and geochemistry
¤ WHAT: Compilation of sample metadata and geochemical analyses from three areas: Fiji, Izu Arc, and Endeavour segment (James B. Gill)
¤ WHY: Study of intra-ocean arcs and spreading centers
¤ HOW: Check data and fill in what is incomplete, digitize data, add persistent identifiers, and link related resources
¤ Major challenge: Physical sample management
Maps made with GeoMapApp
7. The importance of Sample identification
¤ Individual samples can play a large role in scientific conclusions, so accurate documentation of sample metadata is critical.
¤ The key measurement was the one backarc basalt called "PPTUW"... Subsequent efforts to confirm the observation ran into problems. The apparently-same sample was variously called PPTU, PPTUW/5, PPTUW-1, and TVZ19 in four other papers. None of those papers gave its latitude and longitude… (J. Gill and E. Todd)
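One way to keep name variants like these from fragmenting the record is to map every published name to a single persistent identifier. The sketch below illustrates the idea only: `PLACEHOLDER_ID` is a made-up value (not a real registered identifier), the alias table is hand-built from the names quoted above, and treating them as one sample is an assumption, since the slide itself calls them only "apparently" the same.

```python
from typing import Dict, Optional

# Hypothetical placeholder; a real identifier would be assigned by a
# sample registry, not invented by the investigator.
PLACEHOLDER_ID = "EXAMPLE-ID-0001"


def normalize(name: str) -> str:
    """Remove whitespace and upper-case a sample name for lookup."""
    return "".join(name.split()).upper()


# Names quoted on the slide, all assumed here to refer to one sample.
PUBLISHED_NAMES = ["PPTUW", "PPTU", "PPTUW/5", "PPTUW-1", "TVZ19"]

ALIASES: Dict[str, str] = {normalize(n): PLACEHOLDER_ID for n in PUBLISHED_NAMES}


def resolve(name: str) -> Optional[str]:
    """Return the identifier for a published name, or None if unknown."""
    return ALIASES.get(normalize(name))
```

A lookup such as `resolve(" pptuw-1 ")` returns the same identifier as `resolve("TVZ19")`, while an unlisted name returns `None` and can be flagged for manual review rather than guessed at.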
8. 2: Near-bottom magnetics
¤ WHAT: Compilation of near-bottom magnetometer data, including raw, merged, processed, and navigation metadata (Maurice Tivey)
¤ WHY: Study of magnetic reversals and the effect of tectonics on the magnetic field
¤ HOW: Gather data from different formats, add complete metadata and workflow
¤ Challenge: Over three decades of technology and file formats
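When files span decades of formats, a first pass is often just an inventory that records what each file is and which descriptive fields are still missing. The following is a minimal sketch of such a check under stated assumptions: the `REQUIRED_FIELDS` names are hypothetical and do not reflect the actual metadata profile used for this project, and the `DATA_KINDS` values are taken loosely from the categories named in the WHAT bullet.

```python
from typing import Dict, List

# Hypothetical required fields for each inventoried file.
REQUIRED_FIELDS = ("cruise", "instrument", "kind", "file_format")

# Categories loosely based on the slide: raw, merged, processed, navigation.
DATA_KINDS = ("raw", "merged", "processed", "navigation")


def missing_fields(entry: Dict[str, str]) -> List[str]:
    """List required fields that are absent or empty in an inventory entry."""
    missing = [f for f in REQUIRED_FIELDS if not entry.get(f)]
    # A present but unrecognized kind is reported separately.
    kind = entry.get("kind")
    if kind and kind not in DATA_KINDS:
        missing.append("kind (unrecognized value)")
    return missing
```

Running the check over every entry gives a worklist of gaps to fill before submission, which keeps the rescue effort incremental rather than all-or-nothing.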
16. Lessons learned: investigator
¤ Take ownership of your own legacy
¤ Data curation by others may not be complete or correct
¤ Data rescue of an entire career does not need to be overwhelming
¤ Start with small steps
¤ Disciplinary repositories will help and guide you to what is needed
¤ Despite the time investment, data rescue is worth it
¤ Others will now be able to re-use the data
¤ Notes taken years ago actually explain anomalies
17. Lessons learned: repository
¤ For Long Tail Data, every project is different
¤ There is not an established workflow, just past experience
¤ Time commitment from staff is nontrivial
¤ Disciplinary training helps a great deal
¤ Investigators need help determining the best products
¤ A small incentive will motivate investigators
¤ Data Rescue missions help the repository determine next steps for development of tools and services
18. Summary of Long-tail Data Rescue
¤ Three Data Rescue efforts by IEDA this past year have made data that were at risk:
  ¤ digitized from analog data and near-obsolete media
  ¤ sufficiently described for reuse
  ¤ available in formats that permit full electronic access
  ¤ citable, with persistent identifiers, and ready for reuse
¤ The projects also helped IEDA identify improvements in data rescue workflow, and future tools and services
19. More Data Rescue Activities
¤ Elsevier-IEDA Data Rescue Process Study
¤ A data entry tool for lunar geochemistry: MoonDB
¤ Elsevier-IEDA International Data Rescue Award
  ¤ Winner announced at reception tonight, Monday Dec 9th, 2013
  ¤ Intercontinental Hotel, Twin Peaks Room, 7:00-8:30pm