The HDF Group

Improving long-term
preservation of EOS data by
independently mapping HDF4
data objects
Mike Folk, Ruth Aydt, Joe Lee, Binh-Minh Ribler, Kent Yang
Ruth Duerr, Christopher Lynnes
The 14th HDF and HDF-EOS Workshop
September 28-30, 2010

HDF/HDF-EOS Workshop XIV


www.hdfgroup.org
Mapping project team members

The HDF Group
• Ruth Aydt
• Peter Cao
• Mike Folk
• Joe Lee
• Elena Pourmal
• Tong Qi
• Binh-Minh Ribler
• Eunsoo Seo
• Veer Singh
• Muqun (Kent) Yang

NASA
• Ruth Duerr (NSIDC)
• Chris Lynnes (GES DISC)

HDF4 files are complex

How do HDF users avoid
having to deal with all of that
complexity?

Through the HDF software libraries,
either by using HDF APIs directly,

or by using HDF tools that depend
on the HDF libraries.
But what about the future…

Over the long term, there is a
risk in depending solely on HDF
software to access HDF-formatted data.
It is possible that, in the distant
future, the software may not be
available.

“If only we could read HDF data with an
independent program that does not rely on
the HDF API…
A possible approach [would be to create] a
map of a data file, [and] utilities to
find, assemble and write out SDSes and
vdatas.”
“Leveraging HDF Utilities”
Christopher Lynnes
HDF Workshop X.

User’s view of the HDF4 SD model

Mapping SDS to file offset/length

HDF4 file
layout
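
As a rough illustration of what an offset/length mapping gives a future reader, the sketch below pulls one byte range out of a binary stream and decodes it using only the Python standard library. The function names, the sample offset, and the choice of big-endian 32-bit integers are assumptions made for this example; they are not taken from the mapping project.

```python
import io
import struct

def read_byte_range(stream, offset, length):
    """Return exactly `length` bytes starting at byte `offset` of a binary stream."""
    stream.seek(offset)
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("byte range extends past the end of the data")
    return data

def decode_int32_be(raw):
    """Decode raw bytes as big-endian signed 32-bit integers (assumed data type)."""
    count = len(raw) // 4
    return list(struct.unpack(">%di" % count, raw[:count * 4]))

# Stand-in for an HDF4 file: 8 bytes of unrelated content, then three int32 values.
fake_file = io.BytesIO(b"\x00" * 8 + struct.pack(">3i", 10, -20, 30))

# A map entry would supply the offset and length; here they are hard-coded.
values = decode_int32_be(read_byte_range(fake_file, offset=8, length=12))
```

With a real file, `fake_file` would be replaced by `open(path, "rb")`, and the offset, length, and data type would come from the map file rather than from the program.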

Mapping with compressed chunks

HDF4 file
layout
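
The chunked case can be sketched in the same spirit. The example below assumes a 2-D array of unsigned bytes whose chunks are each compressed with DEFLATE (zlib) and stored at arbitrary positions in the file. The tuple layout used for chunk entries and the function name are hypothetical, and other compression methods such as Szip or JPEG would need their own decoders.

```python
import io
import zlib

def read_chunked_array(stream, shape, chunk_shape, chunks):
    """Assemble a 2-D array of unsigned bytes from DEFLATE-compressed chunks.

    `chunks` is a list of (chunk_row, chunk_col, offset, length) tuples, where
    chunk_row and chunk_col index the chunk grid. The array dimensions are
    assumed to be exact multiples of the chunk dimensions.
    """
    rows, cols = shape
    c_rows, c_cols = chunk_shape
    array = [[0] * cols for _ in range(rows)]
    for chunk_row, chunk_col, offset, length in chunks:
        stream.seek(offset)
        raw = zlib.decompress(stream.read(length))
        # Copy the row-major chunk contents into their place in the full array.
        for r in range(c_rows):
            for c in range(c_cols):
                array[chunk_row * c_rows + r][chunk_col * c_cols + c] = raw[r * c_cols + c]
    return array

# Two 2x2 chunks stored out of order, to mimic a non-contiguous file layout.
chunk_a = zlib.compress(bytes([1, 2, 3, 4]))
chunk_b = zlib.compress(bytes([5, 6, 7, 8]))
fake_file = io.BytesIO(chunk_b + chunk_a)
chunks = [(0, 0, len(chunk_b), len(chunk_a)), (0, 1, 0, len(chunk_b))]

result = read_chunked_array(fake_file, (2, 4), (2, 2), chunks)
# result == [[1, 2, 5, 6], [3, 4, 7, 8]]
```

The point of the sketch is that the reader needs only the chunk positions, byte ranges, and compression method, all of which a map can record.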

Recap
• Problem
• The complex byte layout of HDF files makes
long-term readability of HDF data dependent
on long-term availability of HDF software.

• Solution
• Create a map of the layout of data objects in
an HDF file, allowing a simple reader to be
written to access the data.

HDF4 mapping workflow

HDF4 File → hmap (linked with HDF4 library) → HDF4 Mapping File (XML document)

HDF4 Mapping File contents: Groups, Data Objects, Structural and Application
Metadata; Locations of Object Data

HDF4 Mapping File + Object Data (from the HDF4 File) → Reader program
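
The reader side of this workflow might look roughly like the sketch below: parse the XML map, look up one object, and concatenate the byte ranges listed for it. The element and attribute names (`Map`, `Array`, `ByteStream`, `offset`, `nBytes`) are invented for this example and should not be read as the project's actual schema.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical map fragment: element and attribute names are assumptions.
MAP_XML = """
<Map>
  <Array name="temperature">
    <ByteStream offset="4" nBytes="3"/>
    <ByteStream offset="10" nBytes="2"/>
  </Array>
</Map>
"""

def read_object_bytes(map_text, stream, object_name):
    """Concatenate the byte ranges that the map lists for one named object."""
    root = ET.fromstring(map_text)
    for array in root.iter("Array"):
        if array.get("name") == object_name:
            parts = []
            for piece in array.iter("ByteStream"):
                stream.seek(int(piece.get("offset")))
                parts.append(stream.read(int(piece.get("nBytes"))))
            return b"".join(parts)
    raise KeyError(object_name)

# Stand-in for the HDF4 file: the object's data sits in two separate pieces.
fake_file = io.BytesIO(b"....ABC...DE..")
data = read_object_bytes(MAP_XML, fake_file, "temperature")
# data == b"ABCDE"
```

Nothing in this reader depends on the HDF4 library; it needs only an XML parser and ordinary file I/O.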

Target User
• Person 20+ years in the future
• Interested in data stored in HDF4 file
• Has HDF4 file and companion map file
• Can “write a program”

• May not have:
• HDF4 data model, format, documentation, or software
• Mapping schema, documentation, or software

• Will have knowledge of:
• Basic XML
• Data representations used today
• Compression used by HDF4 (JPEG, Szip, etc.)

Project Phases
• Phase 1
• Categorize HDF4 data held by NASA.
• Build a prototype
• XML layout representation
• Tool to create XML map file for given HDF4 file
• Tools to read HDF4 data based solely on map
files

• Phase 2
• Build a robust version
• Deploy

How many HDF4 products?
Data Center    HDF4 Products
ASF                0
GES-DISC         236
GHRC              54
ASDC              63
LP-DAAC           67
NSIDC             47
ORNL-DAAC          2
PO.DAAC           22
SDAC               0
MrDC              95
Total            586

Data characteristics
Product Characteristics Examined
• Product Identification
  • Product Name
  • Data Level
  • Archive Location
• For HDF-EOS products
  • HDF-EOS version
  • For swath data
    • Number of swaths
    • Maximum number of dimensions
    • Organized by time, space, both, or other
  • Etc.
• For SDS data
  • Number of SDSs
  • Max number of dimensions
  • Did any SDS have attributes
  • Was any SDS annotated
  • Were dimension scales used
  • Was compression used and if so what kind
  • Was chunking used
• For Vdata
  • Number of Vdata structures
  • Did any have attributes
  • Did any fields have attributes
• Etc.

Phase 2 tasks
A. Investigate integration of mapping schema
with existing standards
B. Determine HDF-EOS2 requirements
C. Redesign and expand the XML schema
D. Implement production quality map writer
E. Develop demo map reader
F. Deploy tools at select NASA data centers


Task A
Investigate integration of
mapping schema with existing
standards

Investigate existing standards
• Investigated:
• METS, PREMIS, ESML, NcML, and CSML

• Concluded:
• Existing standards have different purposes than
mapping schema
• None meet all needs of mapping project

• Develop new schema tailored to project goals
• Harmonize with PREMIS
• Leverage terminology and approaches from all


Task B
Determine HDF-EOS2
requirements

Categorize HDF-EOS2 data products
• Created a data pool from NASA data centers
• GES DISC, NSIDC, LAADS, LP DAAC, LaRC, PO.DAAC, GHRC, OBPG

• Detailed description of sample data
• Reported options for adding HDF-EOS2
contents to the mapping file
• Documents and reports at wiki:
http://wiki.hdfgroup.org/MappingPhase2_TaskB


Task C
Redesign Schema

Design priorities
• Mapping files
• Provide complete access to user-supplied
content in NASA’s EOS binary HDF4 files
• Have enough information to stand on their own
• Be as simple as possible

• Mapping schema
• Describe the Mapping files
• Used for validation and documentation
• May not be available to target user

Representation of HDF4 Objects
HDF4 User-Level Object    Mapping File XML Element
Attribute, Annotation     Attribute
Vgroup                    Group
Vdata                     Table
SDS                       Array
Dimension                 Dimension
Raster Image              Not yet done
Palette                   Not yet done

Mapping File – Group & Table (fragment)

[Figure: XML fragment of the mapping file for AMSR_E_L2_Land_V09_200501180027_D, with three callouts]
• Represents HDF4 Objects and Relationships
• Information needed to access and interpret raw data in HDF4 file
• Select raw data values included to help user verify binary data handled properly

Status and Plans
• Status
• Map file design stabilizing for most HDF4
objects

• Plans
• Complete design for Raster Images and
Palettes
• Continue to refine instructions and contents
• Finalize schema


Task D
Implement Writer

Map Writer Requirements
• Retrieve information needed from HDF4 file
• Write out corresponding XML file
• Quality requirements
• Completeness – don’t miss any objects in file.
• Accuracy – don’t give wrong information.

Writer Status and Plan
• Status
• Covers most Vgroup/Vdata/SDS objects.
• Covers some GR/Annotation objects.
• Being tested with NASA data.

• Plans:
• Increase coverage / accuracy / reliability.


Task E
Implement demo reader

Demo Reader Requirements
• Multiplatform command line tool
• Easy to use, with clear arguments and output
• Must validate that objects in the mapping file
are actually in the HDF4 file
• Developed in a well-supported high-level
language (Python)
• Well documented
• Available as open source
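
One way to approach the validation requirement is sketched below: for each mapped byte range, confirm that it lies inside the file and, where the map carries a verification value, that the decoded data matches it. The function name, the use of a big-endian 16-bit integer as the verification value, and the message wording are all assumptions made for this example, not a description of the demo reader itself.

```python
import io
import struct

def validate_entry(stream, file_size, offset, length, expected_first=None):
    """Return a list of problems found for one mapped byte range.

    `expected_first`, if given, is the first big-endian int16 value the map
    says the range should start with (a hypothetical verification value).
    """
    problems = []
    if offset < 0 or length < 0 or offset + length > file_size:
        problems.append("range %d+%d lies outside the file" % (offset, length))
        return problems
    if expected_first is not None and length >= 2:
        stream.seek(offset)
        (first,) = struct.unpack(">h", stream.read(2))
        if first != expected_first:
            problems.append("expected %d, found %d" % (expected_first, first))
    return problems

# Stand-in for an HDF4 file: 2 bytes of other content, then the values 7 and 9.
payload = b"\x00\x00" + struct.pack(">2h", 7, 9)
fake_file = io.BytesIO(payload)

ok = validate_entry(fake_file, len(payload), offset=2, length=4, expected_first=7)
bad_range = validate_entry(fake_file, len(payload), offset=4, length=8)
bad_value = validate_entry(fake_file, len(payload), offset=2, length=4, expected_first=8)
```

Here `ok` is an empty list, while `bad_range` and `bad_value` each hold one message. A command-line tool could run such a check over every entry in the map before extracting any data.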

Demo Reader Status
• Status
• Only Vdata support provided so far
• Current source code available at
https://sourceforge.net/projects/pyhdf
• Documentation at http://pyhdf.sourceforge.net/

• Plans
• SDS and RIS support


Task F
Deploy

Deploy
• Begin in Jan 2011, complete in April
• Activities:
• GES DISC
• Incorporate into the existing archive ingest
system
• Manage the retrofit into existing metadata files

• NSIDC
• Support implementation in NSIDC’s ECS system

• Other ESDCs
• Encouraged to join in
• But deployment to other centers expected
subsequent to the project.


Thank You!

Acknowledgements
This work was supported by cooperative agreement
number NNX08AO77A from the National
Aeronautics and Space Administration (NASA).
Any opinions, findings, conclusions, or
recommendations expressed in this material are
those of the author[s] and do not necessarily reflect
the views of the National Aeronautics and Space
Administration.


Questions/comments?

Extra slides

September 28-30, 2010

HDF/HDF-EOS Workshop XIV

39

www.hdfgroup.org

Más contenido relacionado

La actualidad más candente

Earth Science Platform
Earth Science PlatformEarth Science Platform
Earth Science PlatformTed Habermann
 
Visualising Research Graph using Neo4j and Gephi
Visualising Research Graph using Neo4j and GephiVisualising Research Graph using Neo4j and Gephi
Visualising Research Graph using Neo4j and Gephiamiraryani
 

La actualidad más candente (20)

HDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFView
HDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFViewHDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFView
HDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFView
 
HDF Group Support for NPP/NPOESS/JPSS
HDF Group Support for NPP/NPOESS/JPSSHDF Group Support for NPP/NPOESS/JPSS
HDF Group Support for NPP/NPOESS/JPSS
 
Bridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data ProductsBridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data Products
 
HDF Town Hall
HDF Town HallHDF Town Hall
HDF Town Hall
 
Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4
 
Geoscience Data Analysis and Visualization Tools from NCAR
Geoscience Data Analysis and Visualization Tools from NCARGeoscience Data Analysis and Visualization Tools from NCAR
Geoscience Data Analysis and Visualization Tools from NCAR
 
Data Are from Mars, Tools Are from Venus
Data Are from Mars, Tools Are from VenusData Are from Mars, Tools Are from Venus
Data Are from Mars, Tools Are from Venus
 
GES DISC Eexperiences with HDF Formats for MEaSUREs Projects
GES DISC Eexperiences with HDF Formats for MEaSUREs ProjectsGES DISC Eexperiences with HDF Formats for MEaSUREs Projects
GES DISC Eexperiences with HDF Formats for MEaSUREs Projects
 
Efficiently serving HDF5 via OPeNDAP
Efficiently serving HDF5 via OPeNDAPEfficiently serving HDF5 via OPeNDAP
Efficiently serving HDF5 via OPeNDAP
 
Using IDL with Suomi NPP VIIRS Data
Using IDL with Suomi NPP VIIRS DataUsing IDL with Suomi NPP VIIRS Data
Using IDL with Suomi NPP VIIRS Data
 
Advancing Scientific Data Support in ArcGIS
Advancing Scientific Data Support in ArcGISAdvancing Scientific Data Support in ArcGIS
Advancing Scientific Data Support in ArcGIS
 
The New HDF-EOS WebSite - How it can help you
The New HDF-EOS WebSite - How it can help youThe New HDF-EOS WebSite - How it can help you
The New HDF-EOS WebSite - How it can help you
 
Images of HDF5
Images of HDF5Images of HDF5
Images of HDF5
 
Earth Science Platform
Earth Science PlatformEarth Science Platform
Earth Science Platform
 
Hdf5 intro
Hdf5 introHdf5 intro
Hdf5 intro
 
ENVI/IDL Tools for HDF
ENVI/IDL Tools for HDFENVI/IDL Tools for HDF
ENVI/IDL Tools for HDF
 
Survey of Data Format Tools
Survey of Data Format ToolsSurvey of Data Format Tools
Survey of Data Format Tools
 
Visualising Research Graph using Neo4j and Gephi
Visualising Research Graph using Neo4j and GephiVisualising Research Graph using Neo4j and Gephi
Visualising Research Graph using Neo4j and Gephi
 
Guided Tour of Pythonian Museum
Guided Tour of Pythonian MuseumGuided Tour of Pythonian Museum
Guided Tour of Pythonian Museum
 
VRA 2014 VRA Core Unbound, Arnold
VRA 2014 VRA Core Unbound, ArnoldVRA 2014 VRA Core Unbound, Arnold
VRA 2014 VRA Core Unbound, Arnold
 

Destacado

Access HDF-EOS data with OGC Web Coverage Service - Earth Observation Applica...
Access HDF-EOS data with OGC Web Coverage Service - Earth Observation Applica...Access HDF-EOS data with OGC Web Coverage Service - Earth Observation Applica...
Access HDF-EOS data with OGC Web Coverage Service - Earth Observation Applica...The HDF-EOS Tools and Information Center
 

Destacado (20)

Earth Science Data and Information System (ESDIS) Project Update
Earth Science Data and Information System (ESDIS) Project UpdateEarth Science Data and Information System (ESDIS) Project Update
Earth Science Data and Information System (ESDIS) Project Update
 
Data Interoperability
Data InteroperabilityData Interoperability
Data Interoperability
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
Advanced HDF5 Features
Advanced HDF5 FeaturesAdvanced HDF5 Features
Advanced HDF5 Features
 
2011 ACSI Survey Summary
2011 ACSI Survey Summary2011 ACSI Survey Summary
2011 ACSI Survey Summary
 
HDF4 Mapping Project Update
HDF4 Mapping Project UpdateHDF4 Mapping Project Update
HDF4 Mapping Project Update
 
Web-based On-demand Global NDVI Data Services
Web-based On-demand Global NDVI Data ServicesWeb-based On-demand Global NDVI Data Services
Web-based On-demand Global NDVI Data Services
 
EOSDIS Survey Overview
EOSDIS Survey OverviewEOSDIS Survey Overview
EOSDIS Survey Overview
 
Easy Access of NASA HDF data via OPeNDAP
Easy Access of NASA HDF data via OPeNDAPEasy Access of NASA HDF data via OPeNDAP
Easy Access of NASA HDF data via OPeNDAP
 
MATLAB's HDF5 Updates
MATLAB's HDF5 UpdatesMATLAB's HDF5 Updates
MATLAB's HDF5 Updates
 
EOSDIS Status
EOSDIS StatusEOSDIS Status
EOSDIS Status
 
Access HDF-EOS data with OGC Web Coverage Service - Earth Observation Applica...
Access HDF-EOS data with OGC Web Coverage Service - Earth Observation Applica...Access HDF-EOS data with OGC Web Coverage Service - Earth Observation Applica...
Access HDF-EOS data with OGC Web Coverage Service - Earth Observation Applica...
 
HDF OPeNDAP project update and demo
HDF OPeNDAP project update and demoHDF OPeNDAP project update and demo
HDF OPeNDAP project update and demo
 
Status of HDF-EOS, Related Software, and Tools
Status of HDF-EOS, Related Software, and ToolsStatus of HDF-EOS, Related Software, and Tools
Status of HDF-EOS, Related Software, and Tools
 
HDF5 Tools
HDF5 ToolsHDF5 Tools
HDF5 Tools
 
HDF Tools Updates and Discussions
HDF Tools Updates and DiscussionsHDF Tools Updates and Discussions
HDF Tools Updates and Discussions
 
HDF-VFS
HDF-VFSHDF-VFS
HDF-VFS
 
HDF-EOS to GeoTIFF Conversion Tool and HDF-EOS Plug-in for HDFView
HDF-EOS to GeoTIFF Conversion Tool and HDF-EOS Plug-in for HDFViewHDF-EOS to GeoTIFF Conversion Tool and HDF-EOS Plug-in for HDFView
HDF-EOS to GeoTIFF Conversion Tool and HDF-EOS Plug-in for HDFView
 
HDF Tools Tutorial
HDF Tools TutorialHDF Tools Tutorial
HDF Tools Tutorial
 
Status of HDF-EOS, Related Software and Tools
 Status of HDF-EOS, Related Software and Tools Status of HDF-EOS, Related Software and Tools
Status of HDF-EOS, Related Software and Tools
 

Similar a Improving long-term preservation of EOS data by independently mapping HDF4 data objects

Similar a Improving long-term preservation of EOS data by independently mapping HDF4 data objects (20)

HDF Update
HDF UpdateHDF Update
HDF Update
 
Easy Remote Access Via OPeNDAP
Easy Remote Access Via OPeNDAPEasy Remote Access Via OPeNDAP
Easy Remote Access Via OPeNDAP
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF Tools Tutorial
HDF Tools TutorialHDF Tools Tutorial
HDF Tools Tutorial
 
HDF Status and Development
HDF Status and DevelopmentHDF Status and Development
HDF Status and Development
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF-EOS Workshop II Introduction
HDF-EOS Workshop II IntroductionHDF-EOS Workshop II Introduction
HDF-EOS Workshop II Introduction
 
Introduction to HDF5 Data and Programming Models
Introduction to HDF5 Data and Programming ModelsIntroduction to HDF5 Data and Programming Models
Introduction to HDF5 Data and Programming Models
 
HDF5 Tools Updates
HDF5 Tools UpdatesHDF5 Tools Updates
HDF5 Tools Updates
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF Updae
HDF UpdaeHDF Updae
HDF Updae
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF Project Status and Plans
HDF Project Status and PlansHDF Project Status and Plans
HDF Project Status and Plans
 
HDF-EOS Subsetting: HEW and other tools
HDF-EOS Subsetting: HEW and other toolsHDF-EOS Subsetting: HEW and other tools
HDF-EOS Subsetting: HEW and other tools
 
HDF-EOS Software Developer/Vendor Workshop Wrapup
HDF-EOS Software Developer/Vendor Workshop WrapupHDF-EOS Software Developer/Vendor Workshop Wrapup
HDF-EOS Software Developer/Vendor Workshop Wrapup
 
Support for NPP/NPOESS by The HDF Group
Support for NPP/NPOESS by The HDF GroupSupport for NPP/NPOESS by The HDF Group
Support for NPP/NPOESS by The HDF Group
 
HDF5 and The HDF Group
HDF5 and The HDF GroupHDF5 and The HDF Group
HDF5 and The HDF Group
 
Migrating from HDF5 1.6 to 1.8
Migrating from HDF5 1.6 to 1.8Migrating from HDF5 1.6 to 1.8
Migrating from HDF5 1.6 to 1.8
 

Más de The HDF-EOS Tools and Information Center

STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...The HDF-EOS Tools and Information Center
 

Más de The HDF-EOS Tools and Information Center (20)

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
HDF - Current status and Future Directions
HDF - Current status and Future Directions HDF - Current status and Future Directions
HDF - Current status and Future Directions
 
H5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only LibraryH5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only Library
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 

Último

.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptx.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptxHansamali Gamage
 
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfQ4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfTejal81
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTxtailishbaloch
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Libraryshyamraj55
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIVijayananda Mohire
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024Brian Pichman
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applicationsnooralam814309
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024Brian Pichman
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3DianaGray10
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2DianaGray10
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTopCSSGallery
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FESTBillieHyde
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)IES VE
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Muhammad Tiham Siddiqui
 

Último (20)

.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptx.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptx
 
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfQ4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Library
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAI
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applications
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2
 
SheDev 2024
SheDev 2024SheDev 2024
SheDev 2024
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development Companies
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FEST
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)
 

Improving long-term preservation of EOS data by independently mapping HDF4 data objects

  • 1. The HDF Group Improving long-term preservation of EOS data by independently mapping HDF4 data objects Mike Folk, Ruth Aydt, Joe Lee, Binh-Minh Ribler, Kent Yang Ruth Duerr, Christopher Lynnes The 14th HDF and HDF-EOS Workshop September 28-30, 2010 September 28-30, 2010 HDF/HDF-EOS Workshop XIV 1 www.hdfgroup.org
  • 2. Mapping project team members • The HDF Group: Ruth Aydt, Peter Cao, Mike Folk, Joe Lee, Elena Pourmal, Tong Qi, Binh-Minh Ribler, Eunsoo Seo, Veer Singh, Muqun {Kent} Yang • NASA: Ruth Duerr (NSIDC), Chris Lynnes (GESDISC)
  • 3. HDF4 files are complex
  • 4. How do HDF users avoid having to deal with all of that complexity?
  • 5. Through the HDF software libraries, either by using HDF APIs directly, or by using HDF tools that depend on the HDF libraries. But what about the future…
  • 6. Over the long term, there is a risk in depending solely on HDF software to access HDF-formatted data. It is possible, in the distant future, that the software may not be available.
  • 7. “If only we could read HDF data with an independent program that does not rely on the HDF API… A possible approach [would be to create] a map of a data file, [and] utilities to find, assemble and write out SDSes and vdatas.” “Leveraging HDF Utilities”, Christopher Lynnes, HDF Workshop X.
  • 8. User’s view of the HDF4 SD model
  • 9. Mapping SDS to file offset/length (figure: HDF4 file layout)
  • 10. Mapping with compressed chunks (figure: HDF4 file layout)
  • 11. Recap • Problem: The complex byte layout of HDF files makes long-term readability of HDF data dependent on long-term availability of HDF software. • Solution: Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data.
  • 12. HDF4 mapping workflow (diagram) • HDF4 File → hmap (linked with HDF4 library) → HDF4 Mapping File (XML document): Groups, Data Objects, Structural and Application Metadata; Locations of Object Data • HDF4 File + HDF4 Mapping File → Reader program → Object Data
  • 13. Target User • Person 20+ years in the future • Interested in data stored in HDF4 file • Has HDF4 file and companion map file • Can “write a program” • May not have: HDF4 data model, format, documentation, or software; mapping schema, documentation, or software • Will have knowledge of: basic XML; data representations used today; compression used by HDF4 (JPEG, Szip, etc.)
  • 14. Project Phases • Phase 1: Categorize HDF4 data held by NASA; build a prototype (XML layout representation; tool to create XML map file for given HDF4 file; tools to read HDF4 data based solely on map files) • Phase 2: Build a robust version; deploy
  • 15. How many HDF4 products? • HDF4 products by data center: ASF 0; GES-DISC 236; GHRC 54; ASDC 63; LP-DAAC 67; NSIDC 47; ORNL-DAAC 2; PO.DAAC 22; SDAC 0; MrDC 95 • Total: 586
  • 16. Data characteristics: Product Characteristics Examined • Product Identification: Product Name; Data Level; Archive Location • For HDF-EOS products: HDF-EOS version • For swath data: Number of swaths; Maximum number of dimensions; Organized by time, space, both, or other; Etc. • For SDS data: Number of SDSs; Max number of dimensions; Did any SDS have attributes; Was any SDS annotated; Were dimension scales used; Was compression used and if so what kind; Was chunking used • For Vdata: Number of Vdata structures; Did any have attributes; Did any fields have attributes; Etc.
  • 17. Phase 2 tasks • A. Investigate integration of mapping schema with existing standards • B. Determine HDF-EOS 2 requirements • C. Redesign and expand the XML schema • D. Implement production quality map writer • E. Develop demo map reader • F. Deploy tools at select NASA data centers
  • 18. Task A: Investigate integration of mapping schema with existing standards
  • 19. Investigate existing standards • Investigated: METS, PREMIS, ESML, NcML, and CSML • Concluded: Existing standards have different purposes than mapping schema; none meet all needs of mapping project • Develop new schema tailored to project goals: harmonize with PREMIS; leverage terminology and approaches from all
  • 20. Task B: Determine HDF-EOS2 requirements
  • 21. Categorize HDF-EOS2 data products • Created a data pool from NASA data centers: GES DISC, NSIDC, LAADS, LP DAAC, LaRC, PO.DAAC, GHRC, OBPG, LAADS • Detailed description of sample data • Reported options for adding HDF-EOS2 contents to the mapping file • Documents and reports at wiki: http://wiki.hdfgroup.org/MappingPhase2_TaskB
  • 22. Task C: Redesign Schema
  • 23. Design priorities • Mapping files: Provide complete access to user-supplied content in NASA’s EOS binary HDF4 files; have enough information to stand on their own; be as simple as possible • Mapping schema: Describe the Mapping files; used for validation and documentation; may not be available to target user
  • 24. Representation of HDF4 Objects • HDF4 User-Level Object → Mapping File XML Element: Attribute, Annotation → Attribute; Vgroup → Group; Vdata → Table; SDS → Array; Dimension → Dimension; Raster Image → Not yet done; Palette → Not yet done
  • 25. Mapping File – Group & Table (fragment; example file AMSR_E_L2_Land_V09_200501180027_D) • Callouts on the figure: Represents HDF4 Objects and Relationships • Information needed to access and interpret raw data in HDF4 file • Select raw data values included to help user verify binary data handled properly
  • 26. Status and Plans • Status: Map file design stabilizing for most HDF4 objects • Plans: Complete design for Raster Images and Palettes; continue to refine instructions and contents; finalize schema
  • 27. Task D: Implement Writer
  • 28. Map Writer Requirements • Retrieve information needed from HDF4 file • Write out corresponding XML file • Quality requirements: Completeness – don’t miss any objects in file; Accuracy – don’t give wrong information.
  • 29. Writer Status and Plan • Status: Covers most Vgroup/Vdata/SDS objects; covers some GR/Annotation objects; being tested with NASA data. • Plans: Increase coverage / accuracy / reliability.
  • 30. Task E: Implement demo reader
  • 31. Demo Reader Requirements • Multiplatform command line tool • Easy to use, with clear arguments and output • Must validate that objects in the mapping file are actually in the HDF4 file • Developed in a well-supported high-level language (Python) • Well documented • Available as open source
  • 32. Demo Reader Status • Status: Only Vdata support provided so far; current source code available at https://sourceforge.net/projects/pyhdf; documentation at http://pyhdf.sourceforge.net/ • Plans: SDS and RIS support
  • 33. Task F: Deploy
  • 34. Deploy • Begin in Jan 2011, complete in April • Activities: • GES DISC: Incorporate into the existing archive ingest system; manage the retrofit into existing metadata files • NSIDC: Support implementation in NSIDC’s ECS system • Other ESDCs: Encouraged to join in, but deployment to other centers expected subsequent to the project.
  • 35. Thank You!
  • 36. Acknowledgements: This work was supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.
  • 37. Questions/comments?
  • 39. Extra slides
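
The "simple reader" idea from the recap (slide 11), together with the offset/length and compressed-chunk mappings of slides 9 and 10, can be sketched roughly as follows. This is a minimal illustration, not the project's reader: the function names are hypothetical, the data type is assumed to be big-endian 16-bit two's-complement integers, compressed chunks are assumed to use deflate (zlib), and chunks are simply concatenated, which is only right for a one-dimensional array. In practice the offsets, lengths, data type and compression method would all be taken from the map file.

```python
import struct
import zlib


def read_byte_range(path, offset, length):
    """Return exactly `length` bytes starting at byte `offset` of the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) != length:
        raise ValueError(
            f"expected {length} bytes at offset {offset}, got {len(data)}"
        )
    return data


def read_int16_array(path, byte_ranges, compressed=False):
    """Assemble a 1-D array from (offset, length) pairs listed in a map.

    Assumptions (hypothetical, for illustration only):
    - values are big-endian 16-bit two's-complement integers;
    - when compressed=True, each byte range is one deflate (zlib) chunk;
    - chunks are stored in array order, so concatenation is enough.
    """
    parts = []
    for offset, length in byte_ranges:
        raw = read_byte_range(path, offset, length)
        parts.append(zlib.decompress(raw) if compressed else raw)
    payload = b"".join(parts)
    count = len(payload) // 2  # two bytes per int16 value
    return list(struct.unpack(f">{count}h", payload))
```

Nothing here depends on the HDF4 library: given the file and the byte ranges, only standard file I/O, a decompressor and knowledge of the number representation are needed, which matches what slide 13 assumes the target user has.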

Editor's notes

  1. Full quote, from proposal: Through the HDF software libraries, either by using the HDF APIs directly or by using HDF tools that depend on the HDF libraries. However, there is a risk in depending solely on the HDF libraries to access HDF-formatted data over the long term. It is possible, especially in the distant future, that the libraries may not be as readily available as they are today. To address this risk, it is desirable to have a way to retrieve the data independently. At the 10th HDF workshop, Christopher Lynnes of the Goddard Earth Sciences Data and Information Services Center (GES DISC) addressed this need: “If only we could read HDF data with an independent program that does not rely on the HDF API… A possible approach [would be to] extend hdfls to print a hierarchical map of a data file, [and] write ncdump/hdp-like utilities to find, assemble and write out SDSes and vdatas.” “Leveraging HDF Utilities,” Christopher Lynnes, 10th HDF Workshop. http://www.hdfeos.org/workshops/ws10/presentations/day3/Leveraging_HDF_Utilities.ppt.
  2. The HDF4 Mapping Schema describes an XML Document that provides access to content originally stored in a binary HDF4 file. The HDF4 Mapping Schema is defined by one or more XML schema documents written in the XML Schema Definition Language, XSDL. An HDF4 Mapping File is an XML Document that conforms to the HDF4 Mapping Schema. Data representations used today: twos-complement, IEEE floating point, big/little endian.
  3. METS = Metadata Encoding and Transmission Standard; a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library. PREMIS = PREservation Metadata: Implementation Standard; the PREMIS Data Dictionary defines a core set of semantic units that repositories should know in order to perform their preservation functions. Format-specific metadata is excluded as out of scope. ESML = Earth Science Markup Language. NcML = NetCDF Markup Language [schema used with Common Data Model (CDM) datasets]. CSML = Climate Science Modelling Language.
  4. AMSR_E_L2_Land_V09_200501180027_D
  5. AIRS.2002.08.31.L3.RetStd_H001.v5.0.14.0.G07178195754
  6. Test file created for project
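
As an illustration of how a future user with only "basic XML" knowledge might pull object locations out of a mapping file (note 2 and slide 24), here is a small sketch using Python's standard XML parser. The element names `Group` and `Array` follow the table on slide 24; everything else in the sample document (the `byteRange` element and the `name`, `offset` and `nBytes` attributes) is invented for this example and is not the actual HDF4 Mapping Schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical map fragment: only the Group and Array element names come
# from the slides; byteRange, offset and nBytes are made up for illustration.
SAMPLE_MAP = """\
<Group name="root">
  <Array name="temperature">
    <byteRange offset="10" nBytes="4"/>
    <byteRange offset="17" nBytes="4"/>
  </Array>
</Group>
"""


def list_array_locations(map_xml):
    """Return {array name: [(offset, length), ...]} for every Array element."""
    root = ET.fromstring(map_xml)
    locations = {}
    for array in root.iter("Array"):
        ranges = [
            (int(r.get("offset")), int(r.get("nBytes")))
            for r in array.findall("byteRange")
        ]
        locations[array.get("name")] = ranges
    return locations


if __name__ == "__main__":
    for name, ranges in list_array_locations(SAMPLE_MAP).items():
        print(name, ranges)
```

The point of the sketch is the design priority from slide 23: because the mapping file stands on its own, a generic XML parser is enough to find where each object's bytes live, without the schema or any HDF software.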