HDF is a file format for managing scientific data in heterogeneous environments. It provides data interoperability through I/O software, utilities, and search/access tools. HDF supports a variety of data types and structures, large datasets, metadata, portability across systems, fast I/O, and efficient storage. HDF-EOS extends HDF to define standard profiles for organizing Earth science remote sensing and in-situ data.
HDF Workshop II Overview: Introduction to HDF File Format
1. HDF
HDF-EOS Workshop II
Sept. 22, 1998
Mike Folk, HDF Group
http://hdf.ncsa.uiuc.edu/
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
NCSA/Univ of Illinois at Urbana-Champaign
HDF
1
2. I. Overview
• What is HDF?
• HDF software
• HDF objects
• Who uses HDF?
3. What is HDF?
• A data file format for managing scientific data
in heterogeneous environments
• I/O software, utilities, search and access tools
• A standard used widely among scientists
• Enabling data interoperability since 1988
4. Requirements for scientific data
• “Scientific” data
– A variety of data types and structures
– Large structures
– Metadata in a variety of forms
• Portability
– I/O library works on many machines
– Data easily moved from machine to machine
• Fast I/O
• Efficient storage
5. Why use HDF?
• Share scientific data
– in heterogeneous computing environments
– with others of like interests
• To use software that understands HDF
• To improve I/O performance
• To improve storage efficiency
• To use an open standard
6. HDF Object Types
This HDF file contains one example of each object type:
• 8-bit raster
• Palette
• 24-bit raster
• Annotation
– “March 15, 1990. Simulation with k=10.0, beta=1.22e3. Calculate the magnitude ...”
• Vgroup
• Scientific Data Sets (SDS) (multidimensional arrays)
• Vdata (tables)
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
7. An HDF File: A Collection of
Scientific Data Objects
HDF file containing four 3-D arrays
8. Mixing HDF Objects in One File
[Figure: a single HDF file holding a mix of objects: a 3-D array, a group, two raster images, a palette, a 2-D array, and an annotation (“March 15, 1990. Simulation with k=10.0, beta=1.22e3. Calculate the magnitude ...”).]
9. The HDF File Structure
Primitive data object = data descriptor (DD) + data element
• Data descriptor (DD), 12 bytes: tag, ref, offset, length
• Data: the data element that the DD's offset and length point to
(Everything else in an HDF file is built out of these.)
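The 12-byte descriptor can be sketched in a few lines. This is a minimal illustration, assuming a big-endian layout of a 16-bit tag, a 16-bit ref, a 32-bit offset, and a 32-bit length; the helper names (`pack_dd`, `unpack_dd`, `read_element`) and the sample tag value are hypothetical and are not part of the HDF library.

```python
import struct
from collections import namedtuple

# Assumed layout: big-endian 16-bit tag, 16-bit ref, 32-bit offset, 32-bit length.
DD_FORMAT = ">HHII"
DataDescriptor = namedtuple("DataDescriptor", "tag ref offset length")


def pack_dd(dd):
    """Serialize one data descriptor to its 12-byte form."""
    return struct.pack(DD_FORMAT, *dd)


def unpack_dd(raw):
    """Parse 12 bytes back into a DataDescriptor."""
    return DataDescriptor(*struct.unpack(DD_FORMAT, raw))


def read_element(file_bytes, dd):
    """Return the data element that a descriptor points at."""
    return file_bytes[dd.offset:dd.offset + dd.length]


# Toy "file": one descriptor followed directly by the element it describes.
# The tag value 700 is made up for the example.
payload = b"hello HDF"
dd = DataDescriptor(tag=700, ref=1, offset=12, length=len(payload))
toy_file = pack_dd(dd) + payload
```

Reading an object is then a two-step lookup: find the descriptor with the wanted tag and ref, then slice the file at its offset and length.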
10. Example of hierarchical HDF structure
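Raster Image Group (original format)
Index (each block starts with block size and next block):
tag | ref | offset | length
----|-----|--------|-------
RIG | 4 | offst | len
palette | 3 | offst | len
image | 3 | offst | len
dim's | 2 | offst | len
Data elements:
• dim's 2: 400 x 600
• RIG 4: tag/ref pairs palette 3, image 3, dim's 2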
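A group such as the RIG is itself a data element whose content is a list of tag/ref pairs naming its members. The sketch below models that indirection with a plain dictionary. It is an illustration only: tags are written as strings for readability (real HDF tags are numeric codes), the ref values are read off the slide's diagram, and `resolve_group` is a hypothetical helper.

```python
# Descriptor index modelled as (tag, ref) -> data element.
index = {
    ("RIG", 4): [("palette", 3), ("image", 3), ("dims", 2)],
    ("palette", 3): "<color table bytes>",
    ("image", 3): "<raster pixels>",
    ("dims", 2): (400, 600),
}


def resolve_group(index, tag, ref):
    """Follow each tag/ref pair stored in a group to its data element."""
    members = index[(tag, ref)]
    return {
        member_tag: index[(member_tag, member_ref)]
        for member_tag, member_ref in members
    }


rig = resolve_group(index, "RIG", 4)
```

Because members are referenced rather than embedded, the same palette or dimension record could be shared by more than one group.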
11. Other features of interest
• Data compression
• The data part of an object can be stored in
“special” ways
– As a series of linked-blocks
– As a portion of an external file
– In chunks, or tiles
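Chunked (tiled) storage can be pictured as splitting an array into fixed-size blocks, so that reading a subset touches only the blocks it overlaps. The sketch below is a generic illustration of that idea, not the HDF library's implementation; the function names and the chunk shapes are hypothetical.

```python
from itertools import product


def chunk_of(index, chunk_shape):
    """Return (chunk coordinates, offset inside that chunk) for one element."""
    chunk = tuple(i // c for i, c in zip(index, chunk_shape))
    within = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk, within


def chunks_touched(start, count, chunk_shape):
    """List the chunk coordinates overlapped by a subset (start, count)."""
    ranges = [
        range(s // c, (s + n - 1) // c + 1)
        for s, n, c in zip(start, count, chunk_shape)
    ]
    return list(product(*ranges))


# Element (5, 7) of an array stored in 4 x 4 chunks.
location = chunk_of((5, 7), (4, 4))

# A 4 x 4 subset starting at row 2, column 0 straddles two chunks.
needed = chunks_touched((2, 0), (4, 4), (4, 4))
```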
13. HDF Applications Software
• Free software
– NCSA HDF library and utilities
– Other software
• Commercial/other software that “understands”
– all of HDF (Noesys, IDL)
– certain HDF objects (MATLAB, AVS)
– certain HDF applications (SHARP, WIM)
• http://hdf.ncsa.uiuc.edu/tools.html
14. What platforms does HDF run on?
• Sun: Solaris, SunOS
• SGI: Indy, Power Challenge, Origin, Cray C90,
YMP, T3E
• HP9000, HP-Convex Exemplar
• IBM: RS6000, SP2
• DEC: Alpha/Digital UNIX, OpenVMS; VAX: OpenVMS
• Intel Pentium: Solaris x86, Linux, FreeBSD 2.2, Windows NT/95
• PowerPC: Mac-OS
15. HDF Objects
[Figure, repeated from the HDF Object Types slide: an HDF file with one example each of an 8-bit raster, a palette, a 24-bit raster, an annotation, a Vgroup, a Scientific Data Set (SDS, multidimensional array), and a Vdata (table).]
16. SDS: Scientific Data Set
A Scientific Data Set consists of:
• Array
– Number type
– Dimensions X, Y, Z
• SDS attributes
• For each dimension (X, Y, Z)
– Dimension attributes
– Dimension number type
– Dimension scale
17. Writing an SDS
• Open a file and start SD interface
SDstart
• Create or open an SDS
SDcreate/SDselect
• Write out the SDS
SDwritedata
• Terminate access to the SDS
SDendaccess
• Terminate access to SD interface and close the file
SDend
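The steps nest: the interface is started before any data set is created or selected, and access to each data set is ended before the interface is closed. The toy classes below model only that ordering, in memory. They do not call the HDF library, and `ToySD`, `ToySDS`, and their methods are hypothetical stand-ins named after the steps above.

```python
class ToySDS:
    """In-memory stand-in for one scientific data set (hypothetical)."""

    def __init__(self, name, shape):
        self.name = name
        self.shape = shape
        self.data = None
        self.is_open = True

    def writedata(self, values):
        # Step 3: write out the SDS.
        if not self.is_open:
            raise RuntimeError("access to this SDS has been terminated")
        expected = 1
        for extent in self.shape:
            expected *= extent
        if len(values) != expected:
            raise ValueError("values do not match the SDS shape")
        self.data = list(values)

    def endaccess(self):
        # Step 4: terminate access to the SDS.
        self.is_open = False


class ToySD:
    """In-memory stand-in for the SD interface on one file (hypothetical)."""

    def __init__(self, filename):
        # Step 1: open a file and start the interface.
        self.filename = filename
        self.datasets = {}
        self.started = True

    def create(self, name, shape):
        # Step 2: create an SDS.
        if not self.started:
            raise RuntimeError("SD interface has been closed")
        self.datasets[name] = ToySDS(name, shape)
        return self.datasets[name]

    def select(self, name):
        # Step 2 (alternative): reopen an SDS that already exists.
        if not self.started:
            raise RuntimeError("SD interface has been closed")
        sds = self.datasets[name]
        sds.is_open = True
        return sds

    def end(self):
        # Step 5: terminate access to the interface and close the file.
        # The toy insists on step 4 first, purely to make the order visible.
        if any(sds.is_open for sds in self.datasets.values()):
            raise RuntimeError("call endaccess on every SDS first")
        self.started = False


sd = ToySD("example.hdf")
sds = sd.create("temperature", (2, 3))
sds.writedata([3.1, 4.2, 3.6, 2.9, 4.0, 3.3])
sds.endaccess()
sd.end()
```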
18. Vdata: HDF table structure
[Figure: a general vdata, with a vdata name and a class (Class_1), three named fields (Field_1, Field_2, Field_3), and a set of records, each holding one value per field.]
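A vdata can be thought of as a table of fixed-length records in which every record has the same named fields. The sketch below packs the lat/lon/temp rows from the HDF Object Types slide into such records and reads one field back. The field types (two 32-bit integers and a 32-bit float), the byte order, and the helper names are assumptions made for illustration, not the HDF library's storage format.

```python
import struct

FIELDS = ("lat", "lon", "temp")
RECORD_FORMAT = ">iif"  # assumed: two 32-bit ints and one 32-bit float


def pack_records(rows):
    """Pack rows into consecutive fixed-length records."""
    return b"".join(struct.pack(RECORD_FORMAT, *row) for row in rows)


def read_field(raw, field):
    """Extract one field (column) from the packed records."""
    column = FIELDS.index(field)
    return [record[column] for record in struct.iter_unpack(RECORD_FORMAT, raw)]


rows = [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)]
raw = pack_records(rows)
latitudes = read_field(raw, "lat")
```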
19. Vgroup: HDF Grouping Structure
[Figure: Vgroup system organization (Vgroups containing other Vgroups and data objects) shown beside UNIX file system organization (directories containing other directories and files).]
20. Vgroup
[Figure: a Vgroup containing an annotation (“March 15, 1990. Simulation with k=10.0, beta=1.22e3. Calculate the magnitude ...”), a raster image, and a 2-D array.]
21. GR: HDF general raster image
General Raster
• Required: image array (array name, pixel type, width, height)
• Optional: image attribute, palette
• Example: name = IMAGE_1
22. HDF Palette
Palette: color look-up table (color components), indexed by the 8-bit raster image pixel value
Entry | red | green | blue
------|-----|-------|-----
0 | 00000000 | 00000000 | 00000000
1 | 00000001 | 00000001 | 00000001
2 | 00000010 | 00000010 | 00000010
... | ... | ... | ...
192 | 11000000 | 11000000 | 11000000
... | ... | ... | ...
253 | 11111101 | 11111101 | 11111101
254 | 11111110 | 11111110 | 11111110
255 | 11111111 | 11111111 | 11111111
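An 8-bit raster pixel is an index into the palette rather than a color in its own right. The sketch below builds the 256-entry greyscale ramp shown in the table, where each entry's red, green, and blue components all equal the entry number, and uses it to look up pixels. The function names and the sample pixel values are assumptions for illustration.

```python
def greyscale_palette():
    """Build a 256-entry look-up table where entry i is (i, i, i)."""
    return [(i, i, i) for i in range(256)]


def apply_palette(pixels, palette):
    """Map each 8-bit pixel value to its (red, green, blue) triple."""
    return [[palette[p] for p in row] for row in pixels]


palette = greyscale_palette()
image = [[0, 192], [254, 255]]  # a tiny 2 x 2 8-bit raster
rgb = apply_palette(image, palette)

# Components of entry 192 written as 8-bit binary strings.
bits = tuple(format(component, "08b") for component in palette[192])
```

Swapping in a different palette changes how the image is displayed without touching the pixel data.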
23. HDF Annotations
File annotations (attached to the HDF file):
• “This is a file label.”
• “This is a file description.”
• “This is another file label.”
• “This is another file description.”
Object annotations:
• SDS: “This is an SDS label.”
• RIS24: “This is a RIS24 description.”
• RIS8: “This is a RIS8 description.”
25. A Sampling of HDF Users
User | Use of HDF
-----|-----------
NCSA-affiliated science teams | Visualization, exchange, archiving
Vendors: Mathworks, Fortner Software, RSI, etc. | Visualization, data analysis, exchange, data management
TRAPPIST (Euro consortium) | Exchange, analysis, visualization of non-destructive testing data (STEP)
NIST Reactor & Cold Neutron Res Facility | Data exchange, archiving
Stanford Univ EE Dept | Persistent storage of simulation data
Johns Hopkins Appl Physics Lab | Planetary data exchange, archive
Inst Adv Tech in the Humanities | Image annotation
Comp Graphics Unit Manchester | Image processing, visualization
26. Major User #1: EOSDIS
• User support for scientists, data producers, etc.
• Library and file structure improvements
• HDF tools, utilities, access software
• Software maintenance and QA
27. Major User #2: ASCI
• ASCI Data Models and Formats (DMF) Group
– open standard exchange format and I/O library for ASCI
– DOE tri-lab ASCI applications
• HDF requirements
– large datasets (> a terabyte)
– ASCI data types, especially meshes
– good performance in massively parallel environments
– emphasis on HDF5
28. II. Major HDF Projects
• Support, maintenance, QA
• Java applications
• Remote access
• HDF5: next generation
29. Support, Maintenance, QA
• User support
• Library and file structure improvements
• Software maintenance and QA
• Documentation, tutorials, etc.
• HDF tools and utilities
30. Java applications
• A Java HDF API
– Basis for tools that access HDF
• A Java HDF Viewer
– HDF browser/visualizer
• Java Scientific Data Server Prototype
– Lessons learned about scientific data servers
31. Java HDF API (JHI)
• Java HDF Interface
– Complete Java interface to the HDF library
– Analogous to the HDF F77 interface
– To be used by Java apps to access HDF files
– http://hdf.ncsa.uiuc.edu/java-hdf-html/
32. Java-based HDF Viewer (JHV)
• client-based HDF viewer
– browsing
– data viewing (“spreadsheet”)
– image viewing and plotting
– image processing
– animation
– data import/export
34. Remote Data Access
• Accessing Scientific Info Using Networks
– Scientific data is large and complex
– Need to locate objects & subsets within large files
– The “download and browse” approach is infeasible
35. The SDB: Web-based Server-side Data Browser
• Remote interactive browser for data and metadata
– “A conversation with the data”
– Views provided at increasing levels of detail
– Returns subsets of data
– Performs simple data translations
– Adaptable to different domains (EOS, Astronomy)
– Reads HDF, netCDF, CDF, FITS formats
36. Sample SDB applications
• DIAL (Data & Info Access Link)
– HDF-EOS data access
• Distrib. Ocean Data System (DODS) WP-ESIP
– High-speed remote subsetting earth science datasets
• Boeing
– Extract space/time info from large image collections
37. Other work
• Java applets for scientific data browsing
• A VRML server for Radio Astronomy data
• Interoperable search with Z39.50
38. HDF5: Next generation HDF
• Features
– Large arrays and files (>2GB)
– Simple, comprehensive data model
– Emphasis on Parallel I/O
– Alternate storage structures
• Collaborations
– Mesh data standard for ASCI physics
– Integrate with commercial object store (DLI)
39. Basic HDF5 data object
Dimensionality: 5 x 3 x 4
Number type: record (int8, int4, int16, float32)
40. III. HDF and HDF-EOS
• HDF profiles and HDF-EOS
• HDF configuration record project
• Future directions?
41. HDF Profiles and HDF-EOS
• HDF does not support specific application areas.
• To share files, users must agree on how to
organize them.
• HDF user groups create “profiles” describing
how to organize data in their HDF files.
• Profiles standardize domain-specific HDF files
• Example: HDF-EOS
42. HDF-EOS profiles
• Profiles for Earth remote sensing data and in-situ measurements
• Includes
– Standard metadata for EOS data
– API and library that reads/writes HDF-EOS files
– Utilities to simplify the work of analyzing and
visualizing HDF-EOS files.
43. HDF-EOS software layers
[Figure: HDF-EOS software layers, from top to bottom: HDF-EOS applications and general applications; the HDF-EOS API (HDF-EOS profiles); the HDF application programming interfaces; the HDF low-level interface; the HDF file.]
44. “HDF Configuration Record” (HCR)
To simplify the tasks of defining,
comparing, and producing
HDF-EOS files
45. HCR
• Formal descriptions of HDF objects
• Based on ODL (Object Description Language)
• Supports HDF-EOS swath, grid, point
46. HCR of Swath
/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = DataX
    Size = 2410
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
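Because an HCR is plain text with nested OBJECT / END_OBJECT blocks, its structure can be recovered with a small stack-based parser. The sketch below handles only the subset of syntax visible in the swath example (one-line comments, `KEY = value` lines, nested objects, and a final `END`). It is not a full ODL parser and is not one of the HCR utilities; `parse_odl` and the dictionary layout are hypothetical.

```python
def parse_odl(text):
    """Parse OBJECT/END_OBJECT blocks into nested dicts (minimal subset)."""
    root = {"type": "ROOT", "attrs": {}, "children": []}
    stack = [root]
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("/*"):
            continue  # skip blank lines and one-line comments
        if line == "END":
            break
        key, _, value = (part.strip() for part in line.partition("="))
        if key == "OBJECT":
            node = {"type": value, "attrs": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif key == "END_OBJECT":
            closed = stack.pop()
            if closed["type"] != value:
                raise ValueError(f"mismatched END_OBJECT = {value}")
        else:
            # Store whole numbers as int, everything else as text.
            stack[-1]["attrs"][key] = int(value) if value.isdigit() else value
    return root["children"]


HCR_TEXT = """\
/* Project XYZ */
OBJECT = SWATH
NAME = SCAN1
OBJECT = Dimension
NAME = GeoTrack
Size = 1200
END_OBJECT = Dimension
OBJECT = Dimension
NAME = GeoCrossTrack
Size = 205
END_OBJECT = Dimension
END_OBJECT = SWATH
END
"""

swath = parse_odl(HCR_TEXT)[0]
dims = {d["attrs"]["NAME"]: d["attrs"]["Size"] for d in swath["children"]}
```

With the description held as a nested structure, comparing two HCRs reduces to comparing dictionaries.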
47. HCR Utilities:
• Convert HCR ↔ HDF-EOS
• Edit HCR and HDF-EOS
• Compare HCR with HDF-EOS file
50. Future possibilities
• Deploy HCR?
• HDF as an archive format?
• Java tools for HDF-EOS?
• HDF5?
– HDF5 ↔ HDF4 conversion
– HDF4 API on HDF5
– HDF-EOS on HDF5
51. HDF Information
• HDF Information Center
– http://hdf.ncsa.uiuc.edu/
• HDF Help email address
– hdfhelp@ncsa.uiuc.edu
• HDF users mailing list
– hdfnews@ncsa.uiuc.edu