NetCDF-Java is an open-source Java library for reading scientific data formats and protocols such as NetCDF, HDF5, HDF4, and OPeNDAP. It is used as a component in many other software projects. The library provides an object-oriented API that exposes data in these formats to Java programs: format-specific readers map each file type into the library's Common Data Model (CDM). The library has been tested against many example files but would benefit from more systematic testing. Proper use of dimensions, variables, units, and metadata is what makes scientific data files self-documenting.
1. Reading HDF family of formats via NetCDF-Java / CDM
John Caron
UCAR/Unidata
2. NetCDF-Java library
• 100% Java
• Open source (LGPL, MIT)
• Independent implementation
• Used as a component in other software (partial list)
  – Integrated Data Viewer, THREDDS Data Server (Unidata)
  – Panoply (NASA)
  – ncBrowse (EPIC/NOAA)
  – Java NEXRAD Viewer (NCDC/NOAA)
  – MyWorld GIS (Northwestern University)
  – EDC for ArcGIS, ERDDAP (SWFSC/NOAA)
  – Live Access Server (PMEL/NOAA)
  – ncWMS (University of Reading)
  – Matlab plug-in (USGS)
5. Lines of Code (est.)
Module     LOC      Semicolons   LOC ratio   Semicolon ratio
netcdf3    1977     846          1           1
hdf4       3151     1405         1.6         1.7
hdf-eos    3737     1695         1.9         2.0
hdf5       5735     2672         2.9         3.2
common     28121    9267         n/a         n/a
(Ratios are relative to netcdf3.)
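The "est" in the slide title suggests these counts are rough. A minimal sketch of how such estimates might be produced, assuming each module's non-blank lines and semicolons are counted and each ratio is that module's count divided by the netcdf3 count, rounded to one decimal. The class and method names are hypothetical, not part of NetCDF-Java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: rough code-size estimates and ratios relative to a baseline module. */
final class LocEstimate {

    /** Counts non-blank lines in a chunk of source text (comments included, so it is only an estimate). */
    static int countLines(String source) {
        int n = 0;
        for (String line : source.split("\n", -1)) {
            if (!line.isBlank()) n++;
        }
        return n;
    }

    /** Counts semicolons, a rough proxy for the number of Java statements. */
    static int countSemicolons(String source) {
        int n = 0;
        for (int i = 0; i < source.length(); i++) {
            if (source.charAt(i) == ';') n++;
        }
        return n;
    }

    /** Ratio of value to baseline, rounded to one decimal place. */
    static double ratio(int value, int baseline) {
        return Math.round(10.0 * value / baseline) / 10.0;
    }

    public static void main(String[] args) {
        // {LOC, semicolons} per module, taken from the table; netcdf3 is the baseline.
        Map<String, int[]> counts = new LinkedHashMap<>();
        counts.put("netcdf3", new int[] {1977, 846});
        counts.put("hdf4", new int[] {3151, 1405});
        counts.put("hdf-eos", new int[] {3737, 1695});
        counts.put("hdf5", new int[] {5735, 2672});
        int[] base = counts.get("netcdf3");
        counts.forEach((name, c) -> System.out.printf("%-8s %4.1f %4.1f%n",
                name, ratio(c[0], base[0]), ratio(c[1], base[1])));
    }
}
```

Running `main` reproduces the two ratio columns. Semicolon counts are largely insensitive to formatting choices such as brace placement and blank lines, which is one reason to report them alongside raw line counts.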
6. Why all the trouble?
• ~20-40% of C/C++ development time is spent on portability issues
• Platform independence: Java runtimes from many vendors
  – Linux, Solaris, Windows (Sun)
  – Mac OS X (Apple)
  – AIX, Linux, Windows, z/OS (IBM)
  – HP-UX (Hewlett-Packard)
• Programmer productivity
  – Object-oriented
  – Garbage collected: far fewer memory leaks
  – Rich libraries
  – Open source
• Faster than C for some applications
7. Independent implementation
• Written entirely from the HDF4 and HDF5 file format specifications
• Helped debug HDF5 and validate the file specs
• The file format spec is what will be needed in 100 years to read legacy data
  – OTOH, the semantics are not always obvious from the spec
• Don’t confuse the reference implementation with the file/protocol specification
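What "written entirely from the file specifications" means in practice: the reader decodes raw bytes according to the layout the spec describes. A minimal sketch for the start of an HDF4 file, assuming the layout given in the HDF4 specification: a 4-byte magic number (0x0E031301) followed by a data-descriptor (DD) block whose header holds a 16-bit DD count and a 32-bit offset of the next block, then 12-byte DDs (tag, reference number, offset, length), all big-endian. This is an illustration, not NetCDF-Java's HDF4 reader; the class names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Sketch: decode the first data-descriptor block of an HDF4 file held in a byte buffer. */
final class Hdf4DdBlock {

    /** One data descriptor: identifies an object by tag/ref and locates its bytes in the file. */
    record DataDescriptor(int tag, int ref, long offset, long length) {}

    static final int HDF4_MAGIC = 0x0E031301;

    /** Returns the descriptors in the first DD block only. */
    static List<DataDescriptor> readFirstBlock(ByteBuffer buf) {
        buf.order(ByteOrder.BIG_ENDIAN);  // HDF4 header fields are big-endian
        if (buf.getInt(0) != HDF4_MAGIC) {
            throw new IllegalArgumentException("not an HDF4 file");
        }
        buf.position(4);
        int ndds = Short.toUnsignedInt(buf.getShort());  // number of DDs in this block
        buf.getInt();  // offset of the next DD block; a full reader would follow this chain
        List<DataDescriptor> dds = new ArrayList<>(ndds);
        for (int i = 0; i < ndds; i++) {
            int tag = Short.toUnsignedInt(buf.getShort());
            int ref = Short.toUnsignedInt(buf.getShort());
            long offset = Integer.toUnsignedLong(buf.getInt());
            long length = Integer.toUnsignedLong(buf.getInt());
            dds.add(new DataDescriptor(tag, ref, offset, length));
        }
        return dds;
    }
}
```

Decoding the bytes is the easy part; deciding what each tag means and how the objects relate is where the semantics stop being obvious.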
8. HDF family of formats
• HDF5/NetCDF-4
• HDF4
• HDF-EOS
• Note: read-only; no parallel I/O, etc.
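The base formats in this family can be told apart by their leading bytes, which is usually the first thing a reader checks. A minimal sketch using the published signatures: "CDF" plus a version byte for netCDF-3 (1 = classic, 2 = 64-bit offset, 5 = CDF-5), the 4-byte HDF4 magic number, and the 8-byte HDF5 signature, which may appear at offset 0, 512, 1024, 2048, and so on. NetCDF-4 files are HDF5 files and HDF-EOS files are HDF files with extra metadata, so neither can be identified from the signature alone. The class is hypothetical, not the library's own format detection.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

/** Sketch: identify a file in the netCDF/HDF family from its leading bytes. */
final class FormatSniffer {

    enum Format { NETCDF3, HDF5_OR_NETCDF4, HDF4, UNKNOWN }

    // 8-byte HDF5 format signature: \211 H D F \r \n \032 \n
    private static final byte[] HDF5_SIG = {(byte) 0x89, 'H', 'D', 'F', '\r', '\n', 0x1A, '\n'};
    // 4-byte HDF4 magic number 0x0E031301
    private static final byte[] HDF4_SIG = {0x0E, 0x03, 0x13, 0x01};

    /** Classifies a byte array holding (at least) the start of a file. */
    static Format sniff(byte[] head) {
        if (startsWith(head, 0, HDF4_SIG)) return Format.HDF4;
        if (head.length >= 4 && head[0] == 'C' && head[1] == 'D' && head[2] == 'F'
                && (head[3] == 1 || head[3] == 2 || head[3] == 5)) {
            return Format.NETCDF3;  // classic, 64-bit offset, or CDF-5 variant
        }
        // The HDF5 superblock may sit at offset 0, 512, 1024, 2048, ...
        for (long off = 0; off + HDF5_SIG.length <= head.length; off = (off == 0) ? 512 : off * 2) {
            if (startsWith(head, (int) off, HDF5_SIG)) return Format.HDF5_OR_NETCDF4;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int offset, byte[] sig) {
        if (offset + sig.length > data.length) return false;
        return Arrays.equals(data, offset, offset + sig.length, sig, 0, sig.length);
    }

    /** Reads up to the first 4 KB of a file and classifies it. */
    static Format sniff(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            byte[] head = new byte[(int) Math.min(raf.length(), 4096)];
            raf.readFully(head);
            return sniff(head);
        }
    }
}
```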
9. HDF5/NetCDF-4
• Goal is to read all HDF5 files
  – Can read all the HDF5 example files we have
  – including references and soft links
  – Complete coverage is difficult to guarantee: combinatorial explosion
• Some esoteric features we are skipping
  – File drivers, external files, szip compression
• Working on a comprehensive test harness
  – JNI interface to the NetCDF-4/HDF5 library
  – Read every byte and compare
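A minimal sketch of the comparison step of such a harness. It assumes both readers (the pure-Java one and a JNI wrapper around the C library) can return each variable's data as raw bytes; the `VariableReader` interface is a hypothetical abstraction, and the JNI binding itself is not shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of the "read every byte and compare" step of a test harness. */
final class ByteCompare {

    /** Hypothetical abstraction over one reader (e.g. pure Java, or JNI to the C library). */
    interface VariableReader {
        List<String> variableNames();
        byte[] readAll(String variableName);
    }

    /** Compares every variable; returns variable name -> description of the problem. */
    static Map<String, String> compare(VariableReader expected, VariableReader actual) {
        Map<String, String> failures = new TreeMap<>();
        List<String> actualNames = actual.variableNames();
        for (String name : expected.variableNames()) {
            if (!actualNames.contains(name)) {
                failures.put(name, "missing");
                continue;
            }
            byte[] a = expected.readAll(name);
            byte[] b = actual.readAll(name);
            int i = Arrays.mismatch(a, b);  // -1 when the arrays are identical
            if (i >= 0) {
                failures.put(name, "first difference at byte " + i
                        + " (lengths " + a.length + " vs " + b.length + ")");
            }
        }
        return failures;
    }

    /** In-memory reader, handy for testing the harness itself. */
    static VariableReader fromMap(Map<String, byte[]> data) {
        return new VariableReader() {
            public List<String> variableNames() { return List.copyOf(data.keySet()); }
            public byte[] readAll(String variableName) { return data.get(variableName); }
        };
    }
}
```

Recording the first differing byte for every variable, instead of stopping at the first failure, makes a single pass over a large sample collection more informative.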
10. HDF4 / HDF-EOS
• Complete; works against all examples
• Tested against 400 sample files (27 GB)
  – thanks to Ruth Duerr (NSIDC)
• Spot-checked against HDFView
• Need a systematic test that compares our reads against the HDF4 C library
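A systematic run over a sample collection like the 400 files above needs little more than a directory walk and a per-file check. A minimal sketch, assuming the check is supplied as a `Predicate<Path>`; for instance, a hypothetical predicate that compares NetCDF-Java's output with a dump produced by the HDF4 C library (that comparison is not shown).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: run one check over every sample file under a directory and tally the results. */
final class SampleRun {

    /** Number of files visited, their total size in bytes, and the files whose check failed. */
    record Summary(int files, long bytes, List<Path> failures) {}

    static Summary run(Path root, Predicate<Path> check) throws IOException {
        List<Path> files;
        try (Stream<Path> s = Files.walk(root)) {
            files = s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        long bytes = 0;
        List<Path> failures = new ArrayList<>();
        for (Path f : files) {
            bytes += Files.size(f);
            boolean ok;
            try {
                ok = check.test(f);
            } catch (RuntimeException e) {
                ok = false;  // a reader crash is recorded as a failure instead of aborting the run
            }
            if (!ok) failures.add(f);
        }
        return new Summary(files.size(), bytes, failures);
    }
}
```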
16. If you write data
• Don’t rely on variable name conventions
• Don’t rely on index ordering
• Don’t rely on matching index sizes
• Minimize “you just have to know that…”
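Why matching index sizes are a trap: on a 180 × 180 grid, latitude and longitude have the same length, so size alone cannot say which axis is which. A minimal sketch, assuming variables carry named dimensions (the record types and method names are hypothetical), contrasting lookup by name with guessing by length.

```java
import java.util.List;

/** Sketch: locate an axis by its named dimension instead of by position or size. */
final class DimLookup {

    record Dimension(String name, int length) {}

    record Variable(String name, List<Dimension> dims) {}

    /** Index of the named dimension in the variable's shape, or -1 if absent. */
    static int axisOf(Variable v, String dimName) {
        for (int i = 0; i < v.dims().size(); i++) {
            if (v.dims().get(i).name().equals(dimName)) return i;
        }
        return -1;
    }

    /** Fragile alternative: how many axes have this length? More than one means a size-based guess is ambiguous. */
    static long axesWithLength(Variable v, int length) {
        return v.dims().stream().filter(d -> d.length() == length).count();
    }
}
```

For `sst(time=12, lat=180, lon=180)`, `axisOf(sst, "lon")` finds index 2, while two axes share the length 180. The name-based lookup also keeps working if a writer reorders the axes.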
19. If you write data
• Unique signature
• Specify dimensions
• Identify georeferencing coordinates
• Identify data type
• Units are not optional
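Several of these rules can be checked mechanically before a file is written. A minimal sketch covering three of them (unique signature, named dimensions, units), assuming an in-memory description of the variables. Using a CF-style global `Conventions` attribute as the "unique signature" is one possible choice, and all type and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch: flag missing self-describing metadata before data are written. */
final class MetadataCheck {

    /** Hypothetical description of one variable; a blank dimension name means "anonymous". */
    record Var(String name, List<String> dimensions, Map<String, String> attributes) {}

    static List<String> problems(Map<String, String> globalAttributes, List<Var> vars) {
        List<String> out = new ArrayList<>();
        // Unique signature: here, a CF-style global attribute naming the convention followed.
        if (!globalAttributes.containsKey("Conventions")) {
            out.add("no global Conventions attribute");
        }
        for (Var v : vars) {
            // Specify dimensions: every axis should refer to a shared, named dimension.
            for (String d : v.dimensions()) {
                if (d == null || d.isBlank()) {
                    out.add(v.name() + ": anonymous dimension");
                    break;
                }
            }
            // Units are not optional.
            String units = v.attributes().get("units");
            if (units == null || units.isBlank()) {
                out.add(v.name() + ": missing units");
            }
        }
        return out;
    }
}
```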
20. HDF-EOS, HDF-EOS2
• Read the “structural metadata” field to obtain more semantics
• Parse its text, written in “ODL” (Object Description Language)
  – Data type: Swath, Grid, Point
  – Dimensions
  – Geolocation coordinate variable types: Latitude, Longitude, Time
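In HDF-EOS2 files the structural metadata is ODL text stored in a global attribute (StructMetadata.0), organized as nested GROUP=/END_GROUP= and OBJECT=/END_OBJECT= blocks containing key=value lines. A minimal sketch of a parser for that shape; it assumes one key=value pair per line, so values that continue across lines are out of scope, and the class is hypothetical rather than NetCDF-Java's parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: parse ODL-style structural metadata (nested GROUP/OBJECT blocks of key=value lines). */
final class OdlParser {

    /** One GROUP or OBJECT block: its name, its key=value pairs, and nested blocks. */
    static final class Node {
        final String name;
        final Map<String, String> values = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        /** First child block with the given name, or null. */
        Node child(String childName) {
            for (Node c : children) {
                if (c.name.equals(childName)) return c;
            }
            return null;
        }
    }

    static Node parse(String text) {
        Node root = new Node("root");
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            int eq = line.indexOf('=');
            if (eq < 0) continue;  // blank lines and the final END carry no key=value pair
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            switch (key) {
                case "GROUP", "OBJECT" -> {
                    Node n = new Node(value);
                    stack.peek().children.add(n);
                    stack.push(n);
                }
                case "END_GROUP", "END_OBJECT" -> {
                    if (stack.size() > 1) stack.pop();  // tolerate an unbalanced END_*
                }
                default -> stack.peek().values.put(key, unquote(value));
            }
        }
        return root;
    }

    /** Strips one pair of surrounding double quotes, if present. */
    private static String unquote(String s) {
        boolean quoted = s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"");
        return quoted ? s.substring(1, s.length() - 1) : s;
    }
}
```

Walking the resulting tree (swath, then Dimension, then each Dimension_N object) yields each dimension's name and size; mapping those onto CDM dimensions and coordinate variables is the part a reader must add on top.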
21. HDF-EOS, HDF-EOS2
• Good
  – Unique signature; identifies coordinates and data type
• Not so good
  – ODL
  – Not using HDF4/HDF5 constructs
• Bad
  – No data units
  – No time coordinate units!
23. NPP (i1.4.0.3_NPP_QUAL)
• Good
  – XML is better than ODL
• Not so good
  – Not using HDF4/HDF5 constructs
• Bad
  – No data units
  – No time coordinate units!
• Fatal error: please reboot
  – Metadata not in the same file
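One concrete advantage of XML over ODL is that the JDK already ships a parser for it, so no hand-written grammar is needed. A minimal sketch using the standard DOM API on a hypothetical product description; the `Field` element and its `name`/`units` attributes are invented for illustration and are not the actual NPP schema. An absent `units` attribute comes back as an empty string, which makes the "no data units" problem easy to detect.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Sketch: list field names and units from a hypothetical XML product description. */
final class XmlFieldUnits {

    /** Returns field name -> units attribute ("" when the attribute is absent). */
    static Map<String, String> unitsByField(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList fields = doc.getElementsByTagName("Field");  // hypothetical element name
            Map<String, String> out = new LinkedHashMap<>();
            for (int i = 0; i < fields.getLength(); i++) {
                Element e = (Element) fields.item(i);
                out.put(e.getAttribute("name"), e.getAttribute("units"));
            }
            return out;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("cannot parse product XML", ex);
        }
    }
}
```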
24. Summary
• NetCDF-Java reads the entire HDFx family
• Good for Java-philes
• Needs more testing
  – Send example files, $
• Dimensions are not optional
• Keep structural and georeferencing metadata in the same file as the data
  – Can also have specialized external files