HDF
HDF/HDF-EOS Workshop III
Sept. 14-16, 1999
Mike Folk, HDF Group
http://hdf.ncsa.uiuc.edu/
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
NCSA/Univ of Illinois at Urbana-Champaign

Topics

I. Overview
II. NCSA HDF Activities
III. HDF5
IV. HDF4 vs. HDF5

I. HDF Overview

HDF Mission

To develop, promote, deploy, and support
open and free technologies that facilitate
scientific data storage, exchange, access,
analysis and discovery.

What is HDF?
• Scientific data file format & supporting software
• For images, arrays, tables, other structures
• Features
– Portability across architectures
• I/O library
• Files
– Efficient I/O
– Efficient storage

Why use HDF?
• Manage data
• Share data
• Use software that understands HDF
• Improve I/O performance
• Improve storage efficiency
• Use an open standard

An HDF File: A Collection of
Scientific Data Objects

HDF file containing four 3-D arrays

Mixing HDF Objects in One File

[Figure: one HDF file holding a group, two 3-D arrays, two raster images, a palette, and a table]

Table:
Lat   lon   temp
----  ----  ----
12    23    3.1
15    24    4.2
17    21    3.6
16    35    5.7

HDF Software
[Figure: HDF software layers, top to bottom]
• General applications
– Utilities and applications for manipulating, viewing, and analyzing data.
• HDF I/O library
– Application programming interfaces: high-level, object-specific APIs.
– Low-level interface: low-level API for I/O to files, etc.
• HDF file
– File or other data source.

HDF Applications Software
• Free software
– NCSA HDF library and utilities
– Other software

• Commercial/other software that “understands”
– all of HDF (Noesys, IDL, HDF Explorer)
– certain HDF objects (MATLAB, WebWinds)
– certain HDF applications (SHARP, WIM)
• http://hdf.ncsa.uiuc.edu/tools.html

What platforms does HDF run on?
• Sun: Solaris
• SGI: Indy, Power Challenge, Origin, Cray C90, YMP, T3E
• HP9000, HP-Convex Exemplar
• IBM: RS6000, SP2
• DEC: Alpha (Digital UNIX, OpenVMS), VAX (OpenVMS)
• Intel: Solaris x86, Linux, FreeBSD, Windows NT/98
• PowerPC: Mac OS

A Sampling of HDF Users
• NCSA-affiliated science teams
– Visualization, data exchange, fast I/O, ...
• Mathworks, Fortner Software, Research Systems Inc., etc.
– Format supported by vendors of visualization and data analysis software
• Boeing
– Space-time change detection in images
• Distributed Oceanographic Data System (DODS)
– Remote access to earth science data
• Army Research Lab
– Network distributed global memory
• Center for Analysis & Prediction of Storms
– Fast parallel I/O, portability, multi-resolution grids
• TRAPPIST (Euro consortium)
– Exchange, analysis & visualization of non-destructive testing data

Major User #1: EOSDIS
• ESDIS Project
– open standard exchange format and I/O library for EOSDIS
– EOS applications

• HDF requirements
– Earth science data types (HDF-EOS, etc.)
– User support for scientists, data producers, etc.
– Library and file structure improvements
– HDF tools, utilities, access software
– Software maintenance and QA

Major User #2: ASCI
• ASCI Data Models and Formats (DMF) Group
– open standard exchange format and I/O library for ASCI
– DOE tri-lab ASCI applications

• HDF requirements
– Large datasets (> a terabyte)
– ASCI data types, especially meshes
– Good performance in massively parallel environments
– Primarily HDF5

II. NCSA HDF Activities

Java applications
• HDF APIs
– Basis for tools that access HDF

• HDF Viewers
– HDF browser/visualizer

• HDF4 Data Server Prototype
– Lessons learned about remote access to

Remote Data Access

• The SDB: Web-based Server-side Data
Browser
• Java for remote access
• WP-ESIP: DODS project
• Computational Grids (Globus/GASS)

HDF Standardization
• To share files, users must organize them similarly.
• HDF user groups create standard profiles
– Ways to organize data in HDF files.
– Metadata
– API

• Examples: HDF-EOS, ASCI DMF

HDF-EOS software layers
[Figure: HDF-EOS software layers, top to bottom]
• HDF-EOS applications and general applications
• HDF-EOS API (HDF-EOS profiles)
• Application programming interfaces
• Low-level interface
• HDF file

“HDF Configuration Record” (HCR)

• To simplify the tasks of defining, comparing,
and producing HDF-EOS files
• Formal (ODL) descriptions of HDF-EOS
objects

HCR of Swath
/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
    NAME = SCAN1
    OBJECT = Dimension
        NAME = GeoTrack
        Size = 1200
    END_OBJECT = Dimension
    OBJECT = Dimension
        NAME = GeoCrossTrack
        Size = 205
    END_OBJECT = Dimension
    OBJECT = Dimension
        NAME = DataX
        Size = 2410
    END_OBJECT = Dimension
END_OBJECT = SWATH
END
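
The nesting in the swath description above can be read mechanically. The following is a minimal sketch, not an HCR utility: `parse_hcr` is a hypothetical helper, and it assumes only the `KEY = value`, `OBJECT`/`END_OBJECT`, and `END` forms that appear in this one example rather than the full ODL grammar.

```python
# Hypothetical reader for the OBJECT / END_OBJECT nesting shown in the slide.
# It handles only the constructs that appear in this example.

HCR_TEXT = """\
/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
NAME = SCAN1
OBJECT = Dimension
NAME = GeoTrack
Size = 1200
END_OBJECT = Dimension
OBJECT = Dimension
NAME = GeoCrossTrack
Size = 205
END_OBJECT = Dimension
OBJECT = Dimension
NAME = DataX
Size = 2410
END_OBJECT = Dimension
END_OBJECT = SWATH
END
"""


def parse_hcr(text):
    """Return the top-level objects as nested dictionaries."""
    root = {"type": "ROOT", "fields": {}, "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("/*"):
            continue  # skip blank lines and one-line comments
        if line == "END":
            break
        key, _, value = (part.strip() for part in line.partition("="))
        if key == "OBJECT":
            node = {"type": value, "fields": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif key == "END_OBJECT":
            closed = stack.pop()
            if closed["type"] != value:
                raise ValueError(f"END_OBJECT = {value} closes {closed['type']}")
        else:
            stack[-1]["fields"][key] = value
    if len(stack) != 1:
        raise ValueError("unclosed OBJECT")
    return root["children"]


swath = parse_hcr(HCR_TEXT)[0]
dimension_sizes = {
    child["fields"]["NAME"]: int(child["fields"]["Size"])
    for child in swath["children"]
}
print(swath["fields"]["NAME"], dimension_sizes)
```

A dictionary like `dimension_sizes` is the kind of structure that a comparison between a description and a file could work from.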

HCR
• HCR Utilities:
– Converters: HCR ↔ HDF-EOS
– Edit HCR and HDF-EOS
– Compare HCR with HDF-EOS file

• Current projects:
– Extend HCR converters to all of HDF4
– Similar work with HDF5
– XML too

III. HDF5

Why HDF5?
• HDF shortcomings exposed by EOSDIS, ASCI and others...
– Limits on object & file size (<2GB)
– Limited number of objects (<20K)
– Rigid data models
– I/O performance
– Aging software infrastructure (code entropy)

• …new Demands...
– Bigger, faster machines and storage systems
• massive parallelism, parallel file systems
• teraflop speeds, terabyte storage

– Greater complexity
• complex data structures
• complex subsetting

– More emphasis on remote & distributed access

• … and ASCI Requirements
– Compatibility with vector bundle model
– Compatibility with MPI-IO
– Ability to transform data between memory & storage
– Parallel file systems: PIOFS, HPSS, etc.

New HDF5 Features
• More scalable
– Larger arrays and files
– More objects

• Improved data model
– New datatypes
– Single comprehensive dataset object

• Improved software
– More flexible, robust library
– More flexible API
– More I/O options

HDF5 data model

• Two primary objects
• Dataset
– multidimensional array of elements
– rich variety of datatypes

• group
– directory-like structure
– contains datasets, groups, other objects

Dataset components
• multidimensional array
• header with metadata
– datatype
– dataspace
– attributes
– storage properties

Simple datatypes
• The usual scalars: integer & float
• User-defined scalars (e.g. 13-bit integers)
• Variable length (e.g. strings)
• Pointers to objects or regions of datasets
• Enumeration
• Opaque

Compound datatypes

• User-defined
• Comparable to C structs
• Members can be simple or compound types
• Members can be multidimensional
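
The comparison with C structs can be made concrete with Python's standard `ctypes` module. This is an illustrative sketch only, and it does not use the HDF5 library: the `Position` and `Sample` types and their field names are invented for the example (the values echo the lat/lon/temp table shown earlier in the deck).

```python
import ctypes


class Position(ctypes.Structure):
    # A small compound type that is used as a member of another compound type.
    _fields_ = [("lat", ctypes.c_int16), ("lon", ctypes.c_int16)]


class Sample(ctypes.Structure):
    _fields_ = [
        ("position", Position),                 # compound member
        ("temp", ctypes.c_float),               # simple member
        ("window", (ctypes.c_uint8 * 3) * 2),   # 2 x 3 multidimensional member
    ]


# A one-dimensional array of two records, laid out as in a C array of structs.
table = (Sample * 2)()
table[0].position.lat, table[0].position.lon, table[0].temp = 12, 23, 3.1
table[1].position.lat, table[1].position.lon, table[1].temp = 15, 24, 4.2
table[1].window[1][2] = 7

lats = [record.position.lat for record in table]
print(lats, ctypes.sizeof(Sample))
```

The printed size is platform dependent because the compiler-style alignment rules that `ctypes` follows may add padding between members.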

Data Spaces

• How data are organized to form a dataset
– rank
– dimensions

• Subsetting during I/O operations
– What subset of data is to be moved
– In-memory organization of data
– In-file organization of data

HDF5 dataset: array of records
[Figure: a 5 x 3 array of records]
Datatype: record with int8, int4, int16, and float32 fields
Dimensionality: 5 x 3

Dataspaces
Reading Dataset into Memory from File
[Figure: a read operation moving data from a 2D array of integers in the file to a 3D array of floats in memory]

Selection: Examples of mappings between file selections and memory selections.

(a) A hyperslab from a 2D array to the corner of a smaller 2D array.

(b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array.

(c) A sequence of points from a 2D array to a sequence of points in a 3D array.

(d) Union of slabs in file to union of slabs in memory. The number of elements must be equal.
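
Mappings (a) and (b) can be modeled with nested Python lists. The sketch below is a conceptual model, not the HDF5 API: `hyperslab` and `transfer` are hypothetical helpers, and describing a regular selection by `start`, `stride`, `count`, and `block` per axis is an assumption made for the illustration.

```python
from itertools import product


def hyperslab(start, stride, count, block):
    """Coordinates of a regular pattern of blocks, in row-major order."""
    per_axis = []
    for s, st, c, b in zip(start, stride, count, block):
        per_axis.append([s + i * st + j for i in range(c) for j in range(b)])
    return list(product(*per_axis))


def transfer(src, src_sel, dst, dst_sel):
    """Copy selected elements of src onto selected elements of dst, in order."""
    if len(src_sel) != len(dst_sel):
        raise ValueError("selections must contain the same number of elements")
    for s, d in zip(src_sel, dst_sel):
        value = src
        for index in s:
            value = value[index]
        target = dst
        for index in d[:-1]:
            target = target[index]
        target[d[-1]] = value


# "File" array: 6 x 8, element value encodes its position as row * 10 + column.
file_array = [[r * 10 + c for c in range(8)] for r in range(6)]

# (a) A 2 x 3 hyperslab of the file goes to the corner of a smaller 3 x 4 array.
memory = [[0] * 4 for _ in range(3)]
transfer(file_array, hyperslab((1, 2), (1, 1), (2, 3), (1, 1)),
         memory, hyperslab((0, 0), (1, 1), (2, 3), (1, 1)))

# (b) Two 1 x 2 blocks of the file go to a contiguous run at offset 3 in 1D.
line = [-1] * 10
transfer(file_array, hyperslab((0, 0), (1, 4), (1, 2), (1, 2)),
         line, hyperslab((3,), (1,), (4,), (1,)))

print(memory)
print(line)
```

The shapes of the two selections differ in (b), which is allowed because only the element counts have to match.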

Attributes
• Named pieces of data
• Stored in a dataset or group header
• Operations are scaled-down versions of the
dataset operations
– Not extendible
– No compression
– No partial I/O

Property list

• Properties of objects or operations
• Describe how to create, store, access and
transfer data

Some Properties
• chunked
– Better subsetting access time; extendable
• compressed
– Improves storage efficiency, transmission speed
• extendable
– Datasets can be extended in any direction
• split file
– Metadata in one file, raw data in another.

[Figure: dataset “Fred” split across two files. File A holds the metadata for Fred; File B holds the data for Fred.]
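
The subsetting and extension benefits of chunking come down to simple index arithmetic. The sketch below is an assumption-laden model, not library code: `chunks_touched` and `chunk_grid` are hypothetical helpers, and the 1000 x 1000 dataset with 100 x 100 chunks is an invented example.

```python
from itertools import product


def chunks_touched(start, shape, chunk_shape):
    """Grid coordinates of the chunks that overlap a rectangular subset."""
    ranges = []
    for s, n, c in zip(start, shape, chunk_shape):
        first = s // c
        last = (s + n - 1) // c
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


def chunk_grid(dataset_shape, chunk_shape):
    """Number of chunks along each axis (ceiling division)."""
    return tuple(-(-n // c) for n, c in zip(dataset_shape, chunk_shape))


chunk_shape = (100, 100)

# A tall, narrow subset: all 1000 rows, columns 250 to 299.
touched = chunks_touched((0, 250), (1000, 50), chunk_shape)
print(len(touched), "chunks hold the subset")

# Extending the second dimension from 1000 to 1250 only adds chunks.
before = chunk_grid((1000, 1000), chunk_shape)
after = chunk_grid((1000, 1250), chunk_shape)
added = after[0] * after[1] - before[0] * before[1]
print(before, after, added)
```

In a row-major contiguous layout the same subset would be spread over 1000 separate row segments, whereas here it sits in a single column of chunks, and growing the dataset leaves the existing chunks where they are.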

Dataset components
[Figure: components of a dataset]
• Metadata
– Dataspace: Rank=2; Dim_1=5, Dim_2=4, Dim_3=2
– Datatype: int16
– Attributes: time = 32.4, pressure = 987, temp = 56
– Storage properties: chunked; compressed
• Data

Groups

• Structures for organizing the file
• Like Vgroups in HDF4
• Like directories in hierarchical file system
• Every file starts with a root group
• Groups have attributes

Groups
• A mechanism for collections of related objects
• Every file starts with a root group
• Can have attributes
• Like directories in Unix, but a graph, rather than a tree

[Figure: graph of groups starting at “root”]

Groups
Groups and members of groups can be shared
[Figure: graph starting at root in which groups and members are shared]
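
Sharing is what makes the group structure a graph rather than a tree: one object can be reached by more than one path. The following is a toy model in plain Python, not the HDF5 library; the `Group` class, the `resolve` helper, and all link names are hypothetical.

```python
class Group:
    """A named collection of links to other groups or to datasets."""

    def __init__(self):
        self.links = {}


def resolve(root, path):
    """Follow a slash-separated path from the root group."""
    node = root
    for name in path.strip("/").split("/"):
        if name:
            node = node.links[name]
    return node


root = Group()
run1 = Group()
shared = Group()
temperature = [3.1, 4.2, 3.6]      # stands in for a dataset

root.links["run1"] = run1
root.links["shared"] = shared
run1.links["temp"] = temperature   # first link to the dataset
shared.links["temp"] = temperature # second link to the same dataset
run1.links["top"] = root           # a link back to the root: a cycle

same_object = resolve(root, "/run1/temp") is resolve(root, "/shared/temp")
via_cycle = resolve(root, "/run1/top/shared/temp")
print(same_object, via_cycle)
```

Both paths lead to the same object rather than to copies, so a change made through one path is visible through the other.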

Mounting
[Figure: File A and File B, each with its own root group, joined by a mount operation]

Reading & writing with HDF5

• Set properties
• Describe the data
– datatypes
– rank and dimensions
– mapping between file and memory

• Read/write

Files needn’t be files - Virtual File Layer
VFL: A public API for writing I/O drivers
[Figure: VFL layers, top to bottom]
• “File” handle (hid_t)
• VFL: Virtual File I/O Layer
• I/O drivers: stdio, mpio, memory, network
• “Storage”: files, memory, network
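
The idea behind the layering, that code above the driver interface does not care where the bytes end up, can be sketched in a few lines. This is not the VFL API: the `Driver`, `MemoryDriver`, and `FileDriver` classes and the `round_trip` function are hypothetical names chosen for the illustration.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Driver(ABC):
    """Minimal storage interface: byte ranges addressed by offset."""

    @abstractmethod
    def write(self, offset, data): ...

    @abstractmethod
    def read(self, offset, size): ...


class MemoryDriver(Driver):
    def __init__(self):
        self.buffer = bytearray()

    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self.buffer):
            self.buffer.extend(b"\0" * (end - len(self.buffer)))
        self.buffer[offset:end] = data

    def read(self, offset, size):
        return bytes(self.buffer[offset:offset + size])


class FileDriver(Driver):
    def __init__(self, path):
        self.handle = open(path, "w+b")

    def write(self, offset, data):
        self.handle.seek(offset)
        self.handle.write(data)

    def read(self, offset, size):
        self.handle.seek(offset)
        return self.handle.read(size)

    def close(self):
        self.handle.close()


def round_trip(driver):
    """Library-side code: uses only the Driver interface."""
    driver.write(8, b"HDF5 data")
    return driver.read(8, 9)


with tempfile.TemporaryDirectory() as folder:
    file_driver = FileDriver(os.path.join(folder, "demo.bin"))
    results = [round_trip(MemoryDriver()), round_trip(file_driver)]
    file_driver.close()

print(results)
```

A network or parallel driver would slot in the same way, by implementing the two methods, without changes to `round_trip`.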

HDF5 tools
• Current
– hdf5ls - lists contents of HDF5 file
– h5dumper - higher level view
– HDF5-to-HDF4 converter

• Future
– Convert HDF5 ↔ ASCII, binary, GIF, etc.
– Convert HDF4 to HDF5
– Java tools - VisAD, etc.
– File/code generation from DDL description
– Talking to vendors

Other HDF5 activities

• Performance tuning
• Object model
• Fortran and C++ API
• Thread-safe HDF5

IV. HDF4 vs. HDF5

HDF4 vs. HDF5
• HDF4
– Original format and library
– Compatible with all earlier versions
– 6 primary objects
• multidim. array of scalars
• raster image, palette
• table
• annotation
• group
– Biggest current user: Earth Observing System Data and Info System (EOSDIS)

• HDF5 - successor to HDF4
– New format and library
– Not compatible with earlier versions
– 2 primary objects
• multidim. array of records
• group
– Biggest current user: Accelerated Strategic Computing Initiative (ASCI)

HDF4 object types can be derived from
HDF5 datasets and groups

[Figure: HDF4 object types expressed as HDF5 groups and datasets]
• HDF4 Vgroup: HDF5 group
• The other HDF4 objects: HDF5 datasets
– HDF4 Vdata: 1-dim array of records, for example:

lat   lon   temp
12    23    3.1
15    24    4.2
17    21    3.6
23    35    7.2
25    31    6.3

– HDF4 SDS: n-dim array of scalars
– HDF4 8-bit raster
– HDF4 24-bit raster: 2-dim array of multi-component scalars
– Annotation text: “March 15, 1990. Simulation with k=10.0, beta=1.22e3. Calculate the magnitude ...”

Status of HDF4 vs. HDF5
• HDF4 is still an EOS standard
• HDF5 likely also
• HDF4 maintenance
– Maintained as long as EOS needs it
– Minimal new features

• New applications: use HDF5 if possible!
– New features, performance improvements, etc.

HDF Information
• HDF Information Center
– http://hdf.ncsa.uiuc.edu/

• HDF Help email address
– hdfhelp@ncsa.uiuc.edu

• HDF users mailing list
– hdfnews@ncsa.uiuc.edu


  • 13. Major User #1: EOSDIS • ESDIS Project – open standard exchange format and I/O library for EOSDIS – EOS applications • HDF requirements – – – – – Earth science data types (HDF-EOS, etc,) User support for scientists, data producers, etc. Library and file structure improvements HDF tools, utilities, access software Software maintenance and QA NCSA/Univ of Illinois at Urbana-Champaign HDF 13
  • 14. Major User #2: ASCI • ASCI Data Models and Formats (DMF) Group – open standard exchange format and I/O library for ASCI – DOE tri-lab ASCI applications • HDF requirements – – – – large datasets (> a terabyte) ASCI data types, especially meshes good performance in massive parallel environments primarily HDF 5 NCSA/Univ of Illinois at Urbana-Champaign HDF 14
  • 15. II. NCSA HDF Activities
  • 16. Java applications • HDF APIs – Basis for tools that access HDF • HDF Viewers – HDF browser/visualizer • HDF4 Data Server Prototype – Lessons learned about remote access to
  • 17. Remote Data Access • The SDB: Web-based Server-side Data Browser • Java for remote access • WP-ESIP: DODS project • Computational Grids (Globus/GASS)
  • 18. HDF Standardization • To share files, users must organize them similarly. • HDF user groups create standard profiles – Ways to organize data in HDF files – Metadata – API • Examples: HDF-EOS, ASCI DMF
  • 19. HDF-EOS software layers. Figure labels: HDF-EOS Applications; HDF-EOS profiles; General Applications; HDF-EOS API; Application Programming Interfaces; Low-level Interface; HDF file.
  • 20. “HDF Configuration Record” (HCR) • To simplify the tasks of defining, comparing, and producing HDF-EOS files • Formal (ODL) descriptions of HDF-EOS objects
  • 21. HCR of Swath
      /* Project XYZ */
      /* First version defined on June 10th, 1998 */
      OBJECT = SWATH
        NAME = SCAN1
        OBJECT = Dimension
          NAME = GeoTrack
          Size = 1200
        END_OBJECT = Dimension
        OBJECT = Dimension
          NAME = GeoCrossTrack
          Size = 205
        END_OBJECT = Dimension
        OBJECT = Dimension
          NAME = DataX
          Size = 2410
        END_OBJECT = Dimension
      END_OBJECT = SWATH
      END
  • 22. HCR • HCR Utilities: – Converters: HCR ↔ HDF-EOS – Edit HCR and HDF-EOS – Compare HCR with HDF-EOS file • Current projects: – Extend HCR converters to all of HDF4 – Similar work with HDF5 – XML too
  • 23. III. HDF5
  • 24. Why HDF5? • HDF shortcomings exposed by EOSDIS, ASCI and others... – Limits on object & file size (<2GB) – Limited number of objects (<20K) – Rigid data models – I/O performance – Aging software infrastructure (code entropy)
  • 25. … new demands … – Bigger, faster machines and storage systems (massive parallelism, parallel file systems; teraflop speeds, terabyte storage) – Greater complexity (complex data structures; complex subsetting) – More emphasis on remote & distributed access
  • 26. … and ASCI requirements – Compatibility with vector bundle model – Compatibility with MPI-IO – Ability to transform data between memory & storage – Parallel file systems: PIOFS, HPSS, etc.
  • 27. New HDF5 Features • More scalable – Larger arrays and files – More objects • Improved data model – New datatypes – Single comprehensive dataset object • Improved software – More flexible, robust library – More flexible API – More I/O options
  • 28. HDF5 data model • Two primary objects • Dataset – multidimensional array of elements – rich variety of datatypes • Group – directory-like structure – contains datasets, groups, other objects
  • 29. Dataset components • multidimensional array • header with metadata – datatype – dataspace – attributes – storage properties
  • 30. Simple datatypes • The usual scalars: integer & float • user-defined scalars (e.g. 13-bit integers) • variable length (e.g. strings) • pointers to objects or regions of datasets • enumeration • opaque
  • 31. Compound datatypes • User-defined • Comparable to C structs • Members can be simple or compound types • Members can be multidimensional
  • 32. Data Spaces • How data are organized to form a dataset – rank – dimensions • Subsetting during I/O operations – What subset of data is to be moved – In-memory organization of data – In-file organization of data
  • 33. HDF5 dataset: array of records. Figure: an array of dimensionality 5 x 3 in which each element is a record with fields of datatype int8, int4, int16, and float32.
  • 34. Dataspaces: Reading Dataset into Memory from File. Figure: a 2D array of integers in the file is read into a 3D array of floats in memory.
  • 35. Selection: Examples of mappings between file selections and memory selections. (a) A hyperslab from a 2D array to the corner of a smaller 2D array. (b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array. (c) A sequence of points from a 2D array to a sequence of points in a 3D array. (d) Union of slabs in file to union of slabs in memory. In every case the number of elements must be equal.
  • 36. Attributes • Named pieces of data • Stored in a dataset or group header • Operations are scaled-down versions of the dataset operations – Not extendible – No compression – No partial I/O
  • 37. Property list • Properties of objects or operations • Describe how to create, store, access and transfer data
  • 38. Some Properties • chunked: better subsetting access time; extendable • compressed: improves storage efficiency, transmission speed • extendable: datasets can be extended in any direction • split file: metadata in one file, raw data in another. Figure: dataset “Fred”, with the metadata for Fred in File A and the data for Fred in File B.
  • 39. Dataset components. Figure: a dataset consists of metadata and data. Metadata: • Dataspace: Rank=2; Dim_1=5, Dim_2=4, Dim_3=2 • Datatype: int16 • Attributes: time = 32.4, pressure = 987, temp = 56 • Storage properties: chunked; compressed
  • 40. Groups • Structures for organizing the file • Like Vgroups in HDF4 • Like directories in hierarchical file system • Every file starts with a root group • Groups have attributes
  • 41. Groups • A mechanism for collections of related objects • Every file starts with a root group • Can have attributes • Like directories in Unix, but a graph, rather than a tree. Figure: a graph of groups starting at “root”.
  • 42. Groups: Groups and members of groups can be shared. Figure: a graph starting at “root” in which members are reachable from more than one group.
  • 43. Mounting. Figure: File A and File B, each with its own root group; one file is mounted into the other's hierarchy.
  • 44. Reading & writing with HDF5 • Set properties • Describe the data – datatypes – rank and dimensions – mapping between file and memory • Read/write
  • 45. Files needn’t be files: Virtual File Layer. VFL: a public API for writing I/O drivers. Figure: an hid_t “file” handle sits on top of the VFL (Virtual File I/O Layer), which passes requests to I/O drivers (stdio, mpio, memory, network), which in turn access the “storage” (files, memory, network).
  • 46. HDF5 tools • Current – hdf5ls - lists contents of HDF5 file – h5dumper - higher level view – hdf5 hdf4 converter • Future – Convert HDF5 ↔ ascii, binary, GIFF, etc. – Convert HDF4 HDF5 – Java tools - VisAD, etc. – File/code generation from DDL description – Talking to vendors
  • 47. Other HDF5 activities • Performance tuning • Object model • Fortran and C++ API • Thread-safe HDF5
  • 48. IV. HDF4 vs. HDF5
  • 49. HDF4 vs. HDF5 • HDF4 – Original format and library – Compatible with all earlier versions – 6 primary objects: multidim array of scalars; raster image, palette; table; annotation; group – Biggest current user: Earth Observing System Data and Info System (EOSDIS) • HDF5, successor to HDF4 – New format and library – Not compatible with earlier versions – 2 primary objects: multidim. array of records; group – Biggest current user: Accelerated Strategic Computing Initiative (ASCI)
  • 50. HDF4 object types can be derived from HDF5 datasets and groups. Figure: • HDF5 group: HDF4 Vgroup • HDF5 dataset: HDF4 Vdata (1-dim array of records, e.g. lat 12 15 17 23 25; lon 23 24 21 35 31; temp 3.1 4.2 3.6 7.2 6.3); HDF4 SDS (n-dim array of scalars); HDF4 8-bit raster; HDF4 24-bit raster (2-dim array of multi-component scalars). Sample annotation text: “March 15, 1990. Simulation with k=10.0, beta=1.22e3. Calculate the magnitude ...”
  • 51. Status of HDF4 vs. HDF5 • HDF4 is still an EOS standard • HDF5 likely also • HDF4 maintenance – Maintained as long as EOS needs it – Minimal new features • New applications: use HDF5 if possible! – New features, performance improvements, etc.
  • 52. HDF Information • HDF Information Center – http://hdf.ncsa.uiuc.edu/ • HDF Help email address – hdfhelp@ncsa.uiuc.edu • HDF users mailing list – hdfnews@ncsa.uiuc.edu
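
The file-to-memory mapping of slide 35, case (a), can be illustrated with a small conceptual sketch. It does not use the HDF5 library: plain Python lists stand in for the file dataset and the memory buffer, and the helper names `select_hyperslab` and `copy_selection`, along with the array sizes, are assumptions made up for this illustration. The one rule it takes from the slide is that the file selection and the memory selection must contain the same number of elements.

```python
# Conceptual sketch only: plain Python lists stand in for HDF5 dataspaces.
# select_hyperslab and copy_selection are hypothetical helpers, not HDF5 calls.

def select_hyperslab(start, count):
    """Return the (row, col) coordinates of a rectangular block in row-major order."""
    r0, c0 = start
    n_rows, n_cols = count
    return [(r0 + i, c0 + j) for i in range(n_rows) for j in range(n_cols)]


def copy_selection(src, src_coords, dst, dst_coords):
    """Copy elements pairwise from the source selection to the destination selection."""
    if len(src_coords) != len(dst_coords):
        raise ValueError("file and memory selections must hold the same number of elements")
    for (sr, sc), (dr, dc) in zip(src_coords, dst_coords):
        dst[dr][dc] = src[sr][sc]


# "File" dataset: a 6 x 8 array whose element at (r, c) is 10*r + c.
file_data = [[10 * r + c for c in range(8)] for r in range(6)]
# "Memory" buffer: a smaller 4 x 4 array, initially all zeros.
memory = [[0] * 4 for _ in range(4)]

# Case (a): a 2 x 3 hyperslab starting at (1, 2) in the file is moved to the
# top-left corner of the smaller memory array.
file_sel = select_hyperslab(start=(1, 2), count=(2, 3))
mem_sel = select_hyperslab(start=(0, 0), count=(2, 3))
copy_selection(file_data, file_sel, memory, mem_sel)

for row in memory:
    print(row)
```

Because both selections are ordinary coordinate lists, the same `copy_selection` helper would also cover case (c), a sequence of points, as long as the two lists have equal length.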

Editor's notes

  1. ASCI’s DMF Group is currently supporting HDF work with the idea of possibly adopting HDF as a standard. They want to share data and software among the three labs (Livermore, Sandia, Los Alamos), and would prefer a “non-invented-here,” open standard with publicly available software. HDF requirements: ASCI’s needs overlap with those of EOSDIS, but with some important differences. ASCI deals largely with simulations on massively parallel machines, and hence requires very high I/O performance; only a parallel version of the library will satisfy ASCI’s needs. ASCI data deals with meshes, whereas EOS deals largely with remotely sensed data. Many types of meshes can be much more complex than remotely sensed data, and typically require indexed access. Because of these requirements, the current official version of HDF (HDF4) is not adequate for the ASCI project. Fortunately, with support from NASA, we have been developing a completely new version of HDF designed to address these kinds of requirements. This is HDF5. More about HDF5 later. A mesh data repository is being developed by the group to standardize the data models and terminology used by the three labs. This will allow them to share resources much better than is currently the case. There is also the hope that the mesh standard adopted by ASCI will be adopted by others, further expanding the leverage of the standard.
  2. The HDF group has several Java-based projects. Java’s platform independence supports the need to work with HDF on many platforms. Java’s graphical interface features support the creation of platform-independent HDF browsing and visualization software. And Java’s network awareness facilitates the development of software for remote access to HDF data. A Java HDF Interface (JHI): JHI provides an interface to essentially all the functions of the NCSA HDF 4.1r2 library. The JHI is analogous to the FORTRAN interface already provided as part of the HDF library release. Basis for tools that access HDF: any Java application can use the interface classes to read and write HDF. This package “wraps” the standard HDF 4.1r2 library, which is called from Java through “native” methods. A Java HDF Viewer: this is a tool to provide basic viewing capabilities for HDF. HDF browser/visualizer: with this tool you can open an HDF file, look at images, arrays, tables and attributes, and do some simple visualization. Template for other Java viz apps: it isn’t meant as an all-encompassing visualization tool for HDF. That is left to others, including commercial vendors. Rather, it is meant as a template for people to use to build more sophisticated tools. Java Scientific Data Server Prototype: we are experimenting with remote access to HDF. This project examines different ways Java can be used to provide remote access to HDF. Lessons learned about scientific data servers: we are learning a great deal about Java’s remote access capabilities: servlets, RMI, etc. Template for other Java server apps: again, we hope this technology will help others who want to do similar things, or to build products out of our prototypes.
  3. HDF5 is a new, experimental version of HDF that addresses limitations of the current version (HDF4) and addresses requirements of modern systems and applications. HDF5 is a completely new format and I/O library, not an incrementally new version of HDF4. An HDF5 prototype was released in February 1998. Although incomplete, this library shows the basic features of HDF5. A full release is scheduled for Summer 1998. Why HDF5? HDF5 is motivated by severe limitations in the HDF4 format and library. HDF5 retains most features of HDF4, but addresses these limitations, including: Large array and file support: a single HDF4 file cannot store more than 20,000 complex objects, and cannot be larger than 2 GB. HDF5 will be able to store virtually any number of objects of virtually any size. Simple, comprehensive data model: HDF4 has more object types than necessary, and datatypes are too restricted. HDF5 uses a simpler, more comprehensive data model that includes only two basic structures: a multidimensional array of record structures, and a grouping structure. All HDF4 structures can be derived from these. New library, with emphasis on parallel I/O: the HDF4 library is old, overly complex, does not support parallel access well, and is not thread tolerant. HDF5 provides a better-engineered library and API, with improved support for parallel I/O, threads, and other requirements imposed by modern systems and applications. Collaborations: HDF5 was motivated by the needs of many different users, but two projects in particular are driving HDF5 development. ASCI: mesh data standard for ASCI physics. The DMF’s ASCI mesh standard initiative described in an earlier slide is providing most of the support for HDF5. Digital Library Initiative (DLI): integrate with commercial object store. A DLI project at the U. of Illinois is using HDF5 for data access in combination with a commercial object store. This project requires very efficient parallel I/O.
  4. Here is an example of a basic HDF5 object. Notice that each element in the 3D array is a record with four values in it.
  5. Like HDF4, HDF5 has a grouping structure. The main difference is that every HDF5 file starts with a root group, whereas HDF4 doesn’t need any groups at all.
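
The grouping structure described in these notes and in slides 40 to 42 (every file starts with a root group; groups form a graph rather than a tree, so members can be shared) can be pictured with a minimal sketch. It is not HDF5 code: the `Group` class, the `resolve` helper, and the names `run1`, `run2`, and `grid` are hypothetical and exist only for this illustration.

```python
# Conceptual sketch only: a small class stands in for HDF5 groups.
# None of these names come from the HDF5 library.

class Group:
    """A named collection of links to datasets or to other groups."""

    def __init__(self):
        self.links = {}       # link name -> member object
        self.attributes = {}  # groups can carry attributes as well

    def link(self, name, member):
        """Add a link to a member and return the member."""
        self.links[name] = member
        return member


def resolve(root, path):
    """Follow a Unix-style path such as '/run1/grid' starting at the root group."""
    node = root
    for part in path.strip("/").split("/"):
        if part:
            node = node.links[part]
    return node


root = Group()                       # every file starts with a root group
run1 = root.link("run1", Group())
run2 = root.link("run2", Group())

grid = [0.0, 0.5, 1.0]               # stand-in for a dataset
run1.link("grid", grid)
run2.link("grid", grid)              # the same object is linked from a second group

# Both paths lead to one shared object, which a strict tree would not allow.
print(resolve(root, "/run1/grid") is resolve(root, "/run2/grid"))
```

Storing links by name inside each group, rather than storing a parent pointer in each member, is what lets one member appear under several groups in this sketch.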