SUBSETTING
Matt Smith
Information Technology and Systems Center (ITSC)
University of Alabama in Huntsville (UAH)
http://subset.itsc.uah.edu

UAH

The University of Alabama in
Huntsville
Subsetting
• Goal: to provide a science data user with only the data
they request as quickly as possible.
• Benefits science data users and data centers:
- reduces analysis time by reducing amount of data
- reduces time for data delivery
- reduces resources (network, personnel, media, etc.)
• Steps:
- locate spatial / temporal / spectral area of interest
- extract
- re-assemble for distribution
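
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are held in plain lists keyed by field name, and the function names (`locate`, `extract`, `subset`) and the sample values are hypothetical, not part of any UAH/ITSC software.

```python
def locate(lats, lons, lat_range, lon_range):
    """Step 1: indices of the points inside the bounding box (inclusive)."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    return [
        i for i, (lat, lon) in enumerate(zip(lats, lons))
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]


def extract(values, indices, stride=1):
    """Step 2: pull out the selected values, keeping every stride-th hit."""
    return [values[i] for i in indices[::stride]]


def subset(record, lat_range, lon_range, stride=1):
    """Step 3: re-assemble every field into a smaller record."""
    indices = locate(record["lat"], record["lon"], lat_range, lon_range)
    return {name: extract(column, indices, stride)
            for name, column in record.items()}


# Hypothetical sample data: five observations with one measured field.
record = {
    "lat": [34.0, 36.0, 38.0, 41.0, 39.0],
    "lon": [-75.0, -76.0, -73.0, -74.0, -80.0],
    "tb": [250.1, 251.2, 252.3, 253.4, 254.5],
}
small = subset(record, lat_range=(35.0, 40.0), lon_range=(-77.0, -72.0))
```

Only the second and third observations fall inside the box, so `small` holds two values per field; passing `stride=2` would keep just the first of them.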

27-28 February 2002

Science Data Processing Workshop
Greenbelt, MD

HEW
• HDF-EOS Web-based Subsetter
• Prototype software designed to be dataset-independent (HDF-EOS)
• Front-end/GUI
  - Uses HTML forms and JavaScript
  - Optional
• Back-end
  - Needs subset criteria file and HDF-EOS data
  - Performs subsetting as a “batch” job
• http://subset.itsc.uah.edu/hew2k

Subset Criteria File
• File(s) to subset (Req’d)
• Parameters/channels (Req’d)
• E-mail address (Req’d)
• Bounding box (Opt.)
  - Latitude/Longitude bounds
  - Row/Column bounds (grids only)
• Time range (Opt.)
• Subsampling stride (Req’d)
• Non-geolocated objects (also_include) (Opt.)
• Output file prefix (Opt.)
• .met file (Opt.)
• Sub-scan subsetting (swaths only) (Opt.)

Example Subset Criteria File
GROUP = SUBSET
  PARENT_FILE = ("/AQUA/AMSR/AE_L2A.hdfeos")
  LATITUDE_RANGE = (35.000000, 40.000000)
  LONGITUDE_RANGE = (-77.000000, -72.000000)
  EMAIL = "user@company.com"
  OUTPUT_PREFIX = "NC_coast"
  MET_FILE = "YES"
  GROUP = SPOG
    NAME = "swath_1"
    TYPE = "SWATH"
    PARAMETERS = ("89.0V_Res.1_TB",
                  "89.0V_Res.2_TB")
    SUBSAMPLING = ("GeoTrack", 1,
                   "GeoXtrack", 1)
  END_GROUP = SPOG
END_GROUP = SUBSET
END
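
A criteria file like the one above could be generated from a script rather than typed by hand. The Python below is a minimal sketch, assuming the nested GROUP / END_GROUP layout shown on this slide; the `Group` class and the `odl_value`, `render`, and `criteria_text` helpers are hypothetical names, and whether the HEW back-end accepts exactly this indentation is not confirmed by the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A named ODL group holding (keyword, value) pairs or nested groups."""
    name: str
    entries: list = field(default_factory=list)


def odl_value(value):
    """Format one Python value the way the example file writes it."""
    if isinstance(value, str):
        return f'"{value}"'
    if isinstance(value, (list, tuple)):
        return "(" + ", ".join(odl_value(v) for v in value) + ")"
    if isinstance(value, float):
        return f"{value:.6f}"
    return str(value)


def render(group, indent=0):
    """Return the lines of a group, indenting nested content by two spaces."""
    pad = "  " * indent
    lines = [f"{pad}GROUP = {group.name}"]
    for entry in group.entries:
        if isinstance(entry, Group):
            lines.extend(render(entry, indent + 1))
        else:
            keyword, value = entry
            lines.append(f"{pad}  {keyword} = {odl_value(value)}")
    lines.append(f"{pad}END_GROUP = {group.name}")
    return lines


def criteria_text(group):
    """Full file text: the rendered group followed by the closing END."""
    return "\n".join(render(group) + ["END"]) + "\n"


example = Group("SUBSET", [
    ("PARENT_FILE", ("/AQUA/AMSR/AE_L2A.hdfeos",)),
    ("LATITUDE_RANGE", (35.0, 40.0)),
    ("LONGITUDE_RANGE", (-77.0, -72.0)),
    ("EMAIL", "user@company.com"),
    ("OUTPUT_PREFIX", "NC_coast"),
    ("MET_FILE", "YES"),
    Group("SPOG", [
        ("NAME", "swath_1"),
        ("TYPE", "SWATH"),
        ("PARAMETERS", ("89.0V_Res.1_TB", "89.0V_Res.2_TB")),
        ("SUBSAMPLING", ("GeoTrack", 1, "GeoXtrack", 1)),
    ]),
])
text = criteria_text(example)
```

Keeping the entries as an ordered list rather than a dictionary preserves the keyword order of the slide's example.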

HEW Back-end
• Uses HDF-EOS (and HDF) library
• Instructions via a subset criteria file (ODL)
• Handles multiple similar files
• Handles Swath and/or Grid objects
• Unix (SGI & Sun) executables available
• Subsetted output files contain:
  - StructMetadata (HDF-EOS)
  - ArchiveMetadata*
  - ProductMetadata (added by HEW ODL file)
  - CoreMetadata* (w/ modified bounding box & time info)
    - optionally placed in .met file

* if present in parent file

HEW Subsettable data
EOS DATASETS
• Terra
  - MODIS
  - MOPITT
  - ASTER
• Aqua
  - AMSR-E

OTHERS
• TRMM
  - TMI
• NOAA-15
  - AMSU-A
• Any other HDF-EOS2 (HDF4) data written with HDF-EOS library subsetting calls in mind

HEW integration with ECS
[Figure: data flow among the end user, the EDG System (order submission, HTML), ECS, and the Subsetting System (Subsetter), drawn as seven numbered steps. Labeled flows: data order and reply; subset ODL and reply; input data; output data; output data (reingested).]

ECS integration plans
• UAH/ITSC-written interface software
• 6a.05 to be released in March
• NSIDC, GDAAC, EDC
• EDG v3.4 will have subsetting options
• Enhancements for DAACs

Subsetting web-site
• http://www.subset.org
• Hope to create a “portal” for everyone involved in subsetting
  - Advertising
  - Forums
  - Data
  - Software
  - Glossary
  - Tutorials
  - Links to specialized subsetters

Other HDF-EOS Tools
• SPOT – Subsettability Checker
• eospeek – HDF-EOS file display
• hdfpeek – HDF file display
• HDF-EOS user’s manual (in progress)

Subsetting Plans
• Complete ECS Integration
  - Maintain/Improve as needed for DAACs
  - Front-end polar projection coverage map (NSIDC)
• Certify software with new datasets (Aqua, Aura, …)
• Incorporate ESML usage
• Provide support for HDF-EOS5
• Provide additional specialized subsetting applications for instrument teams and others

Earth Science Markup Language
“Define Once, Use Anywhere”

Information Technology and Systems Center
University of Alabama in Huntsville

http://esml.itsc.uah.edu

Contact Info:
Rahul Ramachandran
rramachandran@itsc.uah.edu
Research Effort Supported by:
Karen Moe
Earth Science Technology Office, NASA

Data Characteristics
• Different Data Formats
  - BUFR (DoD, WMO)
  - CDF, NetCDF
  - GRIB (WMO)
  - HDF, HDF-EOS (NASA)
  - Free formats (Binary, ASCII)
  - GrADS, McIDAS, Phoenix, URF, etc.
• Different states of processing
  - raw, calibrated, derived, modeled or interpreted
• Data/application interoperability problem
• Most scientists are not programmers
• Writing data decoders takes time and effort!

Data/Application Interoperability Problem
[Figure: data in Format 1, Format 2, and Format 3 reach the application through a format converter, Reader 1, and Reader 2.]

• Specialized code for every format
  - Difficult to assimilate new data types
• Enforce a Standard Data Format
  - Not practical for legacy datasets

What is ESML?
• Specialized markup language for Earth Science metadata based on XML
• Machine-readable and -interpretable representation of the structure and content of any data file, regardless of data format
• ESML is NOT a new data format
• External metadata files that can be generated by either data producer or data consumer (at collection, data set, and/or granule level)
• ESML will provide the benefits of a standard, self-describing data format (like HDF, HDF-EOS, netCDF, GeoTIFF, …) without the cost of data conversion
• ESML consists of three types of metadata:
  - Syntactic
  - Semantic
  - Content

Three types of metadata
• Syntactic
  – Structural information, bits, word-length, endianness, sequence
• Semantic
  – Meaning of the data, units, frame of reference
• Content
  – “Typical” metadata, general information
    • Producer contact info, version
  – Searchable information, keywords, etc.
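
One way to picture the three categories is as three small records attached to one file description. The Python sketch below is only an illustration: every class and attribute name is hypothetical and chosen to mirror the bullets above, not taken from the ESML schema, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SyntacticMetadata:
    """How the bytes are laid out: structure, word length, byte order, sequence."""
    encoding: str                       # e.g. "ascii" or "binary"
    field_order: List[str]              # sequence of fields in each record
    word_length_bits: Optional[int] = None
    byte_order: Optional[str] = None    # e.g. "big" or "little"


@dataclass
class SemanticMetadata:
    """What the values mean: units and frame of reference."""
    units: Dict[str, str]
    frame_of_reference: Optional[str] = None


@dataclass
class ContentMetadata:
    """General and searchable information about the data set."""
    producer_contact: Optional[str] = None
    version: Optional[str] = None
    keywords: List[str] = field(default_factory=list)


@dataclass
class FileDescription:
    """One external description covering all three kinds of metadata."""
    syntactic: SyntacticMetadata
    semantic: SemanticMetadata
    content: ContentMetadata

    def unit_of(self, name):
        """Unit recorded for a field, or None when none is given."""
        return self.semantic.units.get(name)


# Made-up description of a two-column ASCII table.
description = FileDescription(
    syntactic=SyntacticMetadata(encoding="ascii",
                                field_order=["PRESSURE", "HEIGHT"]),
    semantic=SemanticMetadata(units={"PRESSURE": "mb", "HEIGHT": "m"}),
    content=ContentMetadata(version="0.5", keywords=["sounding"]),
)
```

Separating the records this way keeps layout details apart from meaning, so a reader needs only the syntactic part to decode values and the semantic part to interpret them.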

WMO Sounding data
4 10000   94 99999 99999 99999 99999
4  9250  757 99999 99999 99999 99999
5  8890 1102   142   -28 99999 99999
5  8830 1159   150   -20 99999 99999
6  8765 1219 99999 99999   310    26
4  8500 1468   130   -40   330    36
6  8136 1828 99999 99999   320    46
6  7840 2133 99999 99999   315    62
6  7554 2438 99999 99999   310    67
6  7279 2743 99999 99999   295    62
4  7000 3065    16  -124   275    67
5  6910 3169    20  -170 99999 99999
6  6753 3352 99999 99999   270    93
6  6499 3657 99999 99999   270   118
5  6130 4121   -39  -199 99999 99999
6  6017 4267 99999 99999   255    93
5  5840 4500   -73  -193 99999 99999
5  5630 4784   -87  -297 99999 99999
6  5563 4876 99999 99999   240    87
6  5348 5181 99999 99999   230   103
4  5000 5700  -161  -241   235   139
6  4940 5791 99999 99999   235   139
5  4890 5867  -173  -243 99999 99999
6  4740 6096 99999 99999   235   129
4  4000 7340  -287  -367   245   108

Ex: ESML for WMO Sounding Data
<a:ESML xmlns:a="ESML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="ESML R:SchemaESML.xsd">
  <SyntacticMetaData>
    <Ascii>
      <AsciiStructure name="DataTable" geoInfo="NoGeoInfo" instances="1">
        <Array occurs="*">
          <Field format="%d" name="number">
            <Attribute/>
          </Field>
          <Field format="%d" name="PRESSURE">
            <Data unit="mb" equation="X/10" FillValue="99999"/>
          </Field>
          <Field format="%d" name="HEIGHT">
            <Data unit="m"/>
          </Field>
          <Field format="%d" name="TEMPERATURE">
            <Data unit="K" equation="X/10"/>
          </Field>
          <Field format="%d" name="DEWPOINT">
            <Data unit="K" equation="X/10"/>
          </Field>
          <Field format="%d" name="WIND DIRECTION">
            <Data unit="DDD"/>
          </Field>
          <Field format="%d" name="WIND SPEED">
            <Data unit="m/s"/>
          </Field>
        </Array>
      </AsciiStructure>
    </Ascii>
  </SyntacticMetaData>
</a:ESML>
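
To show how a description like this could drive a generic reader, the Python sketch below parses a trimmed copy of the ESML above with the standard library and uses it to decode one sounding row. It is not the ESML Library: `load_fields` and `decode_row` are hypothetical helpers, only equations of the form `X/<integer>` are handled, and treating 99999 as the missing-value marker for every field is an assumption (the slide attaches `FillValue` only to PRESSURE).

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's description (schema attributes omitted).
ESML_TEXT = """<a:ESML xmlns:a="ESML">
  <SyntacticMetaData><Ascii>
    <AsciiStructure name="DataTable" geoInfo="NoGeoInfo" instances="1">
      <Array occurs="*">
        <Field format="%d" name="number"><Attribute/></Field>
        <Field format="%d" name="PRESSURE">
          <Data unit="mb" equation="X/10" FillValue="99999"/></Field>
        <Field format="%d" name="HEIGHT"><Data unit="m"/></Field>
        <Field format="%d" name="TEMPERATURE">
          <Data unit="K" equation="X/10"/></Field>
        <Field format="%d" name="DEWPOINT">
          <Data unit="K" equation="X/10"/></Field>
        <Field format="%d" name="WIND DIRECTION"><Data unit="DDD"/></Field>
        <Field format="%d" name="WIND SPEED"><Data unit="m/s"/></Field>
      </Array>
    </AsciiStructure>
  </Ascii></SyntacticMetaData>
</a:ESML>"""


def load_fields(esml_text):
    """Return (name, divisor) pairs in file order; divisor is 1 if no equation."""
    fields = []
    for element in ET.fromstring(esml_text).iter("Field"):
        data = element.find("Data")
        equation = data.get("equation", "") if data is not None else ""
        match = re.fullmatch(r"X/(\d+)", equation)
        fields.append((element.get("name"), int(match.group(1)) if match else 1))
    return fields


def decode_row(line, fields, fill=99999):
    """Decode one whitespace-separated row; the fill marker becomes None."""
    row = {}
    for (name, divisor), token in zip(fields, line.split()):
        raw = int(token)
        if raw == fill:
            row[name] = None          # assumption: 99999 means "missing"
        else:
            row[name] = raw / divisor if divisor != 1 else raw
    return row


fields = load_fields(ESML_TEXT)
row = decode_row("5 8890 1102 142 -28 99999 99999", fields)
```

Nothing in `decode_row` is specific to soundings; a different ASCII table would need only a different description, which is the point of keeping the format knowledge outside the reader.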

ESML Schema/Library
• ESML Schema Version 0.5
  - Supports ASCII, Binary and HDF-EOS
  - W3C compliant schema
  - Extensible, allowing new formats or modifications
• ESML Library Version 0.5
  - C++ version
  - Windows 95/98/NT/2000 version
  - Porting to LINUX version
  - Changed from Oracle XML parser to Apache XML parser

Future Plans
• Addition of new data formats to both Schema and Library
  - GRIB
  - McIDAS
  - BUFR, HDF4/5 and others
• Versions of the Library
  - C/C++ version – UNIX
  - C/C++ version – Mac
  - Java version

ESML Data Browser
• Hybrid product (Java Client/C++ Library backend/JNI Interface)
• Prototype version is the extension of the ESML Demo Tool
• Features
  - Browse and view data values using an ESML file
  - Browse the metadata for each data field
• Future Features:
  - Allow format conversion with automatic generation of ESML metadata file
  - Allow selection of multiple fields
  - Additional functionality such as Subsetting
  - Browse data images

ESML Editor
• 100% Java version prototype
• Unable to find a COTS editor
• Utilizes Expert System principles to give users correct options
• Hides XML tags from the users
• Future Features:
  - Allow text editing of the XML tags also
  - Incorporate feedback from users

ESML Web Page
• URL: esml.itsc.uah.edu
• Post latest products, news, presentations, papers
• Schema and related documents available to all
• Beta version of the Library available on limited basis
• Beta versions of ESML Editor and ESML Data Browser
will be made available soon

Other UAH/ITSC work
• AMSR-E
• Passive Microwave (PM)-ESIP
• ADaM – Algorithm Development and Mining
• EVE – An EnVironmEnt for On-board Processing

Contact UAH/ITSC
HEW (HDF-EOS Web-based Subsetter)
http://subset.itsc.uah.edu/hew2k
ESML (Earth Science Markup Language)
http://esml.itsc.uah.edu
General Purpose Subsetting (ADaM)
http://datamining.itsc.uah.edu
On-Demand Subsetting (Passive Microwave – ESIP)
http://pm-esip.msfc.nasa.gov
SSM/I Coarse-grain Subsetting
http://ghrc.msfc.nasa.gov/ssmi/ssmi_subset.html


Más contenido relacionado

Destacado (7)

HDF-EOS Aura File Format Guidelines
HDF-EOS Aura File Format GuidelinesHDF-EOS Aura File Format Guidelines
HDF-EOS Aura File Format Guidelines
 
The LEISA Atmospheric Corrector (LAC) on Earth Observer 1 (EO1)
The LEISA Atmospheric Corrector (LAC) on Earth Observer 1 (EO1)The LEISA Atmospheric Corrector (LAC) on Earth Observer 1 (EO1)
The LEISA Atmospheric Corrector (LAC) on Earth Observer 1 (EO1)
 
HDF and HDF-EOS Experiences and Applications
HDF and HDF-EOS Experiences and ApplicationsHDF and HDF-EOS Experiences and Applications
HDF and HDF-EOS Experiences and Applications
 
Transitioning from HDF4 to HDF5
Transitioning from HDF4 to HDF5Transitioning from HDF4 to HDF5
Transitioning from HDF4 to HDF5
 
Welcome to HDF Workshop V
Welcome to HDF Workshop VWelcome to HDF Workshop V
Welcome to HDF Workshop V
 
Workshop Discussion: HDF & HDF-EOS Future Direction
Workshop Discussion: HDF & HDF-EOS Future DirectionWorkshop Discussion: HDF & HDF-EOS Future Direction
Workshop Discussion: HDF & HDF-EOS Future Direction
 
HDF Update
HDF UpdateHDF Update
HDF Update
 

Similar a Subsetting

2004-11-13 Supersite Relational Database Project: (Data Portal?)
2004-11-13 Supersite Relational Database Project: (Data Portal?)2004-11-13 Supersite Relational Database Project: (Data Portal?)
2004-11-13 Supersite Relational Database Project: (Data Portal?)
Rudolf Husar
 
Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...
Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...
Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...
The HDF-EOS Tools and Information Center
 

Similar a Subsetting (20)

Earth Science Markup Language (ESML) - A Tutorial
Earth Science Markup Language (ESML) - A TutorialEarth Science Markup Language (ESML) - A Tutorial
Earth Science Markup Language (ESML) - A Tutorial
 
Data Mobility Exhibition
Data Mobility ExhibitionData Mobility Exhibition
Data Mobility Exhibition
 
Subsetting at UAH
Subsetting at UAHSubsetting at UAH
Subsetting at UAH
 
Overview of the Data Processing Error Analysis System (DPEAS)
Overview of the Data Processing Error Analysis System (DPEAS)Overview of the Data Processing Error Analysis System (DPEAS)
Overview of the Data Processing Error Analysis System (DPEAS)
 
Dataset Independent Subsetting
Dataset Independent SubsettingDataset Independent Subsetting
Dataset Independent Subsetting
 
Metadata Requirements for EOSDIS Data Providers
Metadata Requirements for EOSDIS Data ProvidersMetadata Requirements for EOSDIS Data Providers
Metadata Requirements for EOSDIS Data Providers
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big Data
 
Network Engineering for High Speed Data Sharing
Network Engineering for High Speed Data SharingNetwork Engineering for High Speed Data Sharing
Network Engineering for High Speed Data Sharing
 
Scientific
Scientific Scientific
Scientific
 
Memory-Driven Near-Data Acceleration and its application to DOME/SKA
 Memory-Driven Near-Data Acceleration and its application to DOME/SKA Memory-Driven Near-Data Acceleration and its application to DOME/SKA
Memory-Driven Near-Data Acceleration and its application to DOME/SKA
 
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
 
Empowering Transformational Science
Empowering Transformational ScienceEmpowering Transformational Science
Empowering Transformational Science
 
2004-11-13 Supersite Relational Database Project: (Data Portal?)
2004-11-13 Supersite Relational Database Project: (Data Portal?)2004-11-13 Supersite Relational Database Project: (Data Portal?)
2004-11-13 Supersite Relational Database Project: (Data Portal?)
 
Srds Pres011120
Srds Pres011120Srds Pres011120
Srds Pres011120
 
Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...
Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...
Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
SEEDS Standards Process
SEEDS Standards ProcessSEEDS Standards Process
SEEDS Standards Process
 
Subsetting
SubsettingSubsetting
Subsetting
 
GSM UMTS LTE Site Commissioning software
GSM UMTS LTE Site Commissioning softwareGSM UMTS LTE Site Commissioning software
GSM UMTS LTE Site Commissioning software
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Future
 

Más de The HDF-EOS Tools and Information Center

Más de The HDF-EOS Tools and Information Center (20)

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
HDF - Current status and Future Directions
HDF - Current status and Future Directions HDF - Current status and Future Directions
HDF - Current status and Future Directions
 
H5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only LibraryH5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only Library
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 
Leveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software TestingLeveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software Testing
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Subsetting

  • 1. SUBSETTING Matt Smith Information Technology and Systems Center (ITSC) University of Alabama in Huntsville (UAH) http://subset.itsc.uah.edu UAH The University of Alabama in Huntsville
  • 2. Subsetting • Goal: to provide a science data user with only the data they request as quickly as possible. • Benefits science data users and data centers: - reduces analysis time by reducing amount of data - reduces time for data delivery - reduces resources (network, personnel, media, etc.) • Steps: - locate spatial / temporal / spectral area of interest - extract - re-assemble for distribution 27-28 February 2002 Science Data Processing Workshop Greenbelt, MD UAH The University of Alabama in Huntsville
  • 3. HEW • HDF-EOS Web-based Subsetter • Prototype software designed to be dataset-independent (HDF-EOS) • Front-end/GUI • Uses HTML forms and JavaScript • Optional • Back-end • Needs subset criteria file and HDF-EOS data • Performs subsetting as a “batch” job • http://subset.itsc.uah.edu/hew2k 27-28 February 2002 Science Data Processing Workshop Greenbelt, MD UAH The University of Alabama in Huntsville
  • 4.
  • 5. Subset Criteria File • • • • File(s) to subset Parameters/channels E-mail address Bounding box Req’d Req’d Req’d Opt. • Latitude/Longitude bounds • Row/Column bounds (grids only) • • • • • • Time range Subsampling stride Non-geolocated objects (also_include) Output file prefix .met file Sub-scan subsetting (swaths only) 27-28 February 2002 Science Data Processing Workshop Greenbelt, MD Opt. Req’d Opt. Opt. Opt. Opt. UAH The University of Alabama in Huntsville
  • 6. Example Subset Criteria File GROUP = SUBSET PARENT_FILE =(“/AQUA/AMSR/AE_L2A.hdfeos”) LATITUDE_RANGE = (35.000000, 40.000000) LONGITUDE_RANGE = (-77.000000, -72.000000) EMAIL = “user@company.com” OUTPUT_PREFIX = “NC_coast” MET_FILE = “YES” GROUP = SPOG NAME = “swath_1” TYPE = “SWATH” PARAMETERS = “89.0V_Res.1_TB”, “89.0V_Res.2_TB”) SUBSAMPLING = (“GeoTrack”, 1, “GeoXtrack”, 1) END_GROUP = SPOG END_GROUP = SUBSET END 27-28 February 2002 Science Data Processing Workshop Greenbelt, MD UAH The University of Alabama in Huntsville
  • 7. HEW Back-end • • • • • • Uses HDF-EOS (and HDF) library Instructions via a subset criteria file (ODL) Handles multiple similar files Handles Swath and/or Grid objects Unix (SGI & Sun) executables available Subsetted output files contain: • • • • StructMetadata (HDF-EOS) ArchiveMetadata* ProductMetadata (added by HEW ODL file) CoreMetadata* (w/ modified bounding box & time info) • optionally placed in . m e t file • 27-28 February 2002 * if present in parent file Science Data Processing Workshop Greenbelt, MD UAH The University of Alabama in Huntsville
  • 8. HEW Subsettable data EOS DATASETS • Terra MODIS MOPITT ASTER • Aqua AMSR-E 27-28 February 2002 OTHERS TRMM TMI NOAA-15 AMSU-A Any other HDF-EOS2 (HDF4) data written with HDF-EOS library subsetting calls in mind Science Data Processing Workshop Greenbelt, MD UAH The University of Alabama in Huntsville
  • 9. HEW integration with ECS EDG System 2 EDG Order submission (HTML) ECS ECS 1 End user 7 3 Output data (Reingested) 4 Data order and reply Subset ODL and reply 6 Output data Subsetter 5 Input data Subsetting System 27-28 February 2002 Science Data Processing Workshop Greenbelt, MD UAH The University of Alabama in Huntsville
  • 10. ECS integration plans • • • • • UAH/ITSC-written interface software 6a.05 to be released in March NSIDC, GDAAC, EDC EDG v3.4 will have subsetting options Enhancements for DAACs 27-28 February 2002 Science Data Processing Workshop Greenbelt, MD UAH The University of Alabama in Huntsville
  • 11. Subsetting web-site • http://www.subset.org • Hope to create “portal” • for everyone involved in subsetting • • • • • • • 27-28 February 2002 Advertising Forums Data Software Glossary Tutorials Links to specialized subsetters Science Data Processing Workshop Greenbelt, MD UAH The University of Alabama in Huntsville
  • 12. Other HDF-EOS Tools • • • • SPOT – Subsettability Checker eospeek – HDF-EOS file display hdfpeek – HDF file display HDF-EOS user’s manual (in work) 27-28 February 2002 Science Data Processing Workshop Greenbelt, MD UAH The University of Alabama in Huntsville
  • 13. Subsetting Plans • • • • • Complete ECS Integration • Maintain/Improve as needed for DAACs • Front-end polar projection coverage map (NSIDC) Certify software with new datasets (Aqua, Aura,…) Incorporate ESML usage Provide support for HDF-EOS5 Provide additional specialized subsetting applications for instrument teams and others 27-28 February 2002 Science Data Processing Workshop Greenbelt, MD UAH The University of Alabama in Huntsville
  • 14. Earth Science Markup Language “Define Once, Use Anywhere” Information Technology and Systems Center University of Alabama in Huntsville http://esml.itsc.uah.edu Contact Info: Rahul Ramachandran rramachandran@itsc.uah.edu Research Effort Supported by: Karen Moe Earth Science Technology Office, NASA UAH The University of Alabama in Huntsville
  • 15. Data Characteristics • Different Data Formats • BUFR (DoD, WMO) • CDF, NetCDF • GRIB (WMO) • HDF, HDF-EOS (NASA) • Free formats (Binary, ASCII) • GRaDs, McIDAS, Pheonix, URF etc etc • Different states of processing • raw, calibrated, derived, modeled or interpreted • Data/application interoperability problem • Most scientists are not programmers • Writing data decoders takes time and effort! 27-28 February 2002 Science Data Processing Workshop Greenbelt, MD UAH The University of Alabama in Huntsville
  • 16. Data/Application Interoperability Problem DATA DATA FORMAT 11 FORMAT DATA DATA FORMAT 22 FORMAT DATA DATA FORMAT 33 FORMAT FORMAT CONVERTER READER 1 READER 2 APPLICATION • Specialized code for every format • Difficult to assimilate new data types • Enforce a Standard Data Format • Not practical for legacy datasets 27-28 February 2002 Science Data Processing Workshop Greenbelt, MD UAH The University of Alabama in Huntsville
  • 17. What is ESML? • Specialized markup language for Earth Science metadata based on XML • Machine-readable and -interpretable representation of the structure and content of any data file, regardless of data format • ESML is NOT a new data format • External metadata files that can be generated by either data producer or data consumer (at collection, data set, and/or granule level) • ESML will provide the benefits of a standard, self-describing data format (like HDF, HDF-EOS, netCDF, geoTIFF, …) without the cost of data conversion • ESML consists of three types of metadata: • Syntactic • Semantic • Content 27-28 February 2002 Science Data Processing Workshop Greenbelt, MD UAH The University of Alabama in Huntsville
  • 18. Three types of metadata • Syntactic – Structural information, bits, word-length, endianness, sequence • Semantic – Meaning of the data, units, frame of reference • Content – “Typical” metadata, general information (producer contact info, version); searchable information, keywords, etc.
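The split between syntactic and semantic metadata on this slide can be illustrated with a small sketch. It is not the ESML library or schema: `FieldSpec`, `RecordSpec`, and the two-field binary record are hypothetical names invented for illustration. Syntactic information (byte order, word length, field sequence) tells Python's standard `struct` module how to unpack the bytes, while semantic information (unit, scaling) tells the reader how to interpret the resulting integers.

```python
import struct
from dataclasses import dataclass


@dataclass
class FieldSpec:
    """Hypothetical per-field description (not an ESML type)."""
    name: str         # content: human-readable label
    struct_code: str  # syntactic: struct format character, e.g. "H" = unsigned 16-bit
    unit: str         # semantic: physical unit
    divisor: int = 1  # semantic: stored integer / divisor = physical value


@dataclass
class RecordSpec:
    """Hypothetical description of one fixed-length binary record."""
    byte_order: str   # syntactic: ">" big-endian, "<" little-endian
    fields: list      # syntactic: ordered FieldSpec entries (the field sequence)

    def decode(self, raw: bytes) -> dict:
        # Build the struct format string purely from the syntactic metadata.
        fmt = self.byte_order + "".join(f.struct_code for f in self.fields)
        values = struct.unpack(fmt, raw)
        # Apply the semantic metadata (scaling and units) to each raw integer.
        return {f.name: (v / f.divisor, f.unit) for f, v in zip(self.fields, values)}


# Assumed example record: pressure and temperature stored as tenths of a unit.
spec = RecordSpec(
    byte_order=">",
    fields=[
        FieldSpec("pressure", "H", "mb", divisor=10),
        FieldSpec("temperature", "h", "K", divisor=10),
    ],
)
raw = struct.pack(">Hh", 8500, 130)
decoded = spec.decode(raw)
print(decoded)
```

Because the description lives outside the data, switching the same reader to a little-endian file only requires changing `byte_order` in the description, not the decoding code.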
  • 19. WMO Sounding data
      4  10000    94  99999  99999  99999  99999
      4   9250   757  99999  99999  99999  99999
      5   8890  1102    142    -28  99999  99999
      5   8830  1159    150    -20  99999  99999
      6   8765  1219  99999  99999    310     26
      4   8500  1468    130    -40    330     36
      6   8136  1828  99999  99999    320     46
      6   7840  2133  99999  99999    315     62
      6   7554  2438  99999  99999    310     67
      6   7279  2743  99999  99999    295     62
      4   7000  3065     16   -124    275     67
      5   6910  3169     20   -170  99999  99999
      6   6753  3352  99999  99999    270     93
      6   6499  3657  99999  99999    270    118
      5   6130  4121    -39   -199  99999  99999
      6   6017  4267  99999  99999    255     93
      5   5840  4500    -73   -193  99999  99999
      5   5630  4784    -87   -297  99999  99999
      6   5563  4876  99999  99999    240     87
      6   5348  5181  99999  99999    230    103
      4   5000  5700   -161   -241    235    139
      6   4940  5791  99999  99999    235    139
      5   4890  5867   -173   -243  99999  99999
      6   4740  6096  99999  99999    235    129
      4   4000  7340   -287   -367    245    108
  • 20. Ex: ESML for WMO Sounding Data
    <a:ESML xmlns:a="ESML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ESML R:SchemaESML.xsd">
      <SyntacticMetaData>
        <Ascii>
          <AsciiStructure name="DataTable" geoInfo="NoGeoInfo" instances="1">
            <Array occurs="*">
              <Field format="%d" name="number"> <Attribute/> </Field>
              <Field format="%d" name="PRESSURE"> <Data unit="mb" equation="X/10" FillValue="99999"/> </Field>
              <Field format="%d" name="HEIGHT"> <Data unit="m"/> </Field>
              <Field format="%d" name="TEMPERATURE"> <Data unit="K" equation="X/10"/> </Field>
              <Field format="%d" name="DEWPOINT"> <Data unit="K" equation="X/10"/> </Field>
              <Field format="%d" name="WIND DIRECTION"> <Data unit="DDD"/> </Field>
              <Field format="%d" name="WIND SPEED"> <Data unit="m/s"/> </Field>
            </Array>
          </AsciiStructure>
        </Ascii>
      </SyntacticMetaData>
    </a:ESML>
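As a rough illustration of how a descriptor like the one on this slide could drive a generic reader, the sketch below parses a field list with Python's standard `xml.etree.ElementTree` and applies it to rows of the sounding data. It is not the ESML library: `load_fields`, `parse_divisor`, and `decode_row` are hypothetical helpers, the descriptor is trimmed to the `<Array>` element, `FillValue="99999"` is repeated on every `<Data>` element as an assumption (the slide shows it on PRESSURE only), and only equations of the form `X/<integer>` are handled.

```python
import re
import xml.etree.ElementTree as ET

# Field list modeled on the slide's <Array> element. Assumption: FillValue="99999"
# is repeated on every <Data> element (the slide shows it on PRESSURE only).
DESCRIPTOR = """
<Array occurs="*">
  <Field format="%d" name="number"><Attribute/></Field>
  <Field format="%d" name="PRESSURE"><Data unit="mb" equation="X/10" FillValue="99999"/></Field>
  <Field format="%d" name="HEIGHT"><Data unit="m" FillValue="99999"/></Field>
  <Field format="%d" name="TEMPERATURE"><Data unit="K" equation="X/10" FillValue="99999"/></Field>
  <Field format="%d" name="DEWPOINT"><Data unit="K" equation="X/10" FillValue="99999"/></Field>
  <Field format="%d" name="WIND DIRECTION"><Data unit="DDD" FillValue="99999"/></Field>
  <Field format="%d" name="WIND SPEED"><Data unit="m/s" FillValue="99999"/></Field>
</Array>
"""


def parse_divisor(equation):
    """Return N for an equation of the form 'X/N'; 1 when no equation is given."""
    if equation is None:
        return 1
    match = re.fullmatch(r"X/(\d+)", equation)
    if match is None:
        raise ValueError(f"unsupported equation in this sketch: {equation!r}")
    return int(match.group(1))


def load_fields(xml_text):
    """Turn each <Field> element into a small dict describing one column."""
    fields = []
    for field in ET.fromstring(xml_text.strip()).iter("Field"):
        data = field.find("Data")
        attrs = data.attrib if data is not None else {}
        fields.append({
            "name": field.get("name"),
            "unit": attrs.get("unit"),
            "divisor": parse_divisor(attrs.get("equation")),
            "fill": attrs.get("FillValue"),
        })
    return fields


def decode_row(line, fields):
    """Decode one whitespace-separated row; fill values become None."""
    tokens = line.split()
    if len(tokens) != len(fields):
        raise ValueError("row does not match the descriptor's field count")
    record = {}
    for token, spec in zip(tokens, fields):
        if spec["fill"] is not None and token == spec["fill"]:
            record[spec["name"]] = None
        else:
            record[spec["name"]] = int(token) / spec["divisor"]
    return record


fields = load_fields(DESCRIPTOR)
# Two rows taken from the WMO sounding data slide.
for row in ("4 10000 94 99999 99999 99999 99999", "4 8500 1468 130 -40 330 36"):
    print(decode_row(row, fields))
```

The reader contains no knowledge of the sounding layout itself; adding or reordering a column would only require editing the descriptor, which is the "define once, use anywhere" idea in miniature.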
  • 21. ESML Schema/Library • ESML Schema Version 0.5 • Supports ASCII, Binary and HDF-EOS • W3C compliant schema • Extensible, allowing new formats or modifications • ESML Library Version 0.5 • C++ version • Windows 95/98/NT/2000 version • Linux port in progress • Changed from Oracle XML parser to Apache XML parser
  • 22. Future Plans • Addition of new data formats to both Schema and Library • GRIB • McIDAS • BUFR, HDF4/5 and others • Versions of the Library • C/C++ version – UNIX • C/C++ version – Mac • Java version
  • 23. ESML Data Browser • Hybrid product (Java Client/C++ Library backend/JNI Interface) • Prototype version is the extension of the ESML Demo Tool • Features • Browse and view data values using an ESML file • Browse the metadata for each data field • Future Features: • Allow format conversion with automatic generation of ESML metadata file • Allow selection of multiple fields • Additional functionality such as Subsetting • Browse data images
  • 24. ESML Editor • 100% Java version prototype • Unable to find a COTS editor • Utilizes Expert System principles to give users correct options • Hides XML tags from the users • Future Features: • Also allow text editing of the XML tags • Incorporate feedback from users
  • 25. ESML Web Page • URL: esml.itsc.uah.edu • Post latest products, news, presentations, papers • Schema and related documents available to all • Beta version of the Library available on limited basis • Beta versions of ESML Editor and ESML Data Browser will be made available soon
  • 26. Other UAH/ITSC work • AMSR-E • Passive Microwave (PM)-ESIP • ADaM – Algorithm Development and Mining • EVE – An EnVironmEnt for On-board Processing
  • 27. Contact UAH/ITSC • HEW (HDF-EOS Web-based Subsetter): http://subset.itsc.uah.edu/hew2k • ESML (Earth Science Markup Language): http://esml.itsc.uah.edu • General Purpose Subsetting (ADaM): http://datamining.itsc.uah.edu • On-Demand Subsetting (Passive Microwave – ESIP): http://pm-esip.msfc.nasa.gov • SSM/I Coarse-grain Subsetting: http://ghrc.msfc.nasa.gov/ssmi/ssmi_subset.html