SlideShare a Scribd company logo
1 of 25
NASA Terra Data Fusion
Kent Yang
The HDF Group
HDF Town Hall
July 20, 2018
July 20th, 2018 HDF TownHall 1
July 20th, 2018 HDF TownHall 2
The Terra Data Fusion Project Team
Department of Atmospheric Sciences, University of Illinois
Larry Di Girolamo Guangyu Zhao Yizhe Zhan Landon Clipp Shashank Bansal
Yat Long Lo Dongwei Fu Brandon Chen
Department of Geography and GIS, University of Illinois
Shaowen Wang Yan Liu Yizhao Gao
The HDF Group
MuQun (Kent) Yang H Joe Lee
National Center for Supercomputing Applications, University of Illinois
John Towns Kandace Turner Michelle Butler Sean Stevens David Ralia
Jonathan Kim Donna Cox Stuart Levy Robert Patterson Andrew Christiensen
Department of Atmospheric Sciences, Texas A&M
Ping Yang Hioki Souichiro Yi Wang
NASA Langely/SSAI
Lusheng Liang
NASA Goddard Space Flight Center
Ralph Kahn Jim Limbacher
EOS Terra
• Flagship mission
• Launched in 1999
• Projection ends in 2022
• Longest single satellite
climate record
• One of the most popular
Earth Science satellite
data
• Five instruments
 ASTER,CERES,
MISR,MODIS,MOPITT
July 20th, 2018 3HDF TownHall
Credits: NASA
Terra, in 2015 alone…
More than 360 million files…
Totaling more than 3.4 PB data…
Delivered to more than 100,000 users around the world.
More than 1,800 peer-reviewed publications (over 15,000 to date)
Results from Terra cited more than 49,000 times (over 250K to date)
“The high publication rate includes an increasing number of papers
capitalizing on fusion of data among Terra sensors”
–
NASA Senior Review 2017: Terra
Terra Data Fusion Project
• Fuse existing Level 1B Terra radiance
products from all Terra 5 instruments into
one product
July 20th, 2018 5HDF TownHall
Why Terra Data Fusion?
• Scientific value added by data fusion of its
five instruments.
• A key recommendation from the 2007
NRC Decadal Survey on Earth Science
and Application from Space:
“…experts should... focus on providing
comprehensive data sets that combine
measuremements from multiple sensors.”
July 20th, 2018 6HDF TownHall
Challenges for Terra Data Fusion
• Huge data volumes
 1 PB input data from year 2000 to 2015
Need adequate cyberinfrastructure to tackle
• Input data residing at different locations
Need to transfer huge data volumes
July 20th, 2018 7HDF TownHall
Solutions
• NCSA supercomputer clusters
 Blue Waters and other clusters were used for
Terra Data fusion
 NCSA nearline tape archive system is used to
store the input and fusion data
• NCSA experts helped transfer the huge input
data to NCSA supercomputer facilities
July 20th, 2018 8HDF TownHall
NCSA Blue Waters
More Challenges
July 20th, 2018 9HDF TownHall
• Input data
 Different granularities
 Different methods to store radiance and geo-
location data
 Different file formats
• Complicate fusion file organization
• Metadata conventions need to catch up
Overcoming these challenges is what
The HDF Group contributed the most!
Different Instrument Granularities
• Map granules from different instruments to a
common granule that contains data for a single
Terra orbit.
 Contain multiple MODIS and ASTER input
granules
 Subset CERES and MOPITT input granules
July 20th, 2018 10HDF TownHall
Different Methods to Store Data
• Unpacking MODIS, ASTER and MISR radiation
data to physical units
 Need to unpack the data by following the specific
packing schemes of individual instruments
• Interpolating MODIS, ASTER and MISR
geolocation data to native radiance resolution
 Need to handle each instrument differently
July 20th, 2018 11HDF TownHall
Different File Formats of Input Granules
• All converted to HDF5 file format
 From HDF4, HDF-EOS2 and HDF-EOS5
• Also netCDF-4 compatible
 Following netCDF-4 enhanced data model
July 20th, 2018 12HDF TownHall
Complicate Fusion file organization
• Use HDF5 group structure to organize different
instruments and different input granules
 Each instrument represented by one group
 Each input granule stored as the subgroup of the
instrument group
July 20th, 2018 13HDF TownHall
Metadata Conventions Catch-up
• Make the fusion HDF5 file follow CF conventions
by adding key CF attributes
 Units
 Coordinates
 _FillValue
 Valid_min
 Valid_max
July 20th, 2018 14HDF TownHall
More usage of HDF5 features
• HDF5 chunking and compression are used to
reduce the total fusion file size.
July 20th, 2018 15HDF TownHall
Fusion File Statistics
• About 1 million input files.
• 84,303 files – from Feb 25 2000 to Dec 31 2015
• The total file size is 2.3 petabytes.
• Typical file sizes 15GB – 40GB. The largest file
size is 68.7GB. Average file size is 26GB.
• HDF5 in-memory compression reduces the total
file size by 60%.
July 20th, 2018 16HDF TownHall
Fusion File Statistics
July 20th, 2018 17HDF TownHall
Fusion HDF5 File Layout in HDFView
July 20th, 2018 18HDF TownHall
Fusion HDF5 File Layout in CDL
July 20th, 2018 19HDF TownHall
Fusion file visualized in Panoply
July 20th, 2018 20HDF TownHall
Other Work
• Validate the generated data to ensure the high
quality fusion product
• Implemented the advanced fusion resampling
and reprojection tool
 Resample / reproject the radiance fields for one
Terra instrument onto the grids used by another
Terra instrument
• Generated the NASA CMR-compliant fusion
Collection and granule metadata in ECHO 10
XML format
July 20th, 2018 21HDF TownHall
• Fusion data visualization demo
July 20th, 2018 22HDF TownHall
Thank You!
July 20th, 2018 HDF TownHall 23
Acknowledgements
This work was supported by NASA ACCESS Grant
#NNX16AM07A.
Any opinions, findings, conclusions, or
recommendations expressed in this material are
those of the author[s] and do not necessarily
reflect the views of NASA.
July 20th, 2018 24HDF TownHall
Questions/comments?
July 20th, 2018 HDF TownHall 25

More Related Content

What's hot

Improving long-term preservation of EOS data by independently mapping HDF4 da...
Improving long-term preservation of EOS data by independently mapping HDF4 da...Improving long-term preservation of EOS data by independently mapping HDF4 da...
Improving long-term preservation of EOS data by independently mapping HDF4 da...The HDF-EOS Tools and Information Center
 

What's hot (20)

MATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and CapabilitiesMATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and Capabilities
 
Pilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOTPilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOT
 
Incorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product DesignerIncorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product Designer
 
Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4
 
Hierarchical Data Formats (HDF) Update
Hierarchical Data Formats (HDF) UpdateHierarchical Data Formats (HDF) Update
Hierarchical Data Formats (HDF) Update
 
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
 
Scientific Computing and Visualization using HDF
Scientific Computing and Visualization using HDFScientific Computing and Visualization using HDF
Scientific Computing and Visualization using HDF
 
HDF Group Support for NPP/NPOESS/JPSS
HDF Group Support for NPP/NPOESS/JPSSHDF Group Support for NPP/NPOESS/JPSS
HDF Group Support for NPP/NPOESS/JPSS
 
HDF Product Designer: Using Templates to Achieve Interoperability
HDF Product Designer: Using Templates to Achieve InteroperabilityHDF Product Designer: Using Templates to Achieve Interoperability
HDF Product Designer: Using Templates to Achieve Interoperability
 
HDF Town Hall
HDF Town HallHDF Town Hall
HDF Town Hall
 
Data Analytics using MATLAB and HDF5
Data Analytics using MATLAB and HDF5Data Analytics using MATLAB and HDF5
Data Analytics using MATLAB and HDF5
 
Bridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data ProductsBridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data Products
 
ICESat-2 Metadata and Status
ICESat-2 Metadata and StatusICESat-2 Metadata and Status
ICESat-2 Metadata and Status
 
Improving long-term preservation of EOS data by independently mapping HDF4 da...
Improving long-term preservation of EOS data by independently mapping HDF4 da...Improving long-term preservation of EOS data by independently mapping HDF4 da...
Improving long-term preservation of EOS data by independently mapping HDF4 da...
 
Utilizing HDF4 File Content Maps for the Cloud Computing
Utilizing HDF4 File Content Maps for the Cloud ComputingUtilizing HDF4 File Content Maps for the Cloud Computing
Utilizing HDF4 File Content Maps for the Cloud Computing
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
Matlab, Big Data, and HDF Server
Matlab, Big Data, and HDF ServerMatlab, Big Data, and HDF Server
Matlab, Big Data, and HDF Server
 
HDF and netCDF Data Support in ArcGIS
HDF and netCDF Data Support in ArcGISHDF and netCDF Data Support in ArcGIS
HDF and netCDF Data Support in ArcGIS
 
NEON HDF5
NEON HDF5NEON HDF5
NEON HDF5
 
Product Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the WebProduct Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the Web
 

Similar to NASA Terra Data Fusion

Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudGlobus
 
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...datacite
 
Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standa...
Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standa...Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standa...
Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standa...Kerstin Lehnert
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionIan Foster
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...Martin Klein
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...openminted_eu
 
NordForsk Open Access Reykjavik 14-15/8-2014:Rda
NordForsk Open Access Reykjavik 14-15/8-2014:RdaNordForsk Open Access Reykjavik 14-15/8-2014:Rda
NordForsk Open Access Reykjavik 14-15/8-2014:RdaNordForsk
 
Geospatial metadata and spatial data workshop: 19 June 2014
Geospatial metadata and spatial data workshop: 19 June 2014Geospatial metadata and spatial data workshop: 19 June 2014
Geospatial metadata and spatial data workshop: 19 June 2014EDINA, University of Edinburgh
 
Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!EDINA, University of Edinburgh
 
Leeds University Geospatial Metadata Workshop 20110617
Leeds University Geospatial Metadata Workshop 20110617Leeds University Geospatial Metadata Workshop 20110617
Leeds University Geospatial Metadata Workshop 20110617EDINA, University of Edinburgh
 
Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020GEO Analytics Canada
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
GLOSIS: Progress and plans
GLOSIS: Progress and plansGLOSIS: Progress and plans
GLOSIS: Progress and plansExternalEvents
 
Big Data and its Role in Biomedical Research
Big Data and its Role in Biomedical ResearchBig Data and its Role in Biomedical Research
Big Data and its Role in Biomedical ResearchPhilip Bourne
 
Cambridge University Geospatial Metadata Workshop 20110524
Cambridge University Geospatial Metadata Workshop 20110524Cambridge University Geospatial Metadata Workshop 20110524
Cambridge University Geospatial Metadata Workshop 20110524EDINA, University of Edinburgh
 

Similar to NASA Terra Data Fusion (20)

HDF5 and The HDF Group
HDF5 and The HDF GroupHDF5 and The HDF Group
HDF5 and The HDF Group
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
 
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones - Oak Rid...
 
Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standa...
Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standa...Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standa...
Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standa...
 
Geoscience Data Analysis and Visualization Tools from NCAR
Geoscience Data Analysis and Visualization Tools from NCARGeoscience Data Analysis and Visualization Tools from NCAR
Geoscience Data Analysis and Visualization Tools from NCAR
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
 
NordForsk Open Access Reykjavik 14-15/8-2014:Rda
NordForsk Open Access Reykjavik 14-15/8-2014:RdaNordForsk Open Access Reykjavik 14-15/8-2014:Rda
NordForsk Open Access Reykjavik 14-15/8-2014:Rda
 
Geospatial metadata and spatial data workshop: 19 June 2014
Geospatial metadata and spatial data workshop: 19 June 2014Geospatial metadata and spatial data workshop: 19 June 2014
Geospatial metadata and spatial data workshop: 19 June 2014
 
Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!Geospatial Metadata and Spatial Data: It's all Greek to me!
Geospatial Metadata and Spatial Data: It's all Greek to me!
 
Leeds University Geospatial Metadata Workshop 20110617
Leeds University Geospatial Metadata Workshop 20110617Leeds University Geospatial Metadata Workshop 20110617
Leeds University Geospatial Metadata Workshop 20110617
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 
Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
GLOSIS: Progress and plans
GLOSIS: Progress and plansGLOSIS: Progress and plans
GLOSIS: Progress and plans
 
Big Data and its Role in Biomedical Research
Big Data and its Role in Biomedical ResearchBig Data and its Role in Biomedical Research
Big Data and its Role in Biomedical Research
 
Cambridge University Geospatial Metadata Workshop 20110524
Cambridge University Geospatial Metadata Workshop 20110524Cambridge University Geospatial Metadata Workshop 20110524
Cambridge University Geospatial Metadata Workshop 20110524
 
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
 

More from The HDF-EOS Tools and Information Center

STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...The HDF-EOS Tools and Information Center
 

More from The HDF-EOS Tools and Information Center (20)

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDF - Current status and Future Directions
HDF - Current status and Future Directions HDF - Current status and Future Directions
HDF - Current status and Future Directions
 
H5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only LibraryH5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only Library
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 
Leveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software TestingLeveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software Testing
 

Recently uploaded

Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 

Recently uploaded (20)

Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 

NASA Terra Data Fusion

  • 1. NASA Terra Data Fusion Kent Yang The HDF Group HDF Town Hall July 20, 2018 July 20th, 2018 HDF TownHall 1
  • 2. July 20th, 2018 HDF TownHall 2 The Terra Data Fusion Project Team Department of Atmospheric Sciences, University of Illinois Larry Di Girolamo Guangyu Zhao Yizhe Zhan Landon Clipp Shashank Bansal Yat Long Lo Dongwei Fu Brandon Chen Department of Geography and GIS, University of Illinois Shaowen Wang Yan Liu Yizhao Gao The HDF Group MuQun (Kent) Yang H Joe Lee National Center for Supercomputing Applications, University of Illinois John Towns Kandace Turner Michelle Butler Sean Stevens David Ralia Jonathan Kim Donna Cox Stuart Levy Robert Patterson Andrew Christiensen Department of Atmospheric Sciences, Texas A&M Ping Yang Hioki Souichiro Yi Wang NASA Langely/SSAI Lusheng Liang NASA Goddard Space Flight Center Ralph Kahn Jim Limbacher
  • 3. EOS Terra • Flagship mission • Launched in 1999 • Projection ends in 2022 • Longest single satellite climate record • One of the most popular Earth Science satellite data • Five instruments  ASTER,CERES, MISR,MODIS,MOPITT July 20th, 2018 3HDF TownHall Credits: NASA
  • 4. Terra, in 2015 alone… More than 360 million files… Totaling more than 3.4 PB data… Delivered to more than 100,000 users around the world. More than 1,800 peer-reviewed publications (over 15,000 to date) Results from Terra cited more than 49,000 times (over 250K to date) “The high publication rate includes an increasing number of papers capitalizing on fusion of data among Terra sensors” – NASA Senior Review 2017: Terra
  • 5. Terra Data Fusion Project • Fuse existing Level 1B Terra radiance products from all Terra 5 instruments into one product July 20th, 2018 5HDF TownHall
  • 6. Why Terra Data Fusion? • Scientific value added by data fusion of its five instruments. • A key recommendation from the 2007 NRC Decadal Survey on Earth Science and Application from Space: “…experts should... focus on providing comprehensive data sets that combine measuremements from multiple sensors.” July 20th, 2018 6HDF TownHall
  • 7. Challenges for Terra Data Fusion • Huge data volumes  1 PB input data from year 2000 to 2015 Need adequate cyberinfrastructure to tackle • Input data residing at different locations Need to transfer huge data volumes July 20th, 2018 7HDF TownHall
  • 8. Solutions • NCSA supercomputer clusters  Blue Waters and other clusters were used for Terra Data fusion  NCSA nearline tape archive system is used to store the input and fusion data • NCSA experts helped transfer the huge input data to NCSA supercomputer facilities July 20th, 2018 8HDF TownHall NCSA Blue Waters
  • 9. More Challenges July 20th, 2018 9HDF TownHall • Input data  Different granularities  Different methods to store radiance and geo- location data  Different file formats • Complicate fusion file organization • Metadata conventions need to catch up Overcoming these challenges is what The HDF Group contributed the most!
  • 10. Different Instrument Granularities • Map granules from different instruments to a common granule that contains data for a single Terra orbit.  Contain multiple MODIS and ASTER input granules  Subset CERES and MOPITT input granules July 20th, 2018 10HDF TownHall
  • 11. Different Methods to Store Data • Unpacking MODIS, ASTER and MISR radiation data to physical units  Need to unpack the data by following the specific packing schemes of individual instruments • Interpolating MODIS, ASTER and MISR geolocation data to native radiance resolution  Need to handle each instrument differently July 20th, 2018 11HDF TownHall
  • 12. Different File Formats of Input Granules • All converted to HDF5 file format  From HDF4, HDF-EOS2 and HDF-EOS5 • Also netCDF-4 compatible  Following netCDF-4 enhanced data model July 20th, 2018 12HDF TownHall
  • 13. Complicate Fusion file organization • Use HDF5 group structure to organize different instruments and different input granules  Each instrument represented by one group  Each input granule stored as the subgroup of the instrument group July 20th, 2018 13HDF TownHall
  • 14. Metadata Conventions Catch-up • Make the fusion HDF5 file follow CF conventions by adding key CF attributes  Units  Coordinates  _FillValue  Valid_min  Valid_max July 20th, 2018 14HDF TownHall
  • 15. More usage of HDF5 features • HDF5 chunking and compression are used to reduce the total fusion file size. July 20th, 2018 15HDF TownHall
  • 16. Fusion File Statistics • About 1 million input files. • 84,303 files – from Feb 25 2000 to Dec 31 2015 • The total file size is 2.3 petabytes. • Typical file sizes 15GB – 40GB. The largest file size is 68.7GB. Average file size is 26GB. • HDF5 in-memory compression reduces the total file size by 60%. July 20th, 2018 16HDF TownHall
  • 17. Fusion File Statistics July 20th, 2018 17HDF TownHall
  • 18. Fusion HDF5 File Layout in HDFView July 20th, 2018 18HDF TownHall
  • 19. Fusion HDF5 File Layout in CDL July 20th, 2018 19HDF TownHall
  • 20. Fusion file visualized in Panoply July 20th, 2018 20HDF TownHall
  • 21. Other Work • Validate the generated data to ensure the high quality fusion product • Implemented the advanced fusion resampling and reprojection tool  Resample / reproject the radiance fields for one Terra instrument onto the grids used by another Terra instrument • Generated the NASA CMR-compliant fusion Collection and granule metadata in ECHO 10 XML format July 20th, 2018 21HDF TownHall
  • 22. • Fusion data visualization demo July 20th, 2018 22HDF TownHall
  • 23. Thank You! July 20th, 2018 HDF TownHall 23
  • 24. Acknowledgements This work was supported by NASA ACCESS Grant #NNX16AM07A. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of NASA. July 20th, 2018 24HDF TownHall