Overview of the
Data Processing and Error
Analysis System (DPEAS)
Andrew S. Jones
Colorado State University (CSU)
Cooperative Institute for Research in the Atmosphere (CIRA)
DOD Center for Geosciences / Atmospheric Research (CG/AR)
Fort Collins, CO

DOD Center for Geosciences / Atmospheric Research

Colorado State University
What is it?

• Data processing system for “large” data analysis tasks using common PCs
• Features:
  - 2nd-generation system (replaces an earlier system called PORTAL (Jones et al., 1995))
  - Parallel implementation
  - Web-based documentation and monitoring
  - Incorporates a Fortran interpreter for input tasks
  - Virtualized I/O subsystem (only memory-resident data structures are needed; data algorithms now function like a model)
  - Able to fail over to redundant hardware
  - Extensible User Module
• Error Analysis code is still under development
• Implemented on the Windows NT/2000 OS

What Does it Do?

• Global merge capabilities for numerous data sets
• Current system in operational use for 2+ years at CIRA
  - Current average operational throughput using 15 processors on 8 PCs is 17 TB/yr (47 GB/day)
  - Measured maximum throughput is 2.5 PB/yr (7.1 TB/day)
• Simplifies
  - Powerful abstraction layers allow anyone to write parallel code
  - Virtual I/O subsystem reduces end-user code complexity
  - Users interact using a language most already know
• Easily scales
  - Limited process “cross-talk” improves scaling behavior
  - Tests have shown that a 2000-machine cluster is physically feasible
  - Basically… just add hardware
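
The per-day and per-year throughput figures quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes a 365-day year and decimal prefixes (1 TB = 1000 GB), neither of which the slide states, so the results agree with the slide only to rounding. The per-processor figure and the ratio between the two rates are derived here and do not appear in the deck.

```python
# Figures quoted on the slide.
AVG_GB_PER_DAY = 47.0   # average operational throughput
MAX_TB_PER_DAY = 7.1    # measured maximum throughput
PROCESSORS = 15         # processors in use for the average figure

# Assumptions made for this sketch (not stated on the slide).
DAYS_PER_YEAR = 365     # non-leap year
PREFIX_STEP = 1000      # decimal prefixes: 1 TB = 1000 GB, 1 PB = 1000 TB


def per_year(per_day: float) -> float:
    """Scale a daily volume to a yearly volume in the same unit."""
    return per_day * DAYS_PER_YEAR


avg_tb_per_year = per_year(AVG_GB_PER_DAY) / PREFIX_STEP   # GB/day -> TB/yr
max_pb_per_year = per_year(MAX_TB_PER_DAY) / PREFIX_STEP   # TB/day -> PB/yr

# Derived quantities (not on the slide): ratio of the two quoted rates,
# and the average daily volume handled per processor.
max_to_avg_ratio = (MAX_TB_PER_DAY * PREFIX_STEP) / AVG_GB_PER_DAY
avg_gb_per_day_per_processor = AVG_GB_PER_DAY / PROCESSORS

print(f"average: {avg_tb_per_year:.1f} TB/yr")
print(f"maximum: {max_pb_per_year:.2f} PB/yr")
print(f"max/avg ratio: {max_to_avg_ratio:.0f}x")
print(f"average per processor: {avg_gb_per_day_per_processor:.1f} GB/day")
```

Under these assumptions the yearly values come out close to the 17 TB/yr and 2.5 PB/yr quoted above; the small differences depend on the prefix convention and rounding used for the slide.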

10 Data Types Are Currently Supported
• Reads and writes HDF-EOS natively
• GOES IMAGER (McIDAS)
• NOAA AVHRR GAC and LAC (McIDAS)
• NOAA AMSU-A and B (HDF-EOS)
• DMSP SSM/I (Byte Stream)
• DMSP SSM/T-2 (NGDC OIS)
• DMSP OLS (NGDC OIS)
• TRMM TMI and VIRS (HDF)
• User extensible… (your format here)
The Hardware

[Diagram: storage view and processor view of the two clusters. Legend: Primary, Backup, Wn = Worker.]

• Storage view: nodes shown are Primary, Backup, W1, and W2, with a “Mirrored Set” label; capacities shown are 66 GB, 240 GB, 240 GB, and 240 GB
• Operational cluster (24/7): Primary, Backup, W1, W2, W3
  - 9 processors, 3.0 GFlops, 2.25 GB RAM
  - Cluster summary: all ingest processes; most higher-level remapped products
• Experimental cluster (nights only, 7 days a week): W4, W5, W6
  - 6 processors, 2.5 GFlops, 2.5 GB RAM
  - Cluster summary: large global sectors

Failover Mode

[Diagram: the storage and processor views of the operational cluster (24/7: Primary, Backup, W1 to W3) and the experimental cluster (nights only: W4 to W6), with an X marking the failed Primary in both views.]

Failover steps (automated):
1. Synchronize states
2. Promote the Backup

Restore steps (manually initiated):
1. Demote the Backup
2. Restore Mirror Set
3. Synchronize states
4. Reactivate Primary
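
The ordering of the failover and restore steps can be made concrete with a small state sketch. This is not DPEAS code: the `Controller` class, the `failover` and `restore` functions, and the `shared_state` dictionary (standing in for whatever state lives on the mirrored storage) are all hypothetical names used only to show the sequence listed on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Controller:
    """Hypothetical stand-in for the Primary or Backup machine."""
    name: str
    active: bool = False
    state: Dict[str, int] = field(default_factory=dict)


def failover(primary: Controller, backup: Controller,
             shared_state: Dict[str, int], log: List[str]) -> None:
    """Automated path: synchronize states, then promote the Backup."""
    backup.state = dict(shared_state)      # 1. synchronize states
    log.append("synchronize states")
    primary.active = False
    backup.active = True                   # 2. promote the Backup
    log.append("promote the Backup")


def restore(primary: Controller, backup: Controller, log: List[str]) -> None:
    """Manually initiated path, in the order given on the slide."""
    backup.active = False                  # 1. demote the Backup
    log.append("demote the Backup")
    log.append("restore mirror set")       # 2. placeholder: storage repair is not modeled
    primary.state = dict(backup.state)     # 3. synchronize states
    log.append("synchronize states")
    primary.active = True                  # 4. reactivate Primary
    log.append("reactivate Primary")


log: List[str] = []
primary = Controller("Primary", active=True, state={"jobs_done": 41})
backup = Controller("Backup")

failover(primary, backup, shared_state={"jobs_done": 42}, log=log)
backup.state["jobs_done"] += 1             # work continues on the promoted Backup
restore(primary, backup, log)
print(log)
```

The point of the sketch is the ordering: state is synchronized before a role changes hands in both directions, so the machine taking over always starts from the most recent known state.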
Module Context

[Layered block diagram. Labels, roughly top to bottom:]
• Clients: GUIs, Batch Job Client, Explorer, Command Line, Web Browser, Other Applications
• Command Line Script, Command Shell Interpreter, DPEAS Input Script
• DPEAS Data Processing Engine (Spawn Subtask, DPEAS Subtask), DPEAS Fortran Interpreter, Batch Job Service, DPEAS System State
• Analysis Modules, User Modules, Translation Modules, Output Modules
• DPEAS HDF-EOS Virtual I/O Subsystem
• Internet Information Services
• Operating System (Windows 2000)
• Callout: “This is DPEAS”

An example of a DPEAS input script file

How DPEAS Starts
Program Start
DPEAS Initialization
Interpreting DPEAS script declarations
Interpreting DPEAS script executable statements

How DPEAS Ends
Interpreting DPEAS script executable statements
DPEAS Summary
Program End

How Are Spawned Input Scripts and Jobs Created?

• All spawned DPEAS jobs run machine-generated DPEAS input scripts, which the data processing engine generates from the master DPEAS input script. (The examples shown previously were machine-generated DPEAS code.)
• This is automated within DPEAS, and the user code goes along for the free ride because it is part of the DPEAS executable. (It is like meeting a friendly virus that helps spread your code along with it.)

What Does DPEAS Parallelism Look Like?
• Do-loop contents are sent to other resources in parallel
• The new jobs run the same “DPEAS.exe”, but execute only the subtask operations
• Completed jobs allow additional jobs to start
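
The scheduling pattern described here (loop iterations handed out as independent jobs, with a new job starting whenever a running one completes) can be sketched with a bounded worker pool. This is an analogy only: DPEAS spawns copies of its own executable, whereas the sketch uses Python threads, and `run_subtask`, `run_do_loop`, and `max_jobs` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_subtask(iteration: int) -> int:
    """Hypothetical stand-in for one spawned job executing the loop body."""
    return iteration * iteration


def run_do_loop(first: int, last: int, max_jobs: int) -> List[int]:
    """Run the loop body for first..last with at most max_jobs jobs at once.

    The pool starts up to max_jobs iterations immediately; each time one
    completes, its slot is reused for the next pending iteration.
    """
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        # map() returns results in iteration order, whatever order jobs finish in.
        return list(pool.map(run_subtask, range(first, last + 1)))


results = run_do_loop(1, 6, max_jobs=3)
print(results)
```

Because each iteration is independent, the only shared decision is how many jobs may run at once, which is consistent with the limited “cross-talk” between processes mentioned earlier in the deck.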

The 3 Programming Steps to Add a User Routine to DPEAS
1. Insert a program “hook”
   The program hook makes the main DPEAS program aware of the existence of your wrapper routine.
2. Create a wrapper routine
   The wrapper routine tells the DPEAS Fortran interpreter how to parse and interact with your application subroutine arguments.
3. Create an application routine
   The application routine performs the “real” work. You can do anything you want within the application routine.
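
The division of labor between the hook, the wrapper, and the application routine can be illustrated with a toy interpreter. In DPEAS these pieces are written in Fortran inside the user module; the Python below is only an analogy, and the names `HOOKS`, `scale_values`, `scale_values_wrapper`, and `interpret` are invented for this sketch.

```python
from typing import Callable, Dict, List

# Table the toy interpreter consults to find user routines by name.
HOOKS: Dict[str, Callable[[List[str]], float]] = {}


def scale_values(values: List[float], factor: float) -> float:
    """Step 3, application routine: does the "real" work."""
    return sum(v * factor for v in values)


def scale_values_wrapper(args: List[str]) -> float:
    """Step 2, wrapper routine: turns script arguments into typed values."""
    factor = float(args[0])
    values = [float(a) for a in args[1:]]
    return scale_values(values, factor)


# Step 1, program hook: makes the interpreter aware of the wrapper.
HOOKS["scale_values"] = scale_values_wrapper


def interpret(line: str) -> float:
    """Toy interpreter: look up the routine name, then pass it the arguments."""
    name, *args = line.split()
    return HOOKS[name](args)


result = interpret("scale_values 2 1.5 2.5")
print(result)
```

Only the wrapper knows about the script's argument format, so the application routine stays ordinary code that can be tested on its own.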

How does the “User_Module.f90” relate to my DPEAS Input Scripts?

[Flow diagram. Labels, in reading order:]
• Compile: User_Module.f90 (Program Hook, Wrapper Routine, Application Routine), Ordinary Fortran Compiler, "DPEAS.exe"
• Interpret: DPEAS Input Script; "DPEAS.exe" interprets the DPEAS input script
• Automated parallelization using self-replication: DPEAS Input Script Subtask; "DPEAS.exe" interprets the DPEAS input script; Return to Master
• End
User Example:
The user’s application routine
Using the virtual I/O data via pointers
1. Find each MW channel
2. Allocate a new output array data structure
Your science code looks like this

User Example:
The results: Complete integration

The new user routine is now fully integrated into DPEAS

User Example:
The output HDF-EOS file

User Example:
The output image representation

150 GHz effective emissivity, calculated from GOES-08 IMAGER and NOAA-15 AMSU-B

User Example:
Summary
• Creates 2 new routines:
  - Wrapper routine
  - Application routine
• Requires 25 lines of executable code:
  - 2 – Program hook
  - 4 – Wrapper routine
  - 19 – Application routine
    - 2 – Variable assignments
    - 3 – Science algorithm
    - 14 – Virtual I/O library calls (using only 2 Virtual I/O library routines)
• Small overhead for gaining massive parallelism capabilities!

User Example:
How complex would the user routine be, if written without the Virtual I/O library?
• Creates 2 new routines:
  - Wrapper routine
  - Application routine
• Requires 59 lines of executable code:
  - 2 – Program hook
  - 4 – Wrapper routine
  - 53 – Application routine
    - 2 – Variable assignments
    - 3 – Science algorithm
    - 48 – HDF-EOS library calls (using 26 HDF-EOS library routines)


Answer: Without the DPEAS Virtual I/O library there would be:
• 24 additional I/O routines called by the user (+1200%)
• 34 additional lines of user code (+136%; the 59 lines are 236% of the original 25)

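
The comparison between the two versions of the user routine can be checked with simple arithmetic on the counts given on the two preceding slides (2 versus 26 I/O routines, 25 versus 59 lines of executable code). The sketch distinguishes the relative increase from the ratio of the totals: for the line counts the increase is 34/25 = 136%, while 59 lines is 236% of 25. The function name `relative_increase` is introduced here for illustration.

```python
def relative_increase(before: int, after: int) -> float:
    """Percentage increase going from `before` to `after`."""
    return 100.0 * (after - before) / before


# Counts taken from the two "User Example" slides.
routines_with_vio, routines_without_vio = 2, 26
lines_with_vio, lines_without_vio = 25, 59

extra_routines = routines_without_vio - routines_with_vio
extra_lines = lines_without_vio - lines_with_vio

routine_increase = relative_increase(routines_with_vio, routines_without_vio)
line_increase = relative_increase(lines_with_vio, lines_without_vio)
line_ratio = 100.0 * lines_without_vio / lines_with_vio  # total as a percentage of the original

print(f"{extra_routines} additional routines (+{routine_increase:.0f}%)")
print(f"{extra_lines} additional lines (+{line_increase:.0f}%, {line_ratio:.0f}% of the original)")
```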
User Example:
Conclusions
• Implementation insights
  - Minimal amount of end-user code is required
  - The effort and resources involved are small (the DPEAS program recompiled in < 30 s on the user’s desktop)
• Virtual I/O insights
  - The DPEAS virtual I/O access method is less complex than traditional HDF-EOS file access methods
• End user’s perspective
  - End users are protected from technical data format issues
  - End users can develop higher-quality code by leveraging shared, robust common modules
  - Scalability is greatly enhanced with little end-user effort

Summary

• DPEAS can process large data sets efficiently while maintaining centralized management controls and error-handling behaviors
• Parallelism of the code is automatic and runs on “cheap hardware”
• Failover capabilities make the system more robust
• User code is shielded from the complexities of the system by software abstraction layers
• Little training is needed since user interfaces are in a known scientific language
• User modules access data directly from memory, which makes traditional file access methods obsolete while maintaining the needed file compatibility

What did I learn about HDF-EOS in the process?

• HDF-EOS is an excellent “universal” data format
  - It works for all satellite sensor types I have encountered to date (10+)
• HDF-EOS requires serious software design before the implementation stage
• In my experience, “Time” information as a geo/time field for sectorizing is overrated and is likely to cause future software design headaches with the more complex sensors if encouraged to be the “norm”

My 2 cents: How HDF-EOS could be made even better
(Hopefully someone has already thought of these things, and this short list will be a reaffirmation)
• Given that GOES data, for example, and other multi-detector sensors can have multiple times for each channel for the same geolocation position, and that in addition, they can and do interrupt their sensor scans at any time…
  - Treat “Time” as a data attribute
• Currently I associate “Time” and other associated arrays with their principal data array by nomenclature
• It would be better to use data array attribute “groups”. Then “Time”, “Calibration”, and other associated arrays could be grouped with the data array through the data format.

Why Data Attributes?

• Many data channels have “associated” information
  - For example, it might be very meaningful to associate the min. and max. of a grid location with its mean value
• It would be better if there were a standard way of showing that group association, so we don’t have to understand each other’s unique nomenclatures or “intent”, or resort to the use of unusual “mixed” HDF/HDF-EOS data files
• Data attributes should not be arbitrarily limited in scope, but have full data type ranges
• Units could also be incorporated through data attributes
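
The difference between associating arrays by naming convention and grouping them through the data structure itself can be sketched in a few lines. This is an illustration of the idea only, not the HDF-EOS API or a proposed format: the `DataArray` class, the `find_by_nomenclature` helper, the array names, and the sample values are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataArray:
    """Hypothetical array that carries units and a group of associated arrays."""
    name: str
    values: List[float]
    units: str = ""
    attributes: Dict[str, "DataArray"] = field(default_factory=dict)


def find_by_nomenclature(arrays: Dict[str, DataArray], base: str, kind: str) -> DataArray:
    """Name-based association: works only if everyone follows the same naming rule."""
    return arrays[f"{base}_{kind}"]


# Made-up sample data.
tb = DataArray("Tb_150GHz", [251.3, 249.8], units="K")
time = DataArray("Tb_150GHz_Time", [0.0, 2.7], units="s")
cal = DataArray("Tb_150GHz_Calibration", [1.002, 1.002])

# Current practice described on the slides: a flat set of arrays linked by name.
flat = {a.name: a for a in (tb, time, cal)}
time_via_name = find_by_nomenclature(flat, "Tb_150GHz", "Time")

# Suggested alternative: the association is part of the data structure.
tb.attributes["Time"] = time
tb.attributes["Calibration"] = cal
time_via_group = tb.attributes["Time"]

print(time_via_name is time_via_group, time_via_group.units)
```

With the grouped form a reader needs no knowledge of the producer's naming rule, and because each attribute is itself a full array it can hold any data type and carry its own units.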

The End
jones@cira.colostate.edu

Appendix
The following series of slides shows how a user can easily modify DPEAS
1. The user’s program hook
2. … wrapper routine
3. … application routine (using the virtual I/O data via pointers)
4. Usage of the new user routine in a DPEAS input script file
5. The Results: Complete Integration

User Example:
The user’s program hook

2 lines of code

User Example:
The user’s wrapper routine

4 lines of executable code

User Example:
The user’s application routine
Using the virtual I/O data via pointers
1. Find each MW channel
2. Allocate a new output array data structure
Your science code looks like this

User Example:
Usage of the new user routine in a
DPEAS input script file

User Example:
The results: Complete integration

The new user routine is now fully integrated into DPEAS

Where Do I Find DPEAS?
DPEAS Home Page:
http://luna.cira.colostate.edu/DPEAS/DPEAS_frame.htm
Please direct questions to jones@cira.colostate.edu

Artificial Intelligence: Facts and Myths
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Overview of the Data Processing and Error Analysis System (DPEAS)

  • 1. Overview of the Data Processing and Error Analysis System (DPEAS) Andrew S. Jones Colorado State University (CSU) Cooperative Institute for Research in the Atmosphere (CIRA) DOD Center for Geosciences / Atmospheric Research (CG/AR) Fort Collins, CO DOD Center for Geosciences / Atmospheric Research Colorado State University
  • 2. What is it?
      • Data processing system for “large” data analysis tasks using common PCs
      • Features:
          • 2nd generation system (replaces an earlier system called PORTAL (Jones et al., 1995))
          • Parallel implementation
          • Web-based documentation and monitoring
          • Incorporates a Fortran-interpreter for input tasks
          • Virtualized I/O subsystem (only memory-resident data structures are needed, data algorithms now function like a model)
          • Able to failover to redundant hardware
          • Extensible User Module
      • Error Analysis code is still under development
      • Implemented on Windows NT/2000 OS
  • 3. What Does it Do?
      • Global merge capabilities for numerous data sets
      • Current system in operational use for 2+ years at CIRA
          • Current average operational throughput rate using 15 processors on 8 PCs is 17 TB/yr (47 GB/day)
          • Measured max. throughput rate is 2.5 PB/yr (7.1 TB/day)
      • Simplifies
          • Powerful abstraction layers allow anyone to write parallel code
          • Virtual I/O subsystem reduces end-user code complexities
          • Users interact using a language most already know
      • Easily Scales
          • Limited process “cross-talk” improves scaling behavior
          • Tests have shown that a 2000-machine cluster is physically feasible
          • Basically… just add hardware
  • 4. 10 Data Types Are Currently Supported
      • Reads and Writes HDF-EOS natively
      • GOES IMAGER (McIDAS)
      • NOAA AVHRR GAC and LAC (McIDAS)
      • NOAA AMSU-A and B (HDF-EOS)
      • DMSP SSM/I (Byte Stream)
      • DMSP SSM/T-2 (NGDC OIS)
      • DMSP OLS (NGDC OIS)
      • TRMM TMI and VIRS (HDF)
      • User extensible… (your format here)
  • 5. The Hardware (diagram; labels only)
      • Legend: Primary, Backup, Wn = Worker, Mirrored Set
      • Storage view: Primary, Backup, W1, W2; capacities shown are 66 GB, 240 GB, 240 GB, and 240 GB
      • Processor view, operational cluster (24/7): Primary, Backup, W1, W2, W3; 9 processors, 3.0 GFlops, 2.25 GB RAM; cluster summary: all ingest processes, most higher-level remapped products
      • Processor view, experimental cluster (nights only/7): W4, W5, W6; 6 processors, 2.5 GFlops, 2.5 GB RAM; cluster summary: large global sectors
  • 6. Failover Mode (diagram; same storage and processor views as the hardware slide, with an X marking the failed Primary)
      • Failover steps (automated):
          1. Synchronize states
          2. Promote the Backup
      • Restore steps (manually initiated):
          1. Demote the Backup
          2. Restore Mirror Set
          3. Synchronize states
          4. Reactivate Primary
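The failover and restore sequences on this slide can be read as a small state machine. The following is only an illustrative Python sketch of that ordering, not DPEAS code: the `Node` and `Cluster` classes, the `last_known_state` argument, and the `log` list are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    role: str                      # "primary", "backup", or "offline"
    state: dict = field(default_factory=dict)


@dataclass
class Cluster:
    primary: Node
    backup: Node
    mirror_ok: bool = True
    log: list = field(default_factory=list)

    def fail_over(self, last_known_state):
        """Automated path: runs when the Primary is detected as down."""
        self.primary.role = "offline"
        self.mirror_ok = False
        # 1. Synchronize states: the Backup adopts the last known system state.
        self.backup.state = dict(last_known_state)
        self.log.append("synchronize states")
        # 2. Promote the Backup so that it acts as the Primary.
        self.backup.role = "primary"
        self.log.append("promote backup")

    def restore(self):
        """Manual path: an operator calls this after the hardware is repaired."""
        # 1. Demote the Backup.
        self.backup.role = "backup"
        self.log.append("demote backup")
        # 2. Restore the mirror set.
        self.mirror_ok = True
        self.log.append("restore mirror set")
        # 3. Synchronize states back onto the repaired Primary.
        self.primary.state = dict(self.backup.state)
        self.log.append("synchronize states")
        # 4. Reactivate the Primary.
        self.primary.role = "primary"
        self.log.append("reactivate primary")
```

Keeping the restore path as a separate, explicitly called method mirrors the slide's distinction between an automated failover and a manually initiated restore.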
  • 7. Module Context (diagram; labels only)
      • GUIs, Batch Job Client, Explorer, Command Line, Web Browser, Command Line Script, Command Shell Interpreter, DPEAS Input Script, Other Applications
      • DPEAS Data Processing Engine, Spawn Subtask, DPEAS Subtask, DPEAS Fortran Interpreter, Batch Job Service, Analysis Modules, DPEAS System State, User Modules, DPEAS HDF-EOS Virtual I/O Subsystem, Translation Modules, Output Modules (callout: “This is DPEAS”)
      • Internet Information Services, Operating System (Windows 2000)
  • 8. An example of a DPEAS input script file
  • 9. How DPEAS Starts
      • Program Start
      • DPEAS Initialization
      • Interpreting DPEAS script declarations
      • Interpreting DPEAS script executable statements
  • 10. How DPEAS Ends
      • Interpreting DPEAS script executable statements
      • DPEAS Summary
      • Program End
  • 11. How Are Spawned Input Scripts and Jobs Created?
      • All spawned DPEAS jobs run machine-generated DPEAS input scripts, which the data processing engine generates from the Master DPEAS input script (the examples shown previously were DPEAS machine-generated code)
      • This is automated within DPEAS, and the user code goes along for a free ride since it is part of the DPEAS executable (it’s like meeting a friendly virus which helps to spread your code along with it)
  • 12. What Does DPEAS Parallelism Look Like?
      • Do loop contents are sent to other resources in parallel
      • The new jobs run the same “DPEAS.exe”, but execute only the subtask operations
      • Completed jobs allow additional jobs to start
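The pattern on this slide, where loop iterations are farmed out and a finished job frees a slot for the next one, resembles a bounded worker pool. The sketch below is a loose Python analogy and rests on assumptions: threads stand in for the spawned “DPEAS.exe” subtask jobs on other machines, and `run_subtask`, `parallel_do`, and `max_jobs` are hypothetical names, not part of DPEAS.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subtask(iteration):
    """Stand-in for one spawned job that executes only the loop body."""
    return iteration, iteration * iteration


def parallel_do(first, last, max_jobs=3):
    """Run the body of 'do i = first, last' with at most max_jobs in flight.

    The pool starts a queued iteration whenever a running one completes,
    which is the 'completed jobs allow additional jobs to start' behavior.
    """
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        results = dict(pool.map(run_subtask, range(first, last + 1)))
    return results
```

A call such as `parallel_do(1, 5)` returns a dictionary keyed by loop index, so the master can collect results regardless of the order in which jobs finish.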
  • 13. The 3 Programming Steps to Add a User Routine to DPEAS
      1. Insert a program “hook”: the program hook makes the main DPEAS program aware of the existence of your wrapper routine.
      2. Create a wrapper routine: the wrapper routine tells the DPEAS Fortran interpreter how to parse and interact with your application subroutine arguments.
      3. Create an application routine: the application routine performs the “real” work. You can do anything you want within the application routine.
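The hook, wrapper, and application split described above can be illustrated outside Fortran. The following Python sketch is an analogy only and does not reproduce the DPEAS interfaces: `USER_ROUTINES`, `apply_scale`, `apply_scale_wrapper`, `interpret`, and the one-line script syntax are all invented for illustration.

```python
# Table the interpreter consults to find user routines (hypothetical).
USER_ROUTINES = {}


def apply_scale(values, factor):
    """3. Application routine: does the 'real' work on in-memory data."""
    return [v * factor for v in values]


def apply_scale_wrapper(args, workspace):
    """2. Wrapper routine: parses script arguments and calls the application."""
    in_name, factor_text, out_name = args
    workspace[out_name] = apply_scale(workspace[in_name], float(factor_text))


# 1. Program hook: makes the interpreter aware of the wrapper routine.
USER_ROUTINES["APPLY_SCALE"] = apply_scale_wrapper


def interpret(line, workspace):
    """Toy interpreter for one statement such as 'APPLY_SCALE(tb, 2.0, tb2)'."""
    name, _, arg_text = line.strip().partition("(")
    args = [a.strip() for a in arg_text.rstrip(")").split(",")]
    USER_ROUTINES[name.strip().upper()](args, workspace)
```

For example, with `workspace = {"tb": [1.0, 2.0]}`, calling `interpret("APPLY_SCALE(tb, 2.0, tb2)", workspace)` stores the scaled list under `"tb2"`. The application routine never sees script text, which is the separation the three steps aim for.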
  • 14. How does the “User_Module.f90” relate to my DPEAS Input Scripts? (diagram; labels only)
      • Compile: User_Module.f90 (Program Hook, Wrapper Routine, Application Routine), Ordinary Fortran Compiler, "DPEAS.exe"
      • Interpret: "DPEAS.exe" interprets the DPEAS Input Script
      • Automated Parallelization Using Self-Replication: subtask "DPEAS.exe" interprets its DPEAS Input Script, then Return to Master
      • End
  • 15. User Example: The user’s application routine (using the virtual I/O data via pointers)
      1. Find each MW channel
      2. Allocate a new output array data structure
      • Your science code looks like this
  • 16. User Example: The results: Complete integration
      • The new user routine is now fully integrated into DPEAS
  • 17. User Example: The output HDF-EOS file
  • 18. User Example: The output image representation
      • 150 GHz Effective Emissivity, calculated from GOES-08 IMAGER and NOAA-15 AMSU-B
  • 19. User Example: Summary
      • Creates 2 new routines:
          • Wrapper routine
          • Application routine
      • Requires 25 lines of executable code:
          • 2 – Program hook
          • 4 – Wrapper routine
          • 19 – Application routine
              • 2 – Variable assignments
              • 3 – Science algorithm
              • 14 – Virtual I/O library calls (using only 2 Virtual I/O library routines)
      • Small overhead for gaining massive parallelism capabilities!
  • 20. User Example: How complex would the user routine be, if written without the Virtual I/O library?
      • Creates 2 new routines:
          • Wrapper routine
          • Application routine
      • Requires 59 lines of executable code:
          • 2 – Program hook
          • 4 – Wrapper routine
          • 53 – Application routine
              • 2 – Variable assignments
              • 3 – Science algorithm
              • 48 – HDF-EOS library calls (using 26 HDF-EOS library routines)
      • Answer: Without the DPEAS Virtual I/O library there would be:
          • 24 additional I/O routines called by the user (+1200%)
          • 34 additional lines of user code (+136%; 59 lines is 236% of the original 25)
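The line and routine counts on the two comparison slides can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers quoted on the slides; `relative_increase` and the dictionary names are introduced here for illustration. Going from 25 to 59 lines is an increase of 136%, which is the same as saying 59 is 236% of 25.

```python
def relative_increase(before, after):
    """Percentage increase going from 'before' to 'after'."""
    return 100.0 * (after - before) / before


# Counts quoted on the slides: with and without the Virtual I/O library.
with_vio = {"hook": 2, "wrapper": 4, "application": 19, "io_routines": 2}
without_vio = {"hook": 2, "wrapper": 4, "application": 53, "io_routines": 26}


def total_lines(counts):
    """Executable lines: program hook + wrapper + application routine."""
    return counts["hook"] + counts["wrapper"] + counts["application"]


extra_routines = without_vio["io_routines"] - with_vio["io_routines"]
extra_lines = total_lines(without_vio) - total_lines(with_vio)
routine_increase = relative_increase(with_vio["io_routines"],
                                     without_vio["io_routines"])
line_increase = relative_increase(total_lines(with_vio),
                                  total_lines(without_vio))
```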
  • 21. User Example: Conclusions
      • Implementation Insights
          • Minimal amount of end-user code is required
          • The effort and resources involved are small (the DPEAS program recompiled in < 30 s on the user’s desktop)
      • Virtual I/O Insights
          • The DPEAS virtual I/O access method is less complex than traditional HDF-EOS file access methods
      • End user’s perspective
          • End users are protected from technical data format issues
          • End users can develop higher quality code by leveraging shared robust common modules
          • Scalability is greatly enhanced with little end user effort
  • 22. Summary
      • DPEAS can process large data sets in an efficient manner while maintaining centralized management controls and error handling behaviors
      • Parallelism of the code is automatic and runs on “cheap hardware”
      • Failover capabilities make the system more robust
      • User code is shielded from complexities of the system using software abstraction layers
      • Little training is needed since user interfaces are in a known scientific language
      • User modules directly access data from memory, which makes traditional file access methods obsolete while maintaining the needed file compatibility
  • 23. What did I learn about HDF-EOS in the process?
      • HDF-EOS is an excellent “universal” data format
      • It works for all satellite sensor types I have encountered to date (10+)
      • HDF-EOS requires serious software design before the implementation stage
      • It is my experience that “Time” information as a geo/time field for sectorizing is overrated and is likely to cause future software design headaches with the more complex sensors if encouraged to be the “norm”
  • 24. My 2 cents: How HDF-EOS could be made even better (Hopefully someone has already thought of these things, and this short list will be a reaffirmation)
      • Given that GOES data, for example, and other multi-detector sensors can have multiple times for each channel for the same geolocation position, and that in addition, they can and do interrupt their sensor scans at any time… Treat “Time” as a data attribute
      • Currently I associate “Time” and other associated arrays with their principal data array by nomenclature
      • It would be better to use data array attribute “groups”. Then “Time”, “Calibration”, and other associated arrays could be grouped with the data array through the data format.
  • 25. Why Data Attributes?
      • Many data channels have “associated” information
      • For example, it might be very meaningful to associate the min. and max. of a grid location with its mean value
      • It would be better if there was a standard way of showing that group association, so we don’t have to understand each other’s unique nomenclatures, “intent”, or have to resort to the use of unusual “mixed” HDF/HDF-EOS data files
      • Data attributes should not be arbitrarily limited in scope, but have full data type ranges
      • Units could also be incorporated through data attributes
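The idea of grouping associated arrays with their principal data array through the format, rather than by naming convention, might look like the following in memory. This Python sketch is purely illustrative and is not an HDF-EOS API: the `DataArray` class, its `attach` method, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataArray:
    name: str
    values: list
    # Attributes may hold scalars, strings, or whole associated arrays.
    attributes: dict = field(default_factory=dict)

    def attach(self, key, value):
        """Group an associated item with this array; returns self to chain."""
        self.attributes[key] = value
        return self


# A gridded mean with its min, max, units, and per-point time grouped to it.
tb_mean = DataArray("tb_mean", [250.1, 251.3])
tb_mean.attach("units", "K")
tb_mean.attach("min", DataArray("tb_min", [248.0, 249.5]))
tb_mean.attach("max", DataArray("tb_max", [252.4, 253.0]))
tb_mean.attach("time", DataArray("scan_time", [1200.0, 1200.5]))
```

A reader of such a structure can discover the associated arrays from `tb_mean.attributes` without knowing the producer's naming scheme, which is the benefit the slide argues for.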
  • 26. The End. jones@cira.colostate.edu
  • 27. Appendix: The following series of slides show how a user can easily modify DPEAS
      1. The user’s program hook
      2. … wrapper routine
      3. … application routine (using the virtual I/O data via pointers)
      4. Usage of the new user routine in a DPEAS input script file
      5. The Results: Complete Integration
  • 28. User Example: The user’s program hook (2 lines of code)
  • 29. User Example: The user’s wrapper routine (4 lines of executable code)
  • 30. User Example: The user’s application routine (using the virtual I/O data via pointers)
      1. Find each MW channel
      2. Allocate a new output array data structure
      • Your science code looks like this
  • 31. User Example: Usage of the new user routine in a DPEAS input script file
  • 32. User Example: The results: Complete integration
      • The new user routine is now fully integrated into DPEAS
  • 33. Where Do I Find DPEAS?
      • DPEAS Home Page: http://luna.cira.colostate.edu/DPEAS/DPEAS_frame.htm
      • Please direct questions to jones@cira.colostate.edu

Editor's notes

  1. DPEAS is one executable that propagates copies of itself within a network cluster of machines in a controlled fashion.