V6.0
Getting Started With HDF5
• Why have we brought in a new data format?
• What actually is HDF5?
• How do I create HDF5 files?
• How do I read in HDF5 files?
– Reading one file at a time
– Reading multiple files and selections
• Points to Note
• Future Developments
SEGY is great but…
• It is designed to be read sequentially from tape
– and our “index” file solution didn’t scale well to “big data”
– and our index file solution only allowed primary key access
• It only has 240 bytes of 32-bit integer headers defined
– and our extended trace headers didn’t scale well to “big data”
• Some processes require “n-key random access”
– “surface consistent” suite, PreSTM, 3DSRME etc.
• You need to read the whole file to access trace headers
– Some “database” systems offer more flexibility
• Parallel I/O doesn’t scale well on large clusters
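The header-access point above can be illustrated with a little arithmetic. This is a minimal sketch, assuming the usual SEGY layout of a 3200-byte text header, a 400-byte binary reel header, and then fixed-length traces made of a 240-byte header plus 4-byte samples. The function names and the example trace length are illustrative only and are not part of Claritas.

```python
TEXT_HEADER_BYTES = 3200
REEL_HEADER_BYTES = 400
TRACE_HEADER_BYTES = 240
BYTES_PER_SAMPLE = 4  # assumption: 4-byte samples and a fixed trace length


def trace_header_offset(trace_index, samples_per_trace):
    """Byte offset of the header of the trace at `trace_index` (0-based)."""
    trace_bytes = TRACE_HEADER_BYTES + samples_per_trace * BYTES_PER_SAMPLE
    return TEXT_HEADER_BYTES + REEL_HEADER_BYTES + trace_index * trace_bytes


def header_fraction(samples_per_trace):
    """Fraction of each trace record that is header rather than samples."""
    trace_bytes = TRACE_HEADER_BYTES + samples_per_trace * BYTES_PER_SAMPLE
    return TRACE_HEADER_BYTES / trace_bytes
```

With 1000 samples per trace, headers make up under 6% of the file, yet consecutive headers sit 4240 bytes apart, so a header scan still strides across the entire file.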
So what is HDF5?
• Developed over the last 20 years
• Initially by the National Center for Supercomputing Applications (NCSA) http://www.ncsa.illinois.edu/
• Now developed by The HDF Group http://www.hdfgroup.org
• A suite of technologies, not just a file format
• General purpose library and file format for storing scientific data
• Fully supported set of command line tools, APIs and interfaces
• A pan-industry open standard
• Used for storage by both MATLAB and Scilab; can be read by Mathematica
• A self describing format
• No ambiguity about integer or floating point types or storage in trace bytes
• Names can be allocated to components, as you would in a database structure
• Built for “big data”
• Petabyte+ scale datasets running on tens of thousands of cores
Our Implementation of HDF5
HDFView 2.9: free, third-party tool, showing how any HDF5 application can open the new format
Data, processing history, 400-byte reel header, 3200-byte text header, history and trace headers from Claritas extended SEGY all present
Seismic samples displayed graphically; could also be displayed as a table
All trace headers (SEGY 240-byte and extended) opened in a spreadsheet; full mathematical operations
We have “encapsulated” the GLOBE Claritas SEGY in HDF5
The 400-byte binary reel header opened as a table, so that values can be edited or modified
Creating HDF5 Files : SEISWRITE
Specify a file name!
Optimisation controls; these have smart defaults set and can be modified for managing very large datasets where you know that non-sequential read access will be needed, or partial reads of trace samples will be required
Replaces current use of DISCWRITE, although this will continue to be available
New functionality development will focus on SEISWRITE and HDF5 format data
Reading HDF5 files : SEISREAD
With HDF5 format, you use SEISREAD in place of the DISCxxxxx Modules
You don’t need to worry about the order of data on disc, just how you want to read it
Simple Reading
File name
Primary key order; default is all, ascending
Secondary key order; default is all, ascending
Tertiary key order; only when needed
You can read data in ANY order; original order doesn’t matter
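The idea of reading in any key order, independent of storage order, can be sketched as a sort over trace headers that returns trace positions. This is an illustration only and not how SEISREAD is implemented; the function name is hypothetical and the headers are plain dictionaries standing in for the trace-header table.

```python
def read_in_key_order(headers, keys, descending=False):
    """Return stored trace positions ordered by the given header keys.

    `headers` is a list of dicts, one per trace, in on-disc order.
    `keys` lists header names: primary first, then secondary, then tertiary.
    """
    return sorted(
        range(len(headers)),
        key=lambda i: tuple(headers[i][k] for k in keys),
        reverse=descending,
    )


# Traces stored in an arbitrary order on disc.
stored = [
    {"SHOTID": 2, "CHANNEL": 1},
    {"SHOTID": 1, "CHANNEL": 2},
    {"SHOTID": 1, "CHANNEL": 1},
    {"SHOTID": 2, "CHANNEL": 2},
]
order = read_in_key_order(stored, ["SHOTID", "CHANNEL"])
```

Because only the small header table is sorted, the samples themselves are then fetched by position, which is where random access to the file matters.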
Selection and Repeats
Six REPEAT copies specified
Primary key SHOTID, with only SHOTID 900 selected; note the tolerance
Secondary key CHANNEL, all selected, in ascending order (default)
Six copies of SHOTID 900 passed to the processing flow, with REPEAT set from 1 to 6
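As a rough sketch of the behaviour described on this slide, the following selects traces whose key lies within a tolerance of the requested value and emits each match once per repeat. The function name and argument names are hypothetical. It assumes the tolerance is an absolute difference, and it assumes the traces arrive repeat by repeat; neither is confirmed by the slides.

```python
def select_with_repeats(headers, key, value, tolerance, repeats):
    """Yield (repeat_number, trace_position) for every matching trace.

    A trace matches when its `key` header is within `tolerance` of `value`.
    Matches keep their stored order; no secondary-key sort is applied here.
    """
    matches = [
        i for i, h in enumerate(headers)
        if abs(h[key] - value) <= tolerance
    ]
    for repeat in range(1, repeats + 1):
        for i in matches:
            yield repeat, i


shots = [{"SHOTID": 899}, {"SHOTID": 900}, {"SHOTID": 901}]
exact = list(select_with_repeats(shots, "SHOTID", 900, 0, 6))
loose = list(select_with_repeats(shots, "SHOTID", 900, 1, 1))
```

With a tolerance of 0 only SHOTID 900 is passed, six times over; widening the tolerance to 1 also picks up its neighbours.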
More Complex Selections
Two copies of SHOTIDs from 100 to 900 with an increment of 100, all channels in ascending order, with REPEAT set to 1 and 2
More complex SHOTID selection using the same syntax as DISCREAD; note that the tolerance is set to 0
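To make the size of such a selection concrete, here is a small sketch that expands a start/end/increment selection into explicit key values and counts the traces passed on. The (start, end, increment) arguments are an illustrative stand-in, not the DISCREAD selection syntax, and the channel count of 240 is a made-up example value.

```python
def expand_selection(start, end, increment):
    """Expand an inclusive start/end/increment selection into key values."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    return list(range(start, end + 1, increment))


def traces_passed(n_selected, channels_per_shot, repeats):
    """Number of traces sent to the flow for a repeated selection."""
    return n_selected * channels_per_shot * repeats


shot_ids = expand_selection(100, 900, 100)
# Hypothetical survey with 240 channels per shot, read twice (REPEAT 1 and 2).
total = traces_passed(len(shot_ids), 240, 2)
```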
Sorting to CDP (DISCGATH)
Identical to simple reading
Specify CDP as primary key
Specify CDPTRACE as secondary key
Default is to read all data in ascending primary/secondary key order
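Sorting to CDP then amounts to choosing CDP and CDPTRACE as the keys and grouping on the primary key. The sketch below shows that idea on an in-memory header list; the function name is hypothetical and it illustrates the concept rather than the SEISREAD internals.

```python
from itertools import groupby


def cdp_gathers(headers):
    """Yield (cdp, [trace positions]) in ascending CDP, CDPTRACE order."""
    order = sorted(
        range(len(headers)),
        key=lambda i: (headers[i]["CDP"], headers[i]["CDPTRACE"]),
    )
    # groupby needs its input already ordered by the grouping key.
    for cdp, group in groupby(order, key=lambda i: headers[i]["CDP"]):
        yield cdp, list(group)


# Headers as stored, for example in shot order.
shot_ordered = [
    {"CDP": 11, "CDPTRACE": 1},
    {"CDP": 10, "CDPTRACE": 2},
    {"CDP": 10, "CDPTRACE": 1},
    {"CDP": 11, "CDPTRACE": 2},
]
gathers = list(cdp_gathers(shot_ordered))
```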
Reading Multiple Files
Seismic File List used in the same format as with DISCREAD, with selections
SETREPEAT parameter used as per DISCREAD to create panels; files are merged if this is “no”
Primary key defined here is used in the Seismic File List definition
This last file has a “native” ordering of CDP, CDPTRACE, but will be ordered to SHOT, CHANNEL on read, automatically
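One way to picture the merged case is to order each file by the requested keys and then interleave the ordered streams. This is a sketch under that assumption; the slides do not spell out exactly how Claritas merges files, and the function name, file names and header values here are made up for illustration.

```python
import heapq


def merged_read(files, keys):
    """Merge several files' traces into one stream ordered by `keys`.

    `files` maps a file name to its list of header dicts in native order.
    Returns a list of (file_name, header) pairs.
    """
    def sort_key(item):
        _, header = item
        return tuple(header[k] for k in keys)

    # Order each file independently, whatever its native ordering was.
    streams = [
        sorted(((name, h) for h in headers), key=sort_key)
        for name, headers in files.items()
    ]
    # Interleave the already-ordered streams into a single ordered stream.
    return list(heapq.merge(*streams, key=sort_key))


files = {
    "line_a": [{"SHOT": 2, "CHANNEL": 1}, {"SHOT": 1, "CHANNEL": 1}],
    "line_b": [{"SHOT": 1, "CHANNEL": 2}],
}
merged = merged_read(files, ["SHOT", "CHANNEL"])
```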
Points to Note
• Can only specify a primary key in a Seismic File List
– Same as DISCWRITE, although the original data order no longer matters
• User needs to manage the merging of extended trace headers
– Use DELHDR prior to merging files; will be removed in future releases
• Files can be 10-15% larger than SEGY
• Compatible with Cluster File Systems (Gluster etc.)
• I/O above about 2Gbytes should be improved
Future Developments
• Improved PKEY/SKEY/TKEY selection handling
• Direct update of trace headers from applications
– Geometry, SV (FB picks) etc.
• Add HDF5 support in KPRET2D
– Only module where this is not available
• Add full parallel I/O to iMage suite
– Increase parallel scalability even further
• Algorithmic optimisation
– Re-write to take full advantage of random access

A quick start guide to using HDF5 files in GLOBE Claritas
