SlideShare a Scribd company logo
1 of 77
Download to read offline
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

P Y TABLES & Family
Analyzing and Sharing HDF5 Data with Python

Francesc Altet
Cárabos Coop. V.

HDF Workshop November 30, 2005 - December 2, 2005.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Who are we?
Cárabos is the company committed to the P Y TABLES suite
development and deployment.
We have years of experience in designing software
solutions for handling extremely large datasets.
What we provide:
Commercial support for the P Y TABLES suite.
P Y TABLES-based applications.
Consulting services for managing complex data
environments.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Who are we?
Cárabos is the company committed to the P Y TABLES suite
development and deployment.
We have years of experience in designing software
solutions for handling extremely large datasets.
What we provide:
Commercial support for the P Y TABLES suite.
P Y TABLES-based applications.
Consulting services for managing complex data
environments.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Who are we?
Cárabos is the company committed to the P Y TABLES suite
development and deployment.
We have years of experience in designing software
solutions for handling extremely large datasets.
What we provide:
Commercial support for the P Y TABLES suite.
P Y TABLES-based applications.
Consulting services for managing complex data
environments.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Outline
1

An Introduction to the Python Language
Facts about Python
Scientific Packages
An example

2

P Y TABLES
Overview
Usage Examples
P Y TABLES Pro

3

CSTABLES
Overview
Design goals
Examples of Use
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Outline
1

An Introduction to the Python Language
Facts about Python
Scientific Packages
An example

2

P Y TABLES
Overview
Usage Examples
P Y TABLES Pro

3

CSTABLES
Overview
Design goals
Examples of Use
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Main Features

Interpreted ⇒ Allows interactivity
Flexible data structures ⇒ Malleability
Minimalistic grammar ⇒ Easy to learn
Very complete library (Batteries included) ⇒ Boosts
productivity

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Main Features

Interpreted ⇒ Allows interactivity
Flexible data structures ⇒ Malleability
Minimalistic grammar ⇒ Easy to learn
Very complete library (Batteries included) ⇒ Boosts
productivity

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Main Features

Interpreted ⇒ Allows interactivity
Flexible data structures ⇒ Malleability
Minimalistic grammar ⇒ Easy to learn
Very complete library (Batteries included) ⇒ Boosts
productivity

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Main Features

Interpreted ⇒ Allows interactivity
Flexible data structures ⇒ Malleability
Minimalistic grammar ⇒ Easy to learn
Very complete library (Batteries included) ⇒ Boosts
productivity

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Very Suitable for Scientific/Engineer Fields
Interactivity has always been very appreciated for
increasing productivity.
Being interpreted does not necessarily mean being
inefficient. Python has solutions for linking C & Fortran
libraries in an easy way.
Programming is normally considered a necessary evil.
Rich expressivity of Python normally reduces the amount
of code to solve real problems.
Many efficient scientific libraries have been developed for
Python: Numeric, numarray, SciPy, Scientific-Python,
matplotlib...

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Very Suitable for Scientific/Engineer Fields
Interactivity has always been very appreciated for
increasing productivity.
Being interpreted does not necessarily mean being
inefficient. Python has solutions for linking C & Fortran
libraries in an easy way.
Programming is normally considered a necessary evil.
Rich expressivity of Python normally reduces the amount
of code to solve real problems.
Many efficient scientific libraries have been developed for
Python: Numeric, numarray, SciPy, Scientific-Python,
matplotlib...

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Very Suitable for Scientific/Engineer Fields
Interactivity has always been very appreciated for
increasing productivity.
Being interpreted does not necessarily mean being
inefficient. Python has solutions for linking C & Fortran
libraries in an easy way.
Programming is normally considered a necessary evil.
Rich expressivity of Python normally reduces the amount
of code to solve real problems.
Many efficient scientific libraries have been developed for
Python: Numeric, numarray, SciPy, Scientific-Python,
matplotlib...

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Very Suitable for Scientific/Engineer Fields
Interactivity has always been very appreciated for
increasing productivity.
Being interpreted does not necessarily mean being
inefficient. Python has solutions for linking C & Fortran
libraries in an easy way.
Programming is normally considered a necessary evil.
Rich expressivity of Python normally reduces the amount
of code to solve real problems.
Many efficient scientific libraries have been developed for
Python: Numeric, numarray, SciPy, Scientific-Python,
matplotlib...

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Outline
1

An Introduction to the Python Language
Facts about Python
Scientific Packages
An example

2

P Y TABLES
Overview
Usage Examples
P Y TABLES Pro

3

CSTABLES
Overview
Design goals
Examples of Use
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Basic Matrix Handling
Numeric
Very mature but lacking some features.
Still has a huge user base.
scipy.core
Called to substitute both Numeric & numarray. When
finished, it is supposed to have all the advantages of
Numeric & numarray together.
numarray
Created to overcome some of the limitations of Numeric.
Better handling of large arrays, heterogeneous data...
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Basic Matrix Handling
Numeric
Very mature but lacking some features.
Still has a huge user base.
scipy.core
Called to substitute both Numeric & numarray. When
finished, it is supposed to have all the advantages of
Numeric & numarray together.
numarray
Created to overcome some of the limitations of Numeric.
Better handling of large arrays, heterogeneous data...
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Basic Matrix Handling
Numeric
Very mature but lacking some features.
Still has a huge user base.
scipy.core
Called to substitute both Numeric & numarray. When
finished, it is supposed to have all the advantages of
Numeric & numarray together.
numarray
Created to overcome some of the limitations of Numeric.
Better handling of large arrays, heterogeneous data...
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Numerical Supplements
SciPy
Optimization, integration, special functions
Signal and image processing
Genetic algorithms, ODE solvers...
Scientific Python
Quaternions, automatic derivatives, (linear) interpolation,
polinomials, ...
Overlaps somewhat SciPy, but it has...
A nice netCDF module for I/O

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Numerical Supplements
SciPy
Optimization, integration, special functions
Signal and image processing
Genetic algorithms, ODE solvers...
Scientific Python
Quaternions, automatic derivatives, (linear) interpolation,
polinomials, ...
Overlaps somewhat SciPy, but it has...
A nice netCDF module for I/O

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Plotting (2D)

matplotlib example
matplotlib
Great interactivity.
Produces
publication quality
figures.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Plotting (2D)

PyQwt example
PyQwt
Fast plotting.
Nice to be
embedded in Qt
apps.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Plotting (3D)

MayaVi example
MayaVi
3D-oriented
scientific data
visualizer.
Uses the
Visualization
Toolkit (VTK).

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Outline
1

An Introduction to the Python Language
Facts about Python
Scientific Packages
An example

2

P Y TABLES
Overview
Usage Examples
P Y TABLES Pro

3

CSTABLES
Overview
Design goals
Examples of Use
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Programming as a Necessary Evil
Evaluate En (x) =

∞ e−xt
1
t n dt

for each value of n

from scipy import *
from scipy.integrate import quad, Inf
def integrand(t,n,x):
return exp(-x*t) / t**n
def expint(n,x):
return quad(integrand, 1, Inf, args=(n, x))[0]
v_expint = vectorize(expint)
print v_expint(3,arange(1.0,4.0,0.5))

The output
[ 0.10969197, 0.05673949, 0.03013338, 0.01629537,
0.00893065, 0.00494538,]

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Programming as a Necessary Evil

The output
The plotting code (matplotlib)
xrng = arange(1.0,4.0,0.1)
plot(xrng, v_expint(1, xrng))
plot(xrng, v_expint(2, xrng))
plot(xrng, v_expint(3, xrng))
legend(("n=1", "n=2", "n=3"))
title(’Exponential Integral’)

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Facts about Python
Scientific Packages
An example

Resources
Introductory material:
docs.python.org/tut/tut.html
www.python.org/doc/Intros.html

Lutz & Ascher, "Learning Python": good introduction.
Martelli, "Python in a Nutshell": useful reference.
Martelli & Ascher, "Python Cookbook": more specialized,
useful recipes for particular problems.
Hans P. Langtangen, "Python Scripting for Computational
Science": teaches computational scientists and engineers
how to write small Python scripts efficiently.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Outline
1

An Introduction to the Python Language
Facts about Python
Scientific Packages
An example

2

P Y TABLES
Overview
Usage Examples
P Y TABLES Pro

3

CSTABLES
Overview
Design goals
Examples of Use
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

What is P Y TABLES?

Simply stated: a database for Python based on HDF5.
Designed to deal with extremely large datasets.
Provides an easy-to-use interface.
Supports many of the features of HDF5 and others that are
not present in it (e.g. indexation).
It is Open Source software (BSD license).

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Why HDF5?
I started looking for different backends for saving large
datasets, but HDF5 was the final winner.
Thought out for managing very large datasets in an
efficient way.
Let you organize datasets hierchically.
Very flexible and well tested in scientific environments.
Good maintenance and improvement rate.
It is Open Source software.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Why HDF5?
I started looking for different backends for saving large
datasets, but HDF5 was the final winner.
Thought out for managing very large datasets in an
efficient way.
Let you organize datasets hierchically.
Very flexible and well tested in scientific environments.
Good maintenance and improvement rate.
It is Open Source software.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

What Does Extremely Large Exactly Mean?
The Hitch Hiker’s Guide to the Galaxy offers this definition of
the word “Infinite”:
Infinite: Bigger than the biggest thing ever and then some.
Much bigger than that in fact, really amazingly immense, a
totally stunning size, real “wow, that’s big”. Infinity is just so big
that, by comparison, bigness looks really titchy. Gigantic
multiplied by colossal multiplied by staggeringly huge is the sort
of concept we’re trying to get across here.
Disclaimer
Agreed, ’Extremely Large’ may not exactly mean ’Infinite’,
although it is pretty close to this definition.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

What Does Extremely Large Exactly Mean?
The Hitch Hiker’s Guide to the Galaxy offers this definition of
the word “Infinite”:
Infinite: Bigger than the biggest thing ever and then some.
Much bigger than that in fact, really amazingly immense, a
totally stunning size, real “wow, that’s big”. Infinity is just so big
that, by comparison, bigness looks really titchy. Gigantic
multiplied by colossal multiplied by staggeringly huge is the sort
of concept we’re trying to get across here.
Disclaimer
Agreed, ’Extremely Large’ may not exactly mean ’Infinite’,
although it is pretty close to this definition.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

What Does Extremely Large Exactly Mean?
The Hitch Hiker’s Guide to the Galaxy offers this definition of
the word “Infinite”:
Infinite: Bigger than the biggest thing ever and then some.
Much bigger than that in fact, really amazingly immense, a
totally stunning size, real “wow, that’s big”. Infinity is just so big
that, by comparison, bigness looks really titchy. Gigantic
multiplied by colossal multiplied by staggeringly huge is the sort
of concept we’re trying to get across here.
Disclaimer
Agreed, ’Extremely Large’ may not exactly mean ’Infinite’,
although it is pretty close to this definition.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Which ELD Features Does P Y TABLES Support?
A good base library (HDF5).
Support for 64-bit in all data addressing.
Need to overcome a Python slicing limitation: only 32-bit
addresses are supported.

Buffered I/O.
A LRU cache system for an efficient reuse of objects.
Fast indexing and searching capabilities for tables.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Which ELD Features Does P Y TABLES Support?
A good base library (HDF5).
Support for 64-bit in all data addressing.
Need to overcome a Python slicing limitation: only 32-bit
addresses are supported.

Buffered I/O.
A LRU cache system for an efficient reuse of objects.
Fast indexing and searching capabilities for tables.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Which ELD Features Does P Y TABLES Support?
A good base library (HDF5).
Support for 64-bit in all data addressing.
Need to overcome a Python slicing limitation: only 32-bit
addresses are supported.

Buffered I/O.
A LRU cache system for an efficient reuse of objects.
Fast indexing and searching capabilities for tables.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Which ELD Features Does P Y TABLES Support?
A good base library (HDF5).
Support for 64-bit in all data addressing.
Need to overcome a Python slicing limitation: only 32-bit
addresses are supported.

Buffered I/O.
A LRU cache system for an efficient reuse of objects.
Fast indexing and searching capabilities for tables.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Which ELD Features Does P Y TABLES Support?
A good base library (HDF5).
Support for 64-bit in all data addressing.
Need to overcome a Python slicing limitation: only 32-bit
addresses are supported.

Buffered I/O.
A LRU cache system for an efficient reuse of objects.
Fast indexing and searching capabilities for tables.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Why Buffered I/O?
Because of speed (what else?):

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Why a Cache System?
1.- To achieve better open file times:

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Why a Cache System?
2.- To achieve a conservative usage of the memory:

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Why Fast Indexing?
It is not trifling in terms of time when you have to index tables
with more than one billion of rows:

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Ease of Use
Natural naming
# access to file:/group1/table
table = file.root.group1.table

Support for generalized slicing
# step means a stride in the slice
table[start:stop:step]

Support for iterators
# get the values in col1 that satisfy the
# (1.3 < col3 <= 2.) condition in table
col3 = table.cols.col3
[r[’col1’] for r in table.where(1.3 < col3 <= 2.)]
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Ease of Use
Natural naming
# access to file:/group1/table
table = file.root.group1.table

Support for generalized slicing
# step means a stride in the slice
table[start:stop:step]

Support for iterators
# get the values in col1 that satisfy the
# (1.3 < col3 <= 2.) condition in table
col3 = table.cols.col3
[r[’col1’] for r in table.where(1.3 < col3 <= 2.)]
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Ease of Use
Natural naming
# access to file:/group1/table
table = file.root.group1.table

Support for generalized slicing
# step means a stride in the slice
table[start:stop:step]

Support for iterators
# get the values in col1 that satisfy the
# (1.3 < col3 <= 2.) condition in table
col3 = table.cols.col3
[r[’col1’] for r in table.where(1.3 < col3 <= 2.)]
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Outline
1

An Introduction to the Python Language
Facts about Python
Scientific Packages
An example

2

P Y TABLES
Overview
Usage Examples
P Y TABLES Pro

3

CSTABLES
Overview
Design goals
Examples of Use
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Supported Objects in P Y TABLES
Table: Lets you deal with heterogeneous datasets.
Chunked. Enlargeable. Support for nested types.
Array: Provides quick and dirty array handling. Not
chunked. Non enlargeable.
CArray: Provides compressed array support. Chunked.
Not enlargeable.
EArray: Most general array support. Chunked.
Enlargeable.
VLArray: Collections of homogeneous data with a variable
number of entries. Chunked. Enlargeable.
Group: The structural component.
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

What’s New in P Y TABLES 1.2?
Added a new cache for the object tree.
Allows a contained use of memory (even with huge trees).
Almost instantaneous opening of files (good news for
interactive use!).

New NetCDF module (contributed by Jeff Whitaker).
Provides API emulation for Scientific.IO.NetCDF.
netCDF-3 ⇔ HDF5 conversions.
tables.NetCDF datasets can be shared over the internet
with the OPeNDAP protocol.
Plans to write data natively in netCDF-4 format in the future.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

What’s New in P Y TABLES 1.2?
Added a new cache for the object tree.
Allows a contained use of memory (even with huge trees).
Almost instantaneous opening of files (good news for
interactive use!).

New NetCDF module (contributed by Jeff Whitaker).
Provides API emulation for Scientific.IO.NetCDF.
netCDF-3 ⇔ HDF5 conversions.
tables.NetCDF datasets can be shared over the internet
with the OPeNDAP protocol.
Plans to write data natively in netCDF-4 format in the future.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Some Metrics on P Y TABLES 1.2
Core Library
~ 7.5 thousand LOC (Python)
~ 2.8 thousand LOC (Pyrex)
~ 4.0 thousand LOC (C)

Test Units
~ 3.1 thousand test units (Python)
~ 23 thousand LOC (Python)

Documentation
~ 5.0 thousand lines of on-line doc strings
~ 160 pages of Users’ Guide in PDF (and HTML)

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Outline
1

An Introduction to the Python Language
Facts about Python
Scientific Packages
An example

2

P Y TABLES
Overview
Usage Examples
P Y TABLES Pro

3

CSTABLES
Overview
Design goals
Examples of Use
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

What is P Y TABLES P RO?
It’s just a regular P Y TABLES but with enhanced features:
Improved search speed: Selections in tables with > 1
billion rows will be typically done in less than 1 second.
Complex queries: Supports an unlimited combination of
conditions.
Query optimizer: Queries are analyzed, reordered and
classified to get an optimum response time.
Support for complex indices: Expressions like col1 +
col2*col3 + col4**3 can be indexed and used for selections
afterwards.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Index Selection Speed

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Index Selection Speed: The Goal

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Usage Examples
P Y TABLES Pro

Current Status for P Y TABLES P RO

Improving the query response time.
Remains to be done:
Complex queries.
Query optimizer.

Date of release: 2nd quarter 2006 (tentative).

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

Outline
1

An Introduction to the Python Language
Facts about Python
Scientific Packages
An example

2

P Y TABLES
Overview
Usage Examples
P Y TABLES Pro

3

CSTABLES
Overview
Design goals
Examples of Use
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

CSTABLES: P Y TABLES Goes Client-Server

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

Outline
1

An Introduction to the Python Language
Facts about Python
Scientific Packages
An example

2

P Y TABLES
Overview
Usage Examples
P Y TABLES Pro

3

CSTABLES
Overview
Design goals
Examples of Use
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

Design Goals
Concurrency: Allows multiple access to the same file at the
same time.
Locking system: Avoids corruption of data.

High Throughput: Data is transmitted in large blocks.
High-speed compressor/decompressor used.
Client Cache: Metadata is kept in the client-side when an
object is first accessed.
Full P Y TABLES API compatibility: Existing P Y TABLES
programs can be re-used to access remote files
without changing virtually anything.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

Concurrency Issues
Question: CSTABLES does not provide threading or
asynchronous features yet. So, how does it deal with
several requests at a time?
Answer: Large data read and write requests are split into
small chunks. This considerably improves server response
time.
In addition, CSTABLES provides a lock mechanism that
allows applications to explicitly put a lock on a node or on
an entire subtree.
Different locking access modes:
READ, WRITE and ALL
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

Concurrency Issues
Question: CSTABLES does not provide threading or
asynchronous features yet. So, how does it deal with
several requests at a time?
Answer: Large data read and write requests are split into
small chunks. This considerably improves server response
time.
In addition, CSTABLES provides a lock mechanism that
allows applications to explicitly put a lock on a node or on
an entire subtree.
Different locking access modes:
READ, WRITE and ALL
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

Concurrency Issues
Question: CSTABLES does not provide threading or
asynchronous features yet. So, how does it deal with
several requests at a time?
Answer: Large data read and write requests are split into
small chunks. This considerably improves server response
time.
In addition, CSTABLES provides a lock mechanism that
allows applications to explicitly put a lock on a node or on
an entire subtree.
Different locking access modes:
READ, WRITE and ALL
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

Client Cache More in Depth
CSTABLES caches some of the metadata of the P Y TABLES
object tree.
When a client makes a change to the metadata, this
change is pushed to the other clients’ caches.
Easy to control which attributes should be cached and
which should not.
You should try to cache primarily read-only attributes.

Caveat emptor:
Caching attributes that are updated very often might generate
more traffic than attributes that are not cached at all!
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

Client Cache More in Depth
CSTABLES caches some of the metadata of the P Y TABLES
object tree.
When a client makes a change to the metadata, this
change is pushed to the other clients’ caches.
Easy to control which attributes should be cached and
which should not.
You should try to cache primarily read-only attributes.

Caveat emptor:
Caching attributes that are updated very often might generate
more traffic than attributes that are not cached at all!
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

Client Cache More in Depth
CSTABLES caches some of the metadata of the P Y TABLES
object tree.
When a client makes a change to the metadata, this
change is pushed to the other clients’ caches.
Easy to control which attributes should be cached and
which should not.
You should try to cache primarily read-only attributes.

Caveat emptor:
Caching attributes that are updated very often might generate
more traffic than attributes that are not cached at all!
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

Client Cache More in Depth
CSTABLES caches some of the metadata of the P Y TABLES
object tree.
When a client makes a change to the metadata, this
change is pushed to the other clients’ caches.
Easy to control which attributes should be cached and
which should not.
You should try to cache primarily read-only attributes.

Caveat emptor:
Caching attributes that are updated very often might generate
more traffic than attributes that are not cached at all!
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

Outline
1

An Introduction to the Python Language
Facts about Python
Scientific Packages
An example

2

P Y TABLES
Overview
Usage Examples
P Y TABLES Pro

3

CSTABLES
Overview
Design goals
Examples of Use
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

Two Different APIs
P Y TABLES API (prefix with csclient -ip=server_IP)
import tables
fileh = tables.openFile("file.h5")
print fileh.root.table.cols.col1[:]
fileh.close()
CSTABLES API (no need to be prefixed)
from cstables.client import client
c=client.Session()
app=c.connect("your_FQDN_server")
fileh=app.openFile("file.h5")
print fileh.root.table.cols.col1[:]
fileh.close()
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

Two Different APIs
P Y TABLES API (prefix with csclient -ip=server_IP)
import tables
fileh = tables.openFile("file.h5")
print fileh.root.table.cols.col1[:]
fileh.close()
CSTABLES API (no need to be prefixed)
from cstables.client import client
c=client.Session()
app=c.connect("your_FQDN_server")
fileh=app.openFile("file.h5")
print fileh.root.table.cols.col1[:]
fileh.close()
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

CSTABLES Status & Availability

The main design features are already implemented and
working.
Beta available (caveat: only works against P Y TABLES 1.0).
Focus now is on checking & debugging possible errors,
improving the throughput and bettering the Users’ Guide.
Future directions: threading, asynchronous
communications.

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

The P Y TABLES Suite Is Getting Shape
P Y TABLES
The basic layer for the other components.
P Y TABLES Pro
P Y TABLES with a twist: Complex searches and ultra-fast
selections in tables.
CSTABLES
The client-server P Y TABLES. It supports as well generic HDF5
files.
V I TABLES
A data viewer for P Y TABLES (and HDF5) files.
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

The P Y TABLES Suite Is Getting Shape
P Y TABLES
The basic layer for the other components.
P Y TABLES Pro
P Y TABLES with a twist: Complex searches and ultra-fast
selections in tables.
CSTABLES
The client-server P Y TABLES. It supports as well generic HDF5
files.
V I TABLES
A data viewer for P Y TABLES (and HDF5) files.
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

The P Y TABLES Suite Is Getting Shape
P Y TABLES
The basic layer for the other components.
P Y TABLES Pro
P Y TABLES with a twist: Complex searches and ultra-fast
selections in tables.
CSTABLES
The client-server P Y TABLES. It supports as well generic HDF5
files.
V I TABLES
A data viewer for P Y TABLES (and HDF5) files.
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Overview
Design goals
Examples of Use

The P Y TABLES Suite Is Getting Shape
P Y TABLES
The basic layer for the other components.
P Y TABLES Pro
P Y TABLES with a twist: Complex searches and ultra-fast
selections in tables.
CSTABLES
The client-server P Y TABLES. It supports as well generic HDF5
files.
V I TABLES
A data viewer for P Y TABLES (and HDF5) files.
Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Summary
Python is a joy for scientific computing. Start using it now!
The P Y TABLES suite is designed to work with HDF5 files in
an interactive, efficient and, most importantly, easy way.
Outlook
Working hard to release CSTABLES in the first quarter of
2006. P Y TABLES P RO will come later on (~ 2nd quarter of
2006).
It would be nice to produce a parallel version of P Y TABLES
(long term goal).

Francesc Altet

P Y TABLES & Family
An Introduction to the Python Language
P Y TABLES
CSTABLES
Summary

Thank You!

Thanks also to:
Elena Pourmal, for inviting us to the Workshop.
Frank Baker, for being kind.
The HDF Group for pushing forward P Y TABLES.

Francesc Altet

P Y TABLES & Family

More Related Content

What's hot

Why Python?
Why Python?Why Python?
Why Python?Adam Pah
 
Lexical Resources for Portuguese
Lexical Resources  for PortugueseLexical Resources  for Portuguese
Lexical Resources for PortugueseValeria de Paiva
 
From Python to Kotlin - TalkingKT 2019
From Python to Kotlin - TalkingKT 2019From Python to Kotlin - TalkingKT 2019
From Python to Kotlin - TalkingKT 2019Horgix
 
Python and its Applications
Python and its ApplicationsPython and its Applications
Python and its ApplicationsAbhijeet Singh
 
Logics and Ontologies for Portuguese Understanding
Logics and Ontologies for Portuguese UnderstandingLogics and Ontologies for Portuguese Understanding
Logics and Ontologies for Portuguese UnderstandingValeria de Paiva
 
What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)wesley chun
 
What is Python? (Silicon Valley CodeCamp 2015)
What is Python? (Silicon Valley CodeCamp 2015)What is Python? (Silicon Valley CodeCamp 2015)
What is Python? (Silicon Valley CodeCamp 2015)wesley chun
 
Python: The Programmer's Lingua Franca
Python: The Programmer's Lingua FrancaPython: The Programmer's Lingua Franca
Python: The Programmer's Lingua FrancaActiveState
 
Python Programming Language
Python Programming LanguagePython Programming Language
Python Programming LanguageLaxman Puri
 

What's hot (11)

Why Python?
Why Python?Why Python?
Why Python?
 
Lexical Resources for Portuguese
Lexical Resources  for PortugueseLexical Resources  for Portuguese
Lexical Resources for Portuguese
 
From Python to Kotlin - TalkingKT 2019
From Python to Kotlin - TalkingKT 2019From Python to Kotlin - TalkingKT 2019
From Python to Kotlin - TalkingKT 2019
 
Python and its Applications
Python and its ApplicationsPython and its Applications
Python and its Applications
 
Logics and Ontologies for Portuguese Understanding
Logics and Ontologies for Portuguese UnderstandingLogics and Ontologies for Portuguese Understanding
Logics and Ontologies for Portuguese Understanding
 
What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)
 
What is Python? (Silicon Valley CodeCamp 2015)
What is Python? (Silicon Valley CodeCamp 2015)What is Python? (Silicon Valley CodeCamp 2015)
What is Python? (Silicon Valley CodeCamp 2015)
 
Python unit1
Python unit1Python unit1
Python unit1
 
Python - the basics
Python - the basicsPython - the basics
Python - the basics
 
Python: The Programmer's Lingua Franca
Python: The Programmer's Lingua FrancaPython: The Programmer's Lingua Franca
Python: The Programmer's Lingua Franca
 
Python Programming Language
Python Programming LanguagePython Programming Language
Python Programming Language
 

Similar to Analyzing and Sharing HDF5 Data with Python

Five python libraries should know for machine learning
Five python libraries should know for machine learningFive python libraries should know for machine learning
Five python libraries should know for machine learningNaveen Davis
 
Introduction_to_Python.pptx
Introduction_to_Python.pptxIntroduction_to_Python.pptx
Introduction_to_Python.pptxVinay Chowdary
 
A Whirlwind Tour Of Python
A Whirlwind Tour Of PythonA Whirlwind Tour Of Python
A Whirlwind Tour Of PythonAsia Smith
 
Python A Comprehensive Guide for Beginners.pdf
Python A Comprehensive Guide for Beginners.pdfPython A Comprehensive Guide for Beginners.pdf
Python A Comprehensive Guide for Beginners.pdfKajal Digital
 
Python libraries for data science
Python libraries for data sciencePython libraries for data science
Python libraries for data sciencenilashri2
 
Python standard library &amp; list of important libraries
Python standard library &amp; list of important librariesPython standard library &amp; list of important libraries
Python standard library &amp; list of important librariesgrinu
 
summer training report on python
summer training report on pythonsummer training report on python
summer training report on pythonShubham Yadav
 
R vs python
R vs pythonR vs python
R vs pythonPrwaTech
 
Top Libraries for Machine Learning with Python
Top Libraries for Machine Learning with Python Top Libraries for Machine Learning with Python
Top Libraries for Machine Learning with Python Chariza Pladin
 
Guide to Learn Python Programming.pdf
Guide to Learn Python Programming.pdfGuide to Learn Python Programming.pdf
Guide to Learn Python Programming.pdfNikhilSharma142682
 
Introduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and PythonIntroduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and PythonJen Stirrup
 
Why Python Should Be Your First Programming Language
Why Python Should Be Your First Programming LanguageWhy Python Should Be Your First Programming Language
Why Python Should Be Your First Programming LanguageEdureka!
 
Python – The Fastest Growing Programming Language
Python – The Fastest Growing Programming LanguagePython – The Fastest Growing Programming Language
Python – The Fastest Growing Programming LanguageIRJET Journal
 
Which programming language to learn R or Python - MeasureCamp XII
Which programming language to learn R or Python - MeasureCamp XIIWhich programming language to learn R or Python - MeasureCamp XII
Which programming language to learn R or Python - MeasureCamp XIIMaggie Petrova
 
Untitled document (12).pdf
Untitled document (12).pdfUntitled document (12).pdf
Untitled document (12).pdfcollinscafe
 

Similar to Analyzing and Sharing HDF5 Data with Python (20)

Five python libraries should know for machine learning
Five python libraries should know for machine learningFive python libraries should know for machine learning
Five python libraries should know for machine learning
 
Introduction_to_Python.pptx
Introduction_to_Python.pptxIntroduction_to_Python.pptx
Introduction_to_Python.pptx
 
A Whirlwind Tour Of Python
A Whirlwind Tour Of PythonA Whirlwind Tour Of Python
A Whirlwind Tour Of Python
 
Python A Comprehensive Guide for Beginners.pdf
Python A Comprehensive Guide for Beginners.pdfPython A Comprehensive Guide for Beginners.pdf
Python A Comprehensive Guide for Beginners.pdf
 
Python libraries for data science
Python libraries for data sciencePython libraries for data science
Python libraries for data science
 
what is python ?
what is python ? what is python ?
what is python ?
 
Python standard library &amp; list of important libraries
Python standard library &amp; list of important librariesPython standard library &amp; list of important libraries
Python standard library &amp; list of important libraries
 
summer training report on python
summer training report on pythonsummer training report on python
summer training report on python
 
R vs python
R vs pythonR vs python
R vs python
 
Top Libraries for Machine Learning with Python
Top Libraries for Machine Learning with Python Top Libraries for Machine Learning with Python
Top Libraries for Machine Learning with Python
 
Python Mastery Made Easy.pdf
Python Mastery Made Easy.pdfPython Mastery Made Easy.pdf
Python Mastery Made Easy.pdf
 
Guide to Learn Python Programming.pdf
Guide to Learn Python Programming.pdfGuide to Learn Python Programming.pdf
Guide to Learn Python Programming.pdf
 
Introduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and PythonIntroduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and Python
 
Python basics
Python basicsPython basics
Python basics
 
Why Python Should Be Your First Programming Language
Why Python Should Be Your First Programming LanguageWhy Python Should Be Your First Programming Language
Why Python Should Be Your First Programming Language
 
The Great Debate.pdf
The Great Debate.pdfThe Great Debate.pdf
The Great Debate.pdf
 
Introduction of Python
Introduction of PythonIntroduction of Python
Introduction of Python
 
Python – The Fastest Growing Programming Language
Python – The Fastest Growing Programming LanguagePython – The Fastest Growing Programming Language
Python – The Fastest Growing Programming Language
 
Which programming language to learn R or Python - MeasureCamp XII
Which programming language to learn R or Python - MeasureCamp XIIWhich programming language to learn R or Python - MeasureCamp XII
Which programming language to learn R or Python - MeasureCamp XII
 
Untitled document (12).pdf
Untitled document (12).pdfUntitled document (12).pdf
Untitled document (12).pdf
 

More from The HDF-EOS Tools and Information Center

STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...The HDF-EOS Tools and Information Center
 

More from The HDF-EOS Tools and Information Center (20)

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
HDF - Current status and Future Directions
HDF - Current status and Future Directions HDF - Current status and Future Directions
HDF - Current status and Future Directions
 
H5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only LibraryH5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only Library
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 

Recently uploaded

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 

Recently uploaded (20)

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Analyzing and Sharing HDF5 Data with Python

  • 1. An Introduction to the Python Language P Y TABLES CSTABLES Summary P Y TABLES & Family Analyzing and Sharing HDF5 Data with Python Francesc Altet Cárabos Coop. V. HDF Workshop November 30, 2005 - December 2, 2005. Francesc Altet P Y TABLES & Family
  • 2. An Introduction to the Python Language P Y TABLES CSTABLES Summary Who are we? Cárabos is the company committed to the P Y TABLES suite development and deployment. We have years of experience in designing software solutions for handling extremely large datasets. What we provide: Commercial support for the P Y TABLES suite. P Y TABLES-based applications. Consulting services for managing complex data environments. Francesc Altet P Y TABLES & Family
  • 3. An Introduction to the Python Language P Y TABLES CSTABLES Summary Who are we? Cárabos is the company committed to the P Y TABLES suite development and deployment. We have years of experience in designing software solutions for handling extremely large datasets. What we provide: Commercial support for the P Y TABLES suite. P Y TABLES-based applications. Consulting services for managing complex data environments. Francesc Altet P Y TABLES & Family
  • 4. An Introduction to the Python Language P Y TABLES CSTABLES Summary Who are we? Cárabos is the company committed to the P Y TABLES suite development and deployment. We have years of experience in designing software solutions for handling extremely large datasets. What we provide: Commercial support for the P Y TABLES suite. P Y TABLES-based applications. Consulting services for managing complex data environments. Francesc Altet P Y TABLES & Family
  • 5. An Introduction to the Python Language P Y TABLES CSTABLES Summary Outline 1 An Introduction to the Python Language Facts about Python Scientific Packages An example 2 P Y TABLES Overview Usage Examples P Y TABLES Pro 3 CSTABLES Overview Design goals Examples of Use Francesc Altet P Y TABLES & Family
  • 6. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Outline 1 An Introduction to the Python Language Facts about Python Scientific Packages An example 2 P Y TABLES Overview Usage Examples P Y TABLES Pro 3 CSTABLES Overview Design goals Examples of Use Francesc Altet P Y TABLES & Family
  • 7. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Main Features Interpreted ⇒ Allows interactivity Flexible data structures ⇒ Malleability Minimalistic grammar ⇒ Easy to learn Very complete library (Batteries included) ⇒ Boosts productivity Francesc Altet P Y TABLES & Family
  • 8. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Main Features Interpreted ⇒ Allows interactivity Flexible data structures ⇒ Malleability Minimalistic grammar ⇒ Easy to learn Very complete library (Batteries included) ⇒ Boosts productivity Francesc Altet P Y TABLES & Family
  • 9. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Main Features Interpreted ⇒ Allows interactivity Flexible data structures ⇒ Malleability Minimalistic grammar ⇒ Easy to learn Very complete library (Batteries included) ⇒ Boosts productivity Francesc Altet P Y TABLES & Family
  • 10. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Main Features Interpreted ⇒ Allows interactivity Flexible data structures ⇒ Malleability Minimalistic grammar ⇒ Easy to learn Very complete library (Batteries included) ⇒ Boosts productivity Francesc Altet P Y TABLES & Family
  • 11. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Very Suitable for Scientific/Engineer Fields Interactivity has always been very appreciated for increasing productivity. Being interpreted does not necessarily mean being inefficient. Python has solutions for linking C & Fortran libraries in an easy way. Programming is normally considered a necessary evil. Rich expressivity of Python normally reduces the amount of code to solve real problems. Many efficient scientific libraries have been developed for Python: Numeric, numarray, SciPy, Scientific-Python, matplotlib... Francesc Altet P Y TABLES & Family
  • 12. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Very Suitable for Scientific/Engineer Fields Interactivity has always been very appreciated for increasing productivity. Being interpreted does not necessarily mean being inefficient. Python has solutions for linking C & Fortran libraries in an easy way. Programming is normally considered a necessary evil. Rich expressivity of Python normally reduces the amount of code to solve real problems. Many efficient scientific libraries have been developed for Python: Numeric, numarray, SciPy, Scientific-Python, matplotlib... Francesc Altet P Y TABLES & Family
  • 13. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Very Suitable for Scientific/Engineer Fields Interactivity has always been very appreciated for increasing productivity. Being interpreted does not necessarily mean being inefficient. Python has solutions for linking C & Fortran libraries in an easy way. Programming is normally considered a necessary evil. Rich expressivity of Python normally reduces the amount of code to solve real problems. Many efficient scientific libraries have been developed for Python: Numeric, numarray, SciPy, Scientific-Python, matplotlib... Francesc Altet P Y TABLES & Family
  • 14. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Very Suitable for Scientific/Engineer Fields Interactivity has always been very appreciated for increasing productivity. Being interpreted does not necessarily mean being inefficient. Python has solutions for linking C & Fortran libraries in an easy way. Programming is normally considered a necessary evil. Rich expressivity of Python normally reduces the amount of code to solve real problems. Many efficient scientific libraries have been developed for Python: Numeric, numarray, SciPy, Scientific-Python, matplotlib... Francesc Altet P Y TABLES & Family
  • 15. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Outline 1 An Introduction to the Python Language Facts about Python Scientific Packages An example 2 P Y TABLES Overview Usage Examples P Y TABLES Pro 3 CSTABLES Overview Design goals Examples of Use Francesc Altet P Y TABLES & Family
  • 16. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Basic Matrix Handling Numeric Very mature but lacking some features. Still has a huge user base. scipy.core Called to substitute both Numeric & numarray. When finished, it is supposed to have all the advantages of Numeric & numarray together. numarray Created to overcome some of the limitations of Numeric. Better handling of large arrays, heterogeneous data... Francesc Altet P Y TABLES & Family
  • 17. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Basic Matrix Handling Numeric Very mature but lacking some features. Still has a huge user base. scipy.core Called to substitute both Numeric & numarray. When finished, it is supposed to have all the advantages of Numeric & numarray together. numarray Created to overcome some of the limitations of Numeric. Better handling of large arrays, heterogeneous data... Francesc Altet P Y TABLES & Family
  • 18. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Basic Matrix Handling Numeric Very mature but lacking some features. Still has a huge user base. scipy.core Called to substitute both Numeric & numarray. When finished, it is supposed to have all the advantages of Numeric & numarray together. numarray Created to overcome some of the limitations of Numeric. Better handling of large arrays, heterogeneous data... Francesc Altet P Y TABLES & Family
  • 19. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Numerical Supplements SciPy Optimization, integration, special functions Signal and image processing Genetic algorithms, ODE solvers... Scientific Python Quaternions, automatic derivatives, (linear) interpolation, polinomials, ... Overlaps somewhat SciPy, but it has... A nice netCDF module for I/O Francesc Altet P Y TABLES & Family
  • 20. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Numerical Supplements SciPy Optimization, integration, special functions Signal and image processing Genetic algorithms, ODE solvers... Scientific Python Quaternions, automatic derivatives, (linear) interpolation, polinomials, ... Overlaps somewhat SciPy, but it has... A nice netCDF module for I/O Francesc Altet P Y TABLES & Family
  • 21. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Plotting (2D) matplotlib example matplotlib Great interactivity. Produces publication quality figures. Francesc Altet P Y TABLES & Family
  • 22. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Plotting (2D) PyQwt example PyQwt Fast plotting. Nice to be embedded in Qt apps. Francesc Altet P Y TABLES & Family
  • 23. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Plotting (3D) MayaVi example MayaVi 3D-oriented scientific data visualizer. Uses the Visualization Toolkit (VTK). Francesc Altet P Y TABLES & Family
  • 24. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Outline 1 An Introduction to the Python Language Facts about Python Scientific Packages An example 2 P Y TABLES Overview Usage Examples P Y TABLES Pro 3 CSTABLES Overview Design goals Examples of Use Francesc Altet P Y TABLES & Family
  • 25. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Programming as a Necessary Evil Evaluate En (x) = ∞ e−xt 1 t n dt for each value of n from scipy import * from scipy.integrate import quad, Inf def integrand(t,n,x): return exp(-x*t) / t**n def expint(n,x): return quad(integrand, 1, Inf, args=(n, x))[0] v_expint = vectorize(expint) print v_expint(3,arange(1.0,4.0,0.5)) The output [ 0.10969197, 0.05673949, 0.03013338, 0.01629537, 0.00893065, 0.00494538,] Francesc Altet P Y TABLES & Family
  • 26. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Programming as a Necessary Evil The output The plotting code (matplotlib) xrng = arange(1.0,4.0,0.1) plot(xrng, v_expint(1, xrng)) plot(xrng, v_expint(2, xrng)) plot(xrng, v_expint(3, xrng)) legend(("n=1", "n=2", "n=3")) title(’Exponential Integral’) Francesc Altet P Y TABLES & Family
  • 27. An Introduction to the Python Language P Y TABLES CSTABLES Summary Facts about Python Scientific Packages An example Resources Introductory material: docs.python.org/tut/tut.html www.python.org/doc/Intros.html Lutz & Ascher, "Learning Python": good introduction. Martelli, "Python in a Nutshell": useful reference. Martelli & Ascher, "Python Cookbook": more specialized, useful recipes for particular problems. Hans P. Langtangen, "Python Scripting for Computational Science": teaches computational scientists and engineers how to write small Python scripts efficiently. Francesc Altet P Y TABLES & Family
  • 28. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Outline 1 An Introduction to the Python Language Facts about Python Scientific Packages An example 2 P Y TABLES Overview Usage Examples P Y TABLES Pro 3 CSTABLES Overview Design goals Examples of Use Francesc Altet P Y TABLES & Family
  • 29. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro What is P Y TABLES? Simply stated: a database for Python based on HDF5. Designed to deal with extremely large datasets. Provides an easy-to-use interface. Supports many of the features of HDF5 and others that are not present in it (e.g. indexation). It is Open Source software (BSD license). Francesc Altet P Y TABLES & Family
  • 30. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Why HDF5? I started looking for different backends for saving large datasets, but HDF5 was the final winner. Thought out for managing very large datasets in an efficient way. Let you organize datasets hierchically. Very flexible and well tested in scientific environments. Good maintenance and improvement rate. It is Open Source software. Francesc Altet P Y TABLES & Family
  • 31. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Why HDF5? I started looking for different backends for saving large datasets, but HDF5 was the final winner. Thought out for managing very large datasets in an efficient way. Let you organize datasets hierchically. Very flexible and well tested in scientific environments. Good maintenance and improvement rate. It is Open Source software. Francesc Altet P Y TABLES & Family
  • 32. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro What Does Extremely Large Exactly Mean? The Hitch Hiker’s Guide to the Galaxy offers this definition of the word “Infinite”: Infinite: Bigger than the biggest thing ever and then some. Much bigger than that in fact, really amazingly immense, a totally stunning size, real “wow, that’s big”. Infinity is just so big that, by comparison, bigness looks really titchy. Gigantic multiplied by colossal multiplied by staggeringly huge is the sort of concept we’re trying to get across here. Disclaimer Agreed, ’Extremely Large’ may not exactly mean ’Infinite’, although it is pretty close to this definition. Francesc Altet P Y TABLES & Family
  • 33. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro What Does Extremely Large Exactly Mean? The Hitch Hiker’s Guide to the Galaxy offers this definition of the word “Infinite”: Infinite: Bigger than the biggest thing ever and then some. Much bigger than that in fact, really amazingly immense, a totally stunning size, real “wow, that’s big”. Infinity is just so big that, by comparison, bigness looks really titchy. Gigantic multiplied by colossal multiplied by staggeringly huge is the sort of concept we’re trying to get across here. Disclaimer Agreed, ’Extremely Large’ may not exactly mean ’Infinite’, although it is pretty close to this definition. Francesc Altet P Y TABLES & Family
  • 34. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro What Does Extremely Large Exactly Mean? The Hitch Hiker’s Guide to the Galaxy offers this definition of the word “Infinite”: Infinite: Bigger than the biggest thing ever and then some. Much bigger than that in fact, really amazingly immense, a totally stunning size, real “wow, that’s big”. Infinity is just so big that, by comparison, bigness looks really titchy. Gigantic multiplied by colossal multiplied by staggeringly huge is the sort of concept we’re trying to get across here. Disclaimer Agreed, ’Extremely Large’ may not exactly mean ’Infinite’, although it is pretty close to this definition. Francesc Altet P Y TABLES & Family
  • 35. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Which ELD Features Does P Y TABLES Support? A good base library (HDF5). Support for 64-bit in all data addressing. Need to overcome a Python slicing limitation: only 32-bit addresses are supported. Buffered I/O. A LRU cache system for an efficient reuse of objects. Fast indexing and searching capabilities for tables. Francesc Altet P Y TABLES & Family
  • 36. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Which ELD Features Does P Y TABLES Support? A good base library (HDF5). Support for 64-bit in all data addressing. Need to overcome a Python slicing limitation: only 32-bit addresses are supported. Buffered I/O. A LRU cache system for an efficient reuse of objects. Fast indexing and searching capabilities for tables. Francesc Altet P Y TABLES & Family
  • 37. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Which ELD Features Does P Y TABLES Support? A good base library (HDF5). Support for 64-bit in all data addressing. Need to overcome a Python slicing limitation: only 32-bit addresses are supported. Buffered I/O. A LRU cache system for an efficient reuse of objects. Fast indexing and searching capabilities for tables. Francesc Altet P Y TABLES & Family
  • 38. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Which ELD Features Does P Y TABLES Support? A good base library (HDF5). Support for 64-bit in all data addressing. Need to overcome a Python slicing limitation: only 32-bit addresses are supported. Buffered I/O. A LRU cache system for an efficient reuse of objects. Fast indexing and searching capabilities for tables. Francesc Altet P Y TABLES & Family
  • 39. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Which ELD Features Does P Y TABLES Support? A good base library (HDF5). Support for 64-bit in all data addressing. Need to overcome a Python slicing limitation: only 32-bit addresses are supported. Buffered I/O. A LRU cache system for an efficient reuse of objects. Fast indexing and searching capabilities for tables. Francesc Altet P Y TABLES & Family
  • 40. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Why Buffered I/O? Because of speed (what else?): Francesc Altet P Y TABLES & Family
  • 41. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Why a Cache System? 1.- To achieve better open file times: Francesc Altet P Y TABLES & Family
  • 42. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Why a Cache System? 2.- To achieve a conservative usage of the memory: Francesc Altet P Y TABLES & Family
  • 43. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Why Fast Indexing? It is not trifling in terms of time when you have to index tables with more than one billion of rows: Francesc Altet P Y TABLES & Family
  • 44. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Ease of Use Natural naming # access to file:/group1/table table = file.root.group1.table Support for generalized slicing # step means a stride in the slice table[start:stop:step] Support for iterators # get the values in col1 that satisfy the # (1.3 < col3 <= 2.) condition in table col3 = table.cols.col3 [r[’col1’] for r in table.where(1.3 < col3 <= 2.)] Francesc Altet P Y TABLES & Family
  • 45. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Ease of Use Natural naming # access to file:/group1/table table = file.root.group1.table Support for generalized slicing # step means a stride in the slice table[start:stop:step] Support for iterators # get the values in col1 that satisfy the # (1.3 < col3 <= 2.) condition in table col3 = table.cols.col3 [r[’col1’] for r in table.where(1.3 < col3 <= 2.)] Francesc Altet P Y TABLES & Family
  • 46. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Ease of Use Natural naming # access to file:/group1/table table = file.root.group1.table Support for generalized slicing # step means a stride in the slice table[start:stop:step] Support for iterators # get the values in col1 that satisfy the # (1.3 < col3 <= 2.) condition in table col3 = table.cols.col3 [r[’col1’] for r in table.where(1.3 < col3 <= 2.)] Francesc Altet P Y TABLES & Family
  • 47. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Outline 1 An Introduction to the Python Language Facts about Python Scientific Packages An example 2 P Y TABLES Overview Usage Examples P Y TABLES Pro 3 CSTABLES Overview Design goals Examples of Use Francesc Altet P Y TABLES & Family
  • 48. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Supported Objects in P Y TABLES Table: Lets you deal with heterogeneous datasets. Chunked. Enlargeable. Support for nested types. Array: Provides quick and dirty array handling. Not chunked. Non enlargeable. CArray: Provides compressed array support. Chunked. Not enlargeable. EArray: Most general array support. Chunked. Enlargeable. VLArray: Collections of homogeneous data with a variable number of entries. Chunked. Enlargeable. Group: The structural component. Francesc Altet P Y TABLES & Family
  • 49. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro What’s New in P Y TABLES 1.2? Added a new cache for the object tree. Allows a contained use of memory (even with huge trees). Almost instantaneous opening of files (good news for interactive use!). New NetCDF module (contributed by Jeff Whitaker). Provides API emulation for Scientific.IO.NetCDF. netCDF-3 ⇔ HDF5 conversions. tables.NetCDF datasets can be shared over the internet with the OPeNDAP protocol. Plans to write data natively in netCDF-4 format in the future. Francesc Altet P Y TABLES & Family
  • 50. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro What’s New in P Y TABLES 1.2? Added a new cache for the object tree. Allows a contained use of memory (even with huge trees). Almost instantaneous opening of files (good news for interactive use!). New NetCDF module (contributed by Jeff Whitaker). Provides API emulation for Scientific.IO.NetCDF. netCDF-3 ⇔ HDF5 conversions. tables.NetCDF datasets can be shared over the internet with the OPeNDAP protocol. Plans to write data natively in netCDF-4 format in the future. Francesc Altet P Y TABLES & Family
  • 51. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Some Metrics on P Y TABLES 1.2 Core Library ~ 7.5 thousand LOC (Python) ~ 2.8 thousand LOC (Pyrex) ~ 4.0 thousand LOC (C) Test Units ~ 3.1 thousand test units (Python) ~ 23 thousand LOC (Python) Documentation ~ 5.0 thousand lines of on-line doc strings ~ 160 pages of Users’ Guide in PDF (and HTML) Francesc Altet P Y TABLES & Family
  • 52. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Outline 1 An Introduction to the Python Language Facts about Python Scientific Packages An example 2 P Y TABLES Overview Usage Examples P Y TABLES Pro 3 CSTABLES Overview Design goals Examples of Use Francesc Altet P Y TABLES & Family
  • 53. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro What is P Y TABLES P RO? It’s just a regular P Y TABLES but with enhanced features: Improved search speed: Selections in tables with > 1 billion rows will be typically done in less than 1 second. Complex queries: Supports an unlimited combination of conditions. Query optimizer: Queries are analyzed, reordered and classified to get an optimum response time. Support for complex indices: Expressions like col1 + col2*col3 + col4**3 can be indexed and used for selections afterwards. Francesc Altet P Y TABLES & Family
  • 54. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Index Selection Speed Francesc Altet P Y TABLES & Family
  • 55. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Index Selection Speed: The Goal Francesc Altet P Y TABLES & Family
  • 56. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Usage Examples P Y TABLES Pro Current Status for P Y TABLES P RO Improving the query response time. Remains to be done: Complex queries. Query optimizer. Date of release: 2nd quarter 2006 (tentative). Francesc Altet P Y TABLES & Family
  • 57. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use Outline 1 An Introduction to the Python Language Facts about Python Scientific Packages An example 2 P Y TABLES Overview Usage Examples P Y TABLES Pro 3 CSTABLES Overview Design goals Examples of Use Francesc Altet P Y TABLES & Family
  • 58. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use CSTABLES: P Y TABLES Goes Client-Server Francesc Altet P Y TABLES & Family
  • 59. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use Outline 1 An Introduction to the Python Language Facts about Python Scientific Packages An example 2 P Y TABLES Overview Usage Examples P Y TABLES Pro 3 CSTABLES Overview Design goals Examples of Use Francesc Altet P Y TABLES & Family
  • 60. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use Design Goals Concurrency: Allows multiple access to the same file at the same time. Locking system: Avoids corruption of data. High Throughput: Data is transmitted in large blocks. High-speed compressor/decompressor used. Client Cache: Metadata is kept in the client-side when an object is first accessed. Full P Y TABLES API compatibility: Existing P Y TABLES programs can be re-used to access remote files without changing virtually anything. Francesc Altet P Y TABLES & Family
  • 61. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use Concurrency Issues Question: CSTABLES does not provide threading or asynchronous features yet. So, how does it deal with several requests at a time? Answer: Large data read and write requests are split into small chunks. This considerably improves server response time. In addition, CSTABLES provides a lock mechanism that allows applications to explicitly put a lock on a node or on an entire subtree. Different locking access modes: READ, WRITE and ALL Francesc Altet P Y TABLES & Family
  • 62. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use Concurrency Issues Question: CSTABLES does not provide threading or asynchronous features yet. So, how does it deal with several requests at a time? Answer: Large data read and write requests are split into small chunks. This considerably improves server response time. In addition, CSTABLES provides a lock mechanism that allows applications to explicitly put a lock on a node or on an entire subtree. Different locking access modes: READ, WRITE and ALL Francesc Altet P Y TABLES & Family
  • 63. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use Concurrency Issues Question: CSTABLES does not provide threading or asynchronous features yet. So, how does it deal with several requests at a time? Answer: Large data read and write requests are split into small chunks. This considerably improves server response time. In addition, CSTABLES provides a lock mechanism that allows applications to explicitly put a lock on a node or on an entire subtree. Different locking access modes: READ, WRITE and ALL Francesc Altet P Y TABLES & Family
  • 64. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use Client Cache More in Depth CSTABLES caches some of the metadata of the P Y TABLES object tree. When a client makes a change to the metadata, this change is pushed to the other clients’ caches. Easy to control which attributes should be cached and which should not. You should try to cache primarily read-only attributes. Caveat emptor: Caching attributes that are updated very often might generate more traffic than attributes that are not cached at all! Francesc Altet P Y TABLES & Family
  • 65. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use Client Cache More in Depth CSTABLES caches some of the metadata of the P Y TABLES object tree. When a client makes a change to the metadata, this change is pushed to the other clients’ caches. Easy to control which attributes should be cached and which should not. You should try to cache primarily read-only attributes. Caveat emptor: Caching attributes that are updated very often might generate more traffic than attributes that are not cached at all! Francesc Altet P Y TABLES & Family
  • 66. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use Client Cache More in Depth CSTABLES caches some of the metadata of the P Y TABLES object tree. When a client makes a change to the metadata, this change is pushed to the other clients’ caches. Easy to control which attributes should be cached and which should not. You should try to cache primarily read-only attributes. Caveat emptor: Caching attributes that are updated very often might generate more traffic than attributes that are not cached at all! Francesc Altet P Y TABLES & Family
  • 67. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use Client Cache More in Depth CSTABLES caches some of the metadata of the P Y TABLES object tree. When a client makes a change to the metadata, this change is pushed to the other clients’ caches. Easy to control which attributes should be cached and which should not. You should try to cache primarily read-only attributes. Caveat emptor: Caching attributes that are updated very often might generate more traffic than attributes that are not cached at all! Francesc Altet P Y TABLES & Family
  • 68. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use Outline 1 An Introduction to the Python Language Facts about Python Scientific Packages An example 2 P Y TABLES Overview Usage Examples P Y TABLES Pro 3 CSTABLES Overview Design goals Examples of Use Francesc Altet P Y TABLES & Family
  • 69. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use Two Different APIs P Y TABLES API (prefix with csclient -ip=server_IP) import tables fileh = tables.openFile("file.h5") print fileh.root.table.cols.col1[:] fileh.close() CSTABLES API (no need to be prefixed) from cstables.client import client c=client.Session() app=c.connect("your_FQDN_server") fileh=app.openFile("file.h5") print fileh.root.table.cols.col1[:] fileh.close() Francesc Altet P Y TABLES & Family
  • 70. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use Two Different APIs P Y TABLES API (prefix with csclient -ip=server_IP) import tables fileh = tables.openFile("file.h5") print fileh.root.table.cols.col1[:] fileh.close() CSTABLES API (no need to be prefixed) from cstables.client import client c=client.Session() app=c.connect("your_FQDN_server") fileh=app.openFile("file.h5") print fileh.root.table.cols.col1[:] fileh.close() Francesc Altet P Y TABLES & Family
  • 71. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use CSTABLES Status & Availability The main design features are already implemented and working. Beta available (caveat: only works against P Y TABLES 1.0). Focus now is on checking & debugging possible errors, improving the throughput and bettering the Users’ Guide. Future directions: threading, asynchronous communications. Francesc Altet P Y TABLES & Family
  • 72. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use The P Y TABLES Suite Is Getting Shape P Y TABLES The basic layer for the other components. P Y TABLES Pro P Y TABLES with a twist: Complex searches and ultra-fast selections in tables. CSTABLES The client-server P Y TABLES. It supports as well generic HDF5 files. V I TABLES A data viewer for P Y TABLES (and HDF5) files. Francesc Altet P Y TABLES & Family
  • 73. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use The P Y TABLES Suite Is Getting Shape P Y TABLES The basic layer for the other components. P Y TABLES Pro P Y TABLES with a twist: Complex searches and ultra-fast selections in tables. CSTABLES The client-server P Y TABLES. It supports as well generic HDF5 files. V I TABLES A data viewer for P Y TABLES (and HDF5) files. Francesc Altet P Y TABLES & Family
  • 74. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use The P Y TABLES Suite Is Getting Shape P Y TABLES The basic layer for the other components. P Y TABLES Pro P Y TABLES with a twist: Complex searches and ultra-fast selections in tables. CSTABLES The client-server P Y TABLES. It supports as well generic HDF5 files. V I TABLES A data viewer for P Y TABLES (and HDF5) files. Francesc Altet P Y TABLES & Family
  • 75. An Introduction to the Python Language P Y TABLES CSTABLES Summary Overview Design goals Examples of Use The P Y TABLES Suite Is Getting Shape P Y TABLES The basic layer for the other components. P Y TABLES Pro P Y TABLES with a twist: Complex searches and ultra-fast selections in tables. CSTABLES The client-server P Y TABLES. It supports as well generic HDF5 files. V I TABLES A data viewer for P Y TABLES (and HDF5) files. Francesc Altet P Y TABLES & Family
  • 76. An Introduction to the Python Language P Y TABLES CSTABLES Summary Summary Python is a joy for scientific computing. Start using it now! The P Y TABLES suite is designed to work with HDF5 files in an interactive, efficient and, most importantly, easy way. Outlook Working hard to release CSTABLES in the first quarter of 2006. P Y TABLES P RO will come later on (~ 2nd quarter of 2006). It would be nice to produce a parallel version of P Y TABLES (long term goal). Francesc Altet P Y TABLES & Family
  • 77. An Introduction to the Python Language P Y TABLES CSTABLES Summary Thank You! Thanks also to: Elena Pourmal, for inviting us to the Workshop. Frank Baker, for being kind. The HDF Group for pushing forward P Y TABLES. Francesc Altet P Y TABLES & Family