1. Dr REEJA S R
Dayananda Sagar University – School of Engineering
Kudlu Gate, Bangalore
Talk given at the ISE Dept., DSCE, Bangalore
2. Introduction to HPC
What is Python?
Why Python for HPC
Python in HPC
3. What is HPC?
When do we need HPC?
What does HPC Include?
Rise & Fall of HPC Computer Architectures
4. • There is no clear definition
Computing on high performance computers
Solving problems / doing research using computer modeling, simulation and analysis
Engineering design using computer modeling, simulation and analysis
• My understanding
Very large computational and memory requirements
that a PC cannot handle efficiently
Speeds and feeds are the keywords
• Who uses High-Performance Computing
Research institutes, universities and government labs
Weather and climate research, bioscience, energy, military, etc.
Engineering design: more or less every product we use
Automotive, aerospace, oil and gas exploration, digital media, financial simulation
Mechanical simulation, package design, silicon manufacturing, etc.
• Similar concepts
Parallel computing: computing on parallel computers
Supercomputing: computing on the world's 500 fastest supercomputers
5. • Case 1: Complete a time-consuming operation in less time
I am an automotive engineer
I need to design a new car that consumes less gasoline
I’d rather have the design completed in 6 months than in 2 years
I want to test my design using computer simulations rather than building very
expensive prototypes and crashing them
• Case 2: Complete an operation under a tight deadline
I work for a weather prediction agency
I am getting input from weather stations/sensors
I’d like to predict tomorrow’s forecast today
• Case 3: Perform a high number of operations per seconds
I am an engineer at Amazon.com
My Web server gets 1,000 hits per second
I’d like my web server and databases to handle 1,000 transactions per
second so that customers do not experience bad delays
6. • High-performance computing is fast computing
• Computations in parallel over lots of compute elements (CPU, GPU, etc.)
• Very fast network to connect between the compute elements
• Computer Architecture
• Vector Computers, MPP, SMP, Distributed Systems, Clusters
• Network Connections
• InfiniBand, Ethernet, Proprietary (Myrinet, Quadrics, Cray)
• Programming models
• MPI (Message Passing Interface), SHMEM (Shared Memory),
PGAS (Partitioned Global Address Space), etc.
• Open source, commercial
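The message-passing model named on this slide can be sketched with the standard library alone. A real MPI program would use a binding such as mpi4py (assumed not installed here); this stdlib analogue with `multiprocessing.Pipe` shows the same scatter/compute/gather shape:

```python
# Minimal sketch of the message-passing model using only the standard
# library; mpi4py would replace the Pipe/Process plumbing in real MPI code.
from multiprocessing import Process, Pipe

def worker(conn, rank):
    # Each "rank" receives its chunk of work, computes, and sends back.
    chunk = conn.recv()
    conn.send((rank, sum(x * x for x in chunk)))
    conn.close()

def main():
    data = list(range(8))
    pipes, procs = [], []
    for rank in range(2):                      # two worker "ranks"
        parent, child = Pipe()
        p = Process(target=worker, args=(child, rank))
        p.start()
        parent.send(data[rank * 4:(rank + 1) * 4])   # scatter
        pipes.append(parent)
        procs.append(p)
    results = dict(p.recv() for p in pipes)          # gather
    for p in procs:
        p.join()
    return results[0] + results[1]

if __name__ == "__main__":
    print(main())   # sum of squares of 0..7 = 140
```

In MPI terms, the `send`/`recv` pairs here play the role of `MPI_Scatter` and `MPI_Gather` over two ranks.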
7. Vector Computers (VC) - proprietary systems
Provided the breakthrough needed for the emergence of computational
science, but they were only a partial answer
Massively Parallel Processors (MPP) - proprietary systems
High cost and a low performance/price ratio.
Symmetric Multiprocessors (SMP)
Suffer from limited scalability
Difficult to use and hard to extract parallel performance from
Clusters – commodity and highly popular
High Performance Computing - Commodity Supercomputing
High Availability Computing - Mission Critical Applications
8. Modern, interpreted, object-oriented, full-featured
high-level programming language
Portable (Unix/Linux, Mac OS X, Windows)
Open source, intellectual property rights
held by the Python Software Foundation
Python versions: 2.x and 3.x
Goal: develop a small Python program
that runs multiple serial executions with
different load-balancing techniques
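The goal stated above can be sketched in a few lines. A minimal example (the function names and job sizes are illustrative) comparing two load-balancing strategies with `multiprocessing.Pool`: static chunking versus dynamic task farming:

```python
# Sketch of slide 8's goal: run many serial tasks under two
# load-balancing strategies (static chunking vs. dynamic task farming).
from multiprocessing import Pool

def task(n):
    # Stand-in for a serial job whose cost varies with its input.
    return sum(i * i for i in range(n))

def run(strategy="dynamic"):
    jobs = [1000, 10, 5000, 10, 3000, 10, 10, 4000]   # uneven workloads
    with Pool(processes=4) as pool:
        if strategy == "static":
            # Static: jobs are pre-split into fixed chunks per worker.
            results = pool.map(task, jobs, chunksize=2)
        else:
            # Dynamic: each idle worker pulls the next job, which
            # balances better when job costs are uneven.
            results = list(pool.imap_unordered(task, jobs))
    return sorted(results)

if __name__ == "__main__":
    assert run("static") == run("dynamic")   # same answers, different balance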
9. Fast program development
Easy to write readable code
Large standard library
Lots of third party libraries
10. When you want to maximize productivity (not raw performance)
Mature language with large user base
Huge collection of freely available software libraries
High Performance Computing
Engineering, Optimization, Differential Equations
Scientific Datasets, Analysis, Visualization
web apps, GUIs, databases, and tons more
Python combines the best of both JIT- and AOT-compiled code:
Write performance-critical loops and kernels in a compiled language
Write high-level logic and “boiler plate” in Python
- Memory-efficient array for primitive types
- Basic math operations, including statistics
- SQL file-based storage engine
- Variety of container objects (deque, Counter, etc.)
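The standard-library features listed above can be shown in one short, runnable tour: the memory-efficient `array` module, the `statistics` module, the `sqlite3` storage engine, and `collections` containers (the sample values are illustrative):

```python
# A short tour of the stdlib features named on this slide.
import array
import sqlite3
import statistics
from collections import Counter, deque

nums = array.array("d", [1.0, 2.0, 2.0, 4.0])    # compact array of doubles
mean = statistics.mean(nums)                      # basic maths/statistics

db = sqlite3.connect(":memory:")                  # SQL storage engine
db.execute("CREATE TABLE t (x REAL)")
db.executemany("INSERT INTO t VALUES (?)", [(x,) for x in nums])
total = db.execute("SELECT SUM(x) FROM t").fetchone()[0]

counts = Counter(nums)                            # frequency table
recent = deque(nums, maxlen=2)                    # keeps only the last two

print(mean, total, counts[2.0], list(recent))
```

Note that `array.array("d", ...)` stores raw C doubles, so the four values occupy 32 bytes of payload rather than four full Python float objects.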
Huge variety of libraries, including:
NumPy – a numerical Python library
SciPy – scientific computing library
Pandas – library for data analysis
Scikit-learn – the default machine learning library
Biopython – bioinformatics library
Tornado – easy bindings for concurrency
Database bindings – for communicating with virtually all DBs,
including Redis, MongoDB, HDF5 & SQL
Web development frameworks – for creating websites
OpenCV – bindings for computer vision
API bindings – for easy access to popular web APIs (Google,
Twitter & LinkedIn)
Matplotlib: python -m pip install matplotlib
13. High level
- lower barriers, reduced time to solution
Interfaces with the OS, libraries and other applications
- makes it a great glue for automating the modern scientific workflow
- Sage (ties together the biggest open-source numeric software into
a unified Python interface)
- reduces re-inventing of wheels
- Portable, free, transparent, verifiable
- Scales to an arbitrary number of nodes with no license costs
- Interactive data analysis and plotting
- Interactive parallel computing
14. NumPy: array data structure
>>> from numpy.random import randn
>>> from pylab import hist, show
>>> hist(randn(10000), 100)  # histogram of 10,000 normal samples, 100 bins
>>> show()
19. History of NumPy
– a powerful N-dimensional array object
– sophisticated (broadcasting) functions
– tools for integrating C/C++ and Fortran code
– useful linear algebra, Fourier transform, and
random number capabilities
– Based originally on Numeric by Jim Hugunin
– Also based on NumArray by Perry Greenfield
– Written by Travis Oliphant to bring both feature sets together
20. What makes an array so much faster?
– homogeneous: every item takes up the
same size block of memory
– single data-type objects
– powerful array scalar types
universal functions (ufuncs)
– functions that operate on ndarrays in an
element-by-element fashion
– vectorized wrappers for a function
– built-in ufuncs are implemented in
compiled C code
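The ufunc idea above in code, assuming NumPy is installed: the same element-by-element computation written once as an interpreted Python loop and once with ufuncs running in compiled C:

```python
# A ufunc applies element-by-element over a whole ndarray in compiled C,
# replacing an explicit Python loop.
import numpy as np

x = np.arange(5, dtype=np.float64)

# Python-level loop: one interpreted iteration per element.
loop_result = [xi * xi + 1.0 for xi in x]

# ufunc version: np.multiply and np.add traverse the array in C.
ufunc_result = x * x + 1.0             # array([ 1.,  2.,  5., 10., 17.])

assert np.allclose(loop_result, ufunc_result)
```

For arrays of realistic size (millions of elements), the ufunc form is typically one to two orders of magnitude faster than the loop.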
21. Data layout
homogeneous: every item takes up the same
size block of memory
single data-type objects
powerful array scalar types
22. Numpy has a sophisticated view of data.
bool, int, int8, int16, int32, int64,
uint8, uint16, uint32, uint64,
float, float16, float32, float64,
complex, complex64
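The data-layout claims of slides 21–22 are directly inspectable, assuming NumPy is installed: every item of an ndarray has the same dtype and occupies the same fixed-size block of memory:

```python
# Homogeneous layout: same dtype, same fixed-size block per element.
import numpy as np

a = np.array([1, 2, 3], dtype=np.int32)
assert a.dtype == np.int32
assert a.itemsize == 4        # 4 bytes per int32 element
assert a.nbytes == 12         # 3 elements * 4 bytes, one contiguous block

b = a.astype(np.float64)      # converting dtypes changes the block size
assert b.itemsize == 8
```

This fixed layout is exactly what lets the C-level ufuncs stride through memory without per-element type checks.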
- use faster hardware – more cores, more cache, more GHz
- use CPU vector instructions
- bytecode interpretation, and everything is an object
- fast fetcher
- load directly into a NumPy array
- improves RDBMS query speed
- speeds up data messaging
- cache the previous day’s data
- switch from a batch to an online architecture
- 6 process slots cut runtime to 2 hours
- fully parallel crashes the DB
- was developed to extend Python’s
scripting abilities to parallel and distributed computing
- parallel extension modules are written
- modules and processing can be
combined in one convenient place to simplify the workflow
- a single Python script can provide setup,
simulation, instruction and post-processing
- tests a system’s linking and loading
- Pynamic drivers will perform a test of
the MPI functionality
- can also gather performance metrics,
including the job startup time, module import
time, function visit time and MPI test time
- For parallel scientific computing, we provide a
high-level interface to the Trilinos/Tpetra parallel
linear algebra library.
- This makes parallel linear algebra:
- easier to use, via a simplified user interface
- more intuitive, through features such as advanced indexing
- more useful, by enabling access to it from the already extensive
Python scientific software stack.
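PyTrilinos itself is unlikely to be installed outside an HPC environment, so as a serial stand-in, here is the same kind of linear-algebra task (solving A x = b) through NumPy, which the Python stack wraps just as PyTrilinos wraps the Trilinos solvers (the matrix values are illustrative):

```python
# Serial NumPy analogue of the solver-style workload PyTrilinos targets.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)     # direct dense solve of A x = b
assert np.allclose(A @ x, b)  # residual check: A x reproduces b
```

In PyTrilinos the matrix would be a distributed sparse operator and the solve an iterative Krylov method, but the user-facing shape of the call is the same.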
30. Optimized Distributed NumPy (ODIN)
- builds on top of NumPy
- provides a distributed array data
structure that makes parallel array-based
computation easier
- provides built-in functions that work
with distributed arrays
- framework for creating new functions
that work with distributed arrays.
31. ODIN’s approach has several advantages:
- Users have access to arrays in the same way that
they think about them: either globally or locally.
- As ODIN arrays are easier to use and reason about
than the MPI-equivalent, this leads to faster iterative
cycles, more flexibility when exploring parallel algorithms,
and an overall reduction in total time-to-solution.
- ODIN is designed to work with existing MPI codes
– By using Python, ODIN can leverage the ecosystem
of speed-related third party packages, either to wrap
external code or to accelerate existing Python code.
- With the power and expressiveness of NumPy array
slicing, ODIN can optimize distributed array expressions.
These optimizations include loop fusion and array-expression
analysis to select the appropriate communication strategy
between worker nodes.
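Loop fusion, mentioned above, is easy to illustrate in plain Python: evaluating `(a + b) * c` naively materialises the intermediate `a + b`, while a fused version does everything in one pass (a sketch of the idea, not ODIN's actual implementation):

```python
# Loop fusion sketch: one traversal and no temporary array, versus two
# traversals with an intermediate allocation.
def unfused(a, b, c):
    tmp = [ai + bi for ai, bi in zip(a, b)]        # intermediate list allocated
    return [ti * ci for ti, ci in zip(tmp, c)]     # second pass over tmp

def fused(a, b, c):
    # Single loop, no temporary: the form an array-expression
    # optimizer would emit for (a + b) * c.
    return [(ai + bi) * ci for ai, bi, ci in zip(a, b, c)]

a, b, c = [1, 2], [3, 4], [5, 6]
assert unfused(a, b, c) == fused(a, b, c) == [20, 36]
```

For distributed arrays the saving is larger still, since the unfused form also communicates the intermediate result between worker nodes.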
32. • ODIN’s basic features
—distributed array creation, unary and binary ufunc
application, global and local modes of interaction
- are currently being tested on systems and clusters
with small to mid-range numbers of nodes.
- for automatic, just-in-time compilation
of Python source code
- Seamless aims to make node-level
Python code as fast as compiled languages via JIT compilation
- it also allows effortless access to
compiled libraries from Python, allowing easy
integration of existing code bases written in
statically typed languages.
34. • Schematic relation between PyTrilinos, ODIN, and Seamless.
• Each of the three packages is standalone.
• ODIN can use Seamless and PyTrilinos and the functionality that
these two packages provide.
• Seamless provides four principal features, while PyTrilinos
wraps several Trilinos solver packages.
35. Python is too slow.
-Seamless allows compilation to fast machine code,
either dynamically or statically.
Python is yet another language to integrate with existing code.
- Seamless allows easy interaction between Python and
other languages, and removes nearly all barriers to interoperability.
The Python HPC ecosystem is too small.
- PyTrilinos provides access to a comprehensive suite of
HPC solvers. Further, ODIN will provide a library of functions and
methods designed to work with distributed arrays, and its design
allows access to any existing MPI routines.
Integrating all components is too difficult.
-ODIN provides a common framework to integrate
disparate components for distributed computing.
- Processor capacity and memory bandwidth are scaling faster than I/O bandwidth.
-A solution is required that provides higher overall available I/O
bandwidth per socket to accelerate message passing interface (MPI) rates for
tomorrow’s HPC deployments.
Cost and density.
-More components in a server limit density and increase fabric cost.
-An integrated fabric controller helps eliminate the additional costs
and required space of discrete cards, enabling higher server density while
freeing up a valuable PCIe slot for other storage and networking controllers.
Reliability and power.
-Discrete interface cards consume many watts of power.
-An integrated interface card on the processor can draw less power
with fewer discrete components.
37. Python is a dynamic object-oriented programming language.
Because of its powerful and flexible syntax, Python
excels as a platform for High Performance
Computing and scientific computing.
Versatility, simplicity of use, high portability and
the large number of open source modules and
packages make it very popular for scientific use.
Although pure Python is generally slower than
traditional compiled languages (C or Fortran), there are
various techniques and libraries that allow you to
obtain performance comparable to that of the most
common compiled languages, assuring a good balance
between computational performance and time investment.