Life Sciences | Risk Analysis | Simulation | Big Data Analytics | High Performance Computing
HIGH PERFORMANCE
COMPUTING 2015/16
Technology Compass
More Than 30 Years of Experience
in Scientific Computing
1980 marked the beginning of a decade in which numerous startups
were created, some of which later transformed into big players in
the IT market. Technical innovations brought dramatic changes
to the nascent computer market. In Tübingen, close to one of Germany's oldest and most renowned universities, transtec was founded.
In the early days, transtec focused on reselling DEC computers
and peripherals, delivering high-performance workstations to
university institutes and research facilities. In 1987, SUN/Sparc
and storage solutions broadened the portfolio, enhanced by
IBM/RS6000 products in 1991. These were the typical worksta-
tions and server systems for high performance computing then,
used by the majority of researchers worldwide.
In the late 90s, transtec was one of the first companies to offer
highly customized HPC cluster solutions based on standard Intel
architecture servers, some of which entered the TOP 500 list of
the world’s fastest computing systems.
Thus, given this background and history, it is fair to say that transtec looks back on more than 30 years of experience in scientific computing; our track record shows more than 750 HPC installations. With this experience, we know exactly what customers'
demands are and how to meet them. High performance and ease
of management – this is what customers require today.
HPC systems are, as their name indicates, expected to deliver peak performance, but that is not enough: they must also be easy to handle. Unwieldy design and operational complexity must be avoid-
ed or at least hidden from administrators and particularly users
of HPC computer systems.
This brochure focusses on where transtec HPC solutions excel.
transtec HPC solutions use the latest and most innovative technology: Bright Cluster Manager as the technology leader for unified HPC cluster management, the leading-edge Moab HPC Suite for job and workload management, Intel Cluster Ready certification as an independent quality standard for our systems, and Panasas HPC storage systems for the highest performance and the ease of management required of a reliable HPC storage system. Again, with
these components, usability, reliability and ease of management
are central issues that are addressed, even in a highly heteroge-
neous environment. transtec is able to provide customers with
well-designed, extremely powerful solutions for Tesla GPU com-
puting, as well as thoroughly engineered Intel Xeon Phi systems.
Intel’s InfiniBand Fabric Suite makes managing a large InfiniBand
fabric easier than ever before, and Numascale provides excellent technology for AMD-based large-SMP systems. transtec masterfully combines these excellent, well-chosen, readily available components into a fine-tuned, customer-specific, and thoroughly designed HPC solution.
Your decision for a transtec HPC solution means you opt for the most intensive customer care and the best service in HPC. Our experts will
be glad to bring in their expertise and support to assist you at
any stage, from HPC design to daily cluster operations, to HPC
Cloud Services.
Last but not least, transtec HPC Cloud Services provide custom-
ers with the possibility to have their jobs run on dynamically pro-
vided nodes in a dedicated datacenter, professionally managed
and individually customizable. Numerous standard applications
like ANSYS, LS-Dyna, OpenFOAM, as well as lots of codes like
Gromacs, NAMD, VMD, and others are pre-installed, integrated
into an enterprise-ready cloud management environment, and
ready to run.
Have fun reading the transtec HPC Compass 2015/16!
High Performance Computing
for Everyone
An interview with Dr. Oliver Tennert, Director of HPC Solutions at transtec AG, discussing the wide range of uses of High Performance Computing (HPC), the benefits of turnkey solutions, and the increasing popularity of Big Data Analytics.
Dr. Tennert, in which industries have you seen an increase in
demand for high performance computing?
From our perspective, it is difficult to differentiate between specific segments. In general, the high performance computing market is a globally expanding sector. The US American IT analyst IDC predicts an annual average growth of approximately 7% over the next 5 years across all sectors, even in traditional HPC-dominated indus-
tries such as CAE (Computer-Aided Engineering) and Life Sciences.
For which type of applications is HPC usually deployed? Could
you give us a concrete example of its application?
High performance computing is basically divided into two areas
of application: simulation on the one hand and data analysis on
the other. Take the automotive industry for example.
During the vehicle design and development stage, a great deal of
effort is invested in crash simulations, while the number of material-intensive crash tests, which used to be performed regularly, has been minimized; physical tests are now only conducted for verification
purposes.
The analysis of sequenced genome data for hereditary diseases
is a prime example of how data analysis is used effectively – or
from the field of molecular genetics – the measurement of evolutionary relationships between different species, or the effect of viruses on the germline.
High performance computing is often criticized on the grounds that an HPC installation is far too expensive and complex for "ordinary" organisations. How would you respond to these objections?
[Grinning:] In the past we have sold HPC solutions to many cus-
tomers who would describe themselves as “ordinary” and who
have been extremely satisfied with the results. But you are in fact right: operating a complex HPC system involves hiding this complexity from users and administrators. When you
take a look under the hood of a car, you will see that it is also a
complex machine. However drivers do not have to concern them-
selves with features such as transmission technology and ABS
electronics when driving the vehicle.
It is therefore important to ensure you implement a straightfor-
ward and enterprise-ready management and service concept.
transtec offers an intelligent portfolio of service and manage-
ment concepts which are compatible with the paradigms of
“easy to manage” and “easy to use”.
What are the challenges when launching, installing and operat-
ing an HPC system? What should administrators bear in mind to
ensure that the HPC system remains manageable?
When looking for an HPC systems provider, customers should
make sure that the provider has precisely understood their needs
and requirements. This is what differentiates a solutions provider
such as transtec from a product-centric hardware supplier or
manufacturer. The optimum sizing of the entire solution and the
above mentioned easy manageability has a direct impact on the
customer’s or user’s productivity.
Once the system has been delivered, customers want to start pro-
ductive operation as soon as possible. transtec provides custom-
ers with turnkey HPC solutions, which means: depending on the
customer’srequirements,ourspecialistscanmanagethecomplete
installation and integration into the customer’s environment.
What is the role of HPC when deploying cloud computing and
big data analyses in large organisations?
When deploying cloud computing, it is important to make a distinction: what we today call a "private cloud" is essentially – leaving technological details to one side – a tried-and-tested concept in which companies consolidate their computing capacities and provide access to them from one central location.
The use of "public cloud" providers such as Amazon has significantly dropped in the light of recent reports about the NSA. Security aspects
have always been a serious obstacle for cloud computing and, in
terms of HPC, the situation becomes all the more complicated
for an industrial organisation as high performance computing is
an essential element in the organisation’s value chain. The data
the organisation uses for work is a vital asset and critical to the
company.
Big Data Analytics, in other words the ability to process and analyse
large data volumes, is essentially nothing new. Recently however,
the amount of data has increased at a significantly faster pace
than the computing capacities available.
For example, with the use of biochemical methods in "next generation sequencing", complete genome sequences can today be generated and prepared for further analysis at a far faster rate than ever before. The need for HPC solutions has recently become increasingly evident.
While we are on the topic of big data: To
what extent are high-performance, modern
in-memory computing technologies competing
with HPC systems?
This depends on whether you see these technologies in direct
competition to each other! Shared in-memory databases and the
associated computing technologies are innovative concepts for
the particular application, for example real-time analytics or data
mining. We believe that both are contemporary technologies like
many others that enable the analysis of large data volumes in real-
time. From transtec’s point of view, these are HPC technologies
and some of our customers are already using them on HPC
solutions we have provided.
What is expected of today's supercomputers? What does the
future hold for the HPC segment – and what follows on from
Petaflop and Petabyte?
When we talk of supercomputers, we are referring to the Top500 list of the fastest computers in the world. "Flops" is used as a measure of performance, in other words the number of floating point operations per second. "Tianhe-2" from China currently holds the number one position with an amazing 34 Petaflops, in other words 34 quadrillion floating point operations per second. However, rank 500, scoring 133 Teraflops (133 trillion floating point operations per second), is still an impressive figure. That is rough-
ly the equivalent of the consolidated performance of
2,000 well configured PCs.
Exabyte follows Petabyte. Over the last
few years it has been widely discussed
on the HPC market what kind of sys-
tem would be needed to manage these
data volumes, particularly in terms of capacity,
power supply and cooling. Significant technology advancements are expected to be required before this type of supercomputer can be built. Sheer hardware mass alone will not be able to handle these volumes.
Keyword - security: What is the best way to protect HPC systems
in an age of the NSA scandal and ever increasing cybercrime?
Basically the same applies to HPC systems as it does to company
notebooks and workstations where IT security is concerned:
access by unauthorised parties must be prevented by deploying secure IT infrastructures. As described above, enormous volumes of critical organisation data are stored on an HPC system, which is why efficient data management with effective access control is essential in HPC environments.
The network connections between the data centre and user
workstations must also be secured on a physical level. This
especially applies when the above-mentioned “private cloud”
scenarios are set up across various locations to allow them to
work together regardless of the particular country or time zone.
"Petabyte is followed by Exabyte, although before that, several technological jumps need to take place."
Technology Compass
Table of Contents and Introduction
High Performance Computing
   Performance Turns Into Productivity
   Flexible Deployment With xCAT
   Service and Customer Care From A to Z
Vector Supercomputing
   The NEC SX Architecture
   History of the SX Series
Advanced Cluster Management Made Easy
   Easy-to-use, Complete and Scalable
   Cloud Bursting With Bright
Intelligent HPC Workload Management
   Moab HPC Suite – Enterprise Edition
   Moab HPC Suite – Grid Option
   Optimizing Accelerators with Moab HPC Suite
   Moab HPC Suite – Application Portal Edition
   Moab HPC Suite – Remote Visualization Edition
Remote Visualization and Workflow Optimization
   NICE EnginFrame: A Technical Computing Portal
   Desktop Cloud Virtualization
   Cloud Computing
   NVIDIA GRID
   Virtual GPU Technology
Intel Cluster Ready
   A Quality Standard for HPC Clusters
   Intel Cluster Ready Builds HPC Momentum
   The transtec Benchmarking Center
Parallel NFS
   The New Standard for HPC Storage
   What's New in NFS 4.1?
   Panasas HPC Storage
BeeGFS – A Parallel File System
   A Parallel File System
NVIDIA GPU Computing
   What is GPU Computing?
   Kepler GK110/210 GPU Computing Architecture
   A Quick Refresher on CUDA
Intel Xeon Phi Coprocessor
   The Architecture
   An Outlook on Knights Landing and Omni-Path
InfiniBand High-Speed Interconnect
   The Architecture
   Intel MPI Library 4.0 Performance
   Intel Fabric Suite 7
Numascale
   NumaConnect Background
   Technology
   Redefining Scalable OpenMP and MPI
   Big Data
   Cache Coherence
   NUMA and ccNUMA
IBM Platform HPC
   Enterprise-Ready Cluster & Workload Management
   What's New in IBM Platform LSF 8
   IBM Platform MPI 8.1
General Parallel File System (GPFS)
   What is GPFS?
   What's New in GPFS Version 3.5
Glossary
High Performance Computing (HPC) has been with us from the very beginning of the
computer era. High-performance computers were built to solve numerous problems
which the “human computers” could not handle.
The term HPC just hadn't been coined yet. More importantly, some of the early principles have changed fundamentally.
High Performance
Computing – Performance
Turns Into Productivity
Performance Turns Into Productivity
HPC systems in the early days were much different from those
we see today. First, we saw enormous mainframes from large
computer manufacturers, including a proprietary operating
system and job management system. Second, at universities
and research institutes, workstations made inroads and scien-
tists carried out calculations on their dedicated Unix or VMS
workstations. In either case, if you needed more computing
power, you scaled up, i.e. you bought a bigger machine. Today
the term High-Performance Computing has gained a fundamen-
tally new meaning. HPC is now perceived as a way to tackle
complex mathematical, scientific or engineering problems. The
integration of industry-standard, "off-the-shelf" server hardware into HPC clusters facilitates the construction of computing networks with a combined power that no single system could ever achieve.
The new paradigm for parallelization is scaling out.
Computer-supported simulation of realistic processes (so-called Computer-Aided Engineering, CAE) has established itself as a third key pillar in the field of science and research alongside theory and experimentation. It is nowadays inconceivable that an air-
craft manufacturer or a Formula One racing team would operate
without using simulation software. And scientific calculations,
such as in the fields of astrophysics, medicine, pharmaceuticals
and bio-informatics, will to a large extent be dependent on super-
computers in the future. Software manufacturers long ago rec-
ognized the benefit of high-performance computers based on
powerful standard servers and ported their programs to them
accordingly.
The main advantage of scale-out supercomputers is just that: they are infinitely scalable, at least in principle. Since they are based on standard hardware components, such a supercomputer can be extended whenever the computational capacity of the system is no longer sufficient, simply by adding further nodes of the same kind. A cumbersome switch to a
different technology can be avoided in most cases.
“transtec HPC solutions are meant to provide customers with unpar-
alleled ease-of-management and ease-of-use. Apart from that, decid-
ing for a transtec HPC solution means deciding for the most intensive
customer care and the best service imaginable.”
Dr. Oliver Tennert
Director Technology Management & HPC Solutions
Variations on the theme: MPP and SMP
Parallel computations exist in two major variants today. Ap-
plications running in parallel on multiple compute nodes are
frequently so-called Massively Parallel Processing (MPP) appli-
cations. MPP indicates that the individual processes can each
utilize exclusive memory areas. This means that such jobs are
predestined to be computed in parallel, distributed across the
nodes in a cluster. The individual processes can thus utilize the
separate units of the respective node – especially the RAM, the
CPU power and the disk I/O.
Communication between the individual processes is implement-
ed in a standardized way through the MPI software interface
(Message Passing Interface), which abstracts the underlying
network connections between the nodes from the processes.
However, the MPI standard (current version 3.1) merely requires
source code compatibility, not binary compatibility, so an off-the-
shelf application usually needs specific versions of MPI libraries
in order to run. Examples of MPI implementations are OpenMPI,
MPICH2, MVAPICH2, Intel MPI or – for Windows clusters – MS-MPI.
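As a minimal sketch of the message-passing model described here (assuming any of the MPI implementations named above and the usual compiler wrapper and launcher, e.g. mpif90 and mpirun), each process runs its own copy of the program and identifies itself by its rank:

program hello_mpi
   use mpi                                           ! Fortran interface provided by the MPI library
   implicit none
   integer :: ierr, rank, nprocs
   call MPI_Init(ierr)                               ! start the MPI runtime
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)    ! rank (ID) of this process
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)  ! total number of processes
   print *, 'Hello from rank', rank, 'of', nprocs    ! each process owns its exclusive memory
   call MPI_Finalize(ierr)                           ! shut down the MPI runtime
end program hello_mpi

Launched with, for example, mpirun -np 16 ./hello_mpi, the 16 processes may be distributed across several cluster nodes; any data exchange between them has to go through explicit MPI calls over the interconnect.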
If the individual processes engage in a large amount of commu-
nication, the response time of the network (latency) becomes im-
portant. Latency in a Gigabit Ethernet or a 10GE network is typi-
cally around 10 µs. High-speed interconnects such as InfiniBand reduce latency by a factor of 10, down to as low as 1 µs. Therefore,
high-speed interconnects can greatly speed up total processing.
The other frequently used variant is called SMP applications.
SMP, in this HPC context, stands for Shared Memory Processing.
It involves the use of shared memory areas, the specific imple-
mentation of which is dependent on the choice of the underlying
operating system. Consequently, SMP jobs generally only run on
a single node, where they can in turn be multi-threaded and thus
be parallelized across the number of CPUs per node. For many
HPC applications, both the MPP and SMP variant can be chosen.
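A correspondingly minimal sketch of the SMP variant, assuming a compiler with OpenMP support: the iterations of a data-parallel loop are distributed across the threads of one node, all of which operate on the same shared memory:

program axpy_smp
   use omp_lib                                  ! OpenMP runtime library routines
   implicit none
   integer, parameter :: n = 1000000
   real(8), allocatable :: a(:), b(:)
   integer :: i
   allocate(a(n), b(n))
   a = 1.0d0
   b = 2.0d0
!$omp parallel do                               ! split the iterations across all threads of the node
   do i = 1, n
      a(i) = a(i) + 2.0d0*b(i)                  ! every thread reads and writes the shared arrays
   end do
!$omp end parallel do
   print *, 'a(1) =', a(1), ' max threads:', omp_get_max_threads()
end program axpy_smp

The number of threads is typically set via the OMP_NUM_THREADS environment variable; since all threads share one address space, such a job stays on a single node, exactly as described above.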
Many applications are not inherently suitable for parallel execu-
tion. In such a case, there is no communication between the in-
dividual compute nodes, and therefore no need for a high-speed
network between them; nevertheless, multiple computing jobs
can be run simultaneously and sequentially on each individual
node, depending on the number of CPUs.
In order to ensure optimum computing performance for these
applications, it must be examined how many CPUs and cores de-
liver the optimum performance.
We find applications of this sequential type of work typically in
the fields of data analysis or Monte-Carlo simulations.
High Performance Computing
Flexible Deployment With xCAT
xCAT as a Powerful and Flexible Deployment Tool
xCAT (Extreme Cluster Administration Tool) is an open source
toolkit for the deployment and low-level administration of HPC
cluster environments, small as well as large ones.
xCAT provides simple commands for hardware control, node discovery, the collection of MAC addresses, and node deployment with local disk installation (diskful) or without it (diskless). The
cluster configuration is stored in a relational database. Node
groups for different operating system images can be defined.
Also, user-specific scripts can be executed automatically at in-
stallation time.
xCAT Provides the Following Low-Level Administrative
Features
• Remote console support
• Parallel remote shell and remote copy commands
• Plugins for various monitoring tools like Ganglia or Nagios
• Hardware control commands for node discovery, collecting MAC addresses, remote power switching and resetting of nodes
• Automatic configuration of syslog, remote shell, DNS, DHCP, and NTP within the cluster
• Extensive documentation and man pages
For cluster monitoring, we install and configure the open source
tool Ganglia or the even more powerful open source solution Na-
gios, according to the customer’s preferences and requirements.
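To give an impression of the low-level commands listed above, a few typical xCAT calls are sketched below; node, group, and image names are purely illustrative examples, and the exact syntax may differ between xCAT versions:

nodels compute                     # list the members of the node group "compute"
rpower compute stat                # query the power state of all nodes in the group
rpower node001 on                  # power on a single node via its management controller
nodeset compute osimage=rhels6.5-x86_64-netboot-compute   # assign a (diskless) OS image
psh compute uptime                 # run a command on all group members in parallel
rcons node001                      # attach to the remote serial console of one node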
Local Installation or Diskless Installation
We offer a diskful or a diskless installation of the cluster nodes.
A diskless installation means the operating system is hosted partially within main memory; larger parts may additionally be provided via NFS or other means. This approach allows large numbers of nodes to be deployed very efficiently, and the cluster is up and running within a very short time. Also, updating
the cluster can be done in a very efficient way. For this, only the
boot image has to be updated, and the nodes have to be reboot-
ed. After this, the nodes run either a new kernel or even a new
operating system. Moreover, with this approach, partitioning the
cluster can also be very efficiently done, either for testing pur-
poses, or for allocating different cluster partitions for different
users or applications.
Development Tools, Middleware, and Applications
Depending on the application, the optimization strategy, or the underlying architecture, different compilers produce code of very different performance. Moreover, different (mainly commercial) applications require different MPI implementations. And even when the code is self-developed, developers often prefer one MPI implementation over another.
According to the customer's wishes, we install various compilers and MPI middleware, as well as job management systems like ParaStation, Grid Engine, Torque/Maui, or the very powerful Moab HPC Suite for high-level cluster management.
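To illustrate how users typically interact with such a job management system, a minimal Torque/PBS batch script is sketched below; job name, resource requests, queue, and application are illustrative assumptions, and Grid Engine or Moab setups use slightly different directives:

#!/bin/bash
#PBS -N crash_sim                  # job name
#PBS -l nodes=4:ppn=16             # request 4 nodes with 16 cores each
#PBS -l walltime=12:00:00          # maximum wall-clock time
#PBS -q batch                      # target queue (installation-specific)

cd $PBS_O_WORKDIR                  # start in the directory the job was submitted from
mpirun -np 64 ./my_solver input.dat

Submitted with qsub, the workload manager then decides when and on which nodes the job runs.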
The transtec HPC service cycle covers:
• individual presales consulting
• application-, customer-, and site-specific sizing of the HPC solution
• burn-in tests of systems
• benchmarking of different systems
• onsite hardware assembly
• integration into the customer's environment
• software & OS installation
• application installation
• customer training
• maintenance, support & managed services
• continual improvement
High Performance Computing
Services and Customer Care from A to Z
transtec HPC as a Service
You will get a range of applications like LS-Dyna, ANSYS, Gromacs, NAMD, etc. from all kinds of areas pre-installed, integrated into an enterprise-ready cloud and workload management system, and ready to run. Is your application not listed?
Ask us: HPC@transtec.de
transtec Platform as a Service
You will be provided with dynamically provisioned compute nodes for running your individual code. The operating sys-
tem will be pre-installed according to your requirements.
Common Linux distributions like RedHat, CentOS, or SLES
are the standard. Do you need another distribution?
Ask us: HPC@transtec.de
transtec Hosting as a Service
You will be provided with hosting space inside a profession-
ally managed and secured datacenter where you can have your machines hosted, managed, and maintained according to your requirements. Thus, you can build up your own private
cloud. What range of hosting and maintenance services do
you need?
Tell us: HPC@transtec.de
HPC @ transtec:
Services and Customer Care from A to Z
transtec AG has over 30 years of experience in scientific computing and is one of the earliest manufacturers of HPC clusters. For nearly a decade, transtec has delivered highly customized High Performance Computing clusters based on standard components to academic and industry customers across Europe, with all the high quality standards and the customer-centric approach that transtec is well known for. Every transtec HPC solution is more than just a rack full of hardware – it is a comprehensive solution with everything the HPC user, owner, and operator need. In the early stages of any customer's HPC project, transtec experts provide extensive and detailed consulting to the customer – they benefit from expertise and experience. Consulting is followed by benchmarking of different systems with either specifically crafted customer code or generally accepted benchmarking routines; this aids customers in sizing and devising the optimal and detailed HPC configuration. Each and every piece of HPC hardware that leaves our factory undergoes a burn-in procedure of 24 hours or
more if necessary. We make sure that any hardware shipped meets our and our customers' quality requirements. transtec HPC solutions are turnkey solutions. By default, a transtec HPC cluster has everything installed and configured – from hardware and operating system to important middleware components like cluster management or developer tools, and the customer's production applications. Onsite delivery means onsite integration into the customer's production environment, be it establishing network connectivity to the corporate network or setting up software and configuration parts. transtec HPC clusters are ready-to-run systems – we deliver, you turn the key, the system delivers high performance. Every HPC project entails transfer to production: IT operation processes and policies apply to the new HPC system. Effectively, IT personnel is trained hands-on, introduced to hardware components and software, with all operational aspects of configuration management.
transtec services do not stop when the implementation project ends. Beyond transfer to production, transtec takes care. transtec offers a variety of support and service options, tailored to the customer's needs. When you are in need of a new installation, a major reconfiguration or an update of your solution, transtec is able to support your staff and, if you lack the resources for maintaining the cluster yourself, maintain the HPC solution for you.
From Professional Services to Managed Services for daily operations and required service levels, transtec will be your complete HPC service and solution provider. transtec's high standards of performance, reliability and dependability assure your productivity and complete satisfaction. transtec's offering of HPC Managed Services gives customers the possibility of having the complete management and administration of the HPC cluster handled by transtec service specialists, in an ITIL-compliant way. Moreover, transtec's HPC on Demand services help customers gain access to HPC resources whenever they need them, for example because they do not have the possibility of owning and running an HPC cluster themselves, due to lacking infrastructure, know-how, or admin staff.
transtec HPC Cloud Services
Last but not least, transtec's services portfolio evolves as customers' demands change. Starting this year, transtec is able to provide HPC Cloud Services. transtec uses a dedicated datacenter to provide computing power to customers who are in need of more capacity than they own, which is why this workflow model is sometimes called computing-on-demand. With these dynamically provided resources, customers have the possibility to have their jobs run on HPC nodes in a dedicated datacenter, professionally managed and secured, and individually customizable. Numerous standard applications like ANSYS, LS-Dyna, OpenFOAM, as well as lots of codes like Gromacs, NAMD, VMD, and others are pre-installed, integrated into an enterprise-ready cloud and workload management environment, and ready to run.
Alternatively, whenever customers are in need of space for host-
ing their own HPC equipment because they do not have the space
capacity or cooling and power infrastructure themselves, transtec
is also able to provide Hosting Services to those customers who’d
like to have their equipment professionally hosted, maintained, and
managed. Customers can thus build up their own private cloud!
Are you interested in any of transtec’s broad range of HPC related
services? Write us an email to HPC@transtec.de. We’ll be happy to
hear from you!
Initially, in the 80s and 90s, the notion of supercomputing was almost a synonym for
vector computing. This concept is now back in different flavors: almost every architecture today uses SIMD, the acronym for "single instruction, multiple data".
Vector Supercomputing –
The NEC SX Architecture
History of HPC, and what we learn
During the last years, systems based on the x86 architecture have dominated the HPC market, with GPUs and many-core systems recently making inroads. But this period is coming to an end, and it is again time for a differentiated, complementary HPC-targeted system. NEC will develop such a system.
The Early Days
When did HPC really start? Probably one would think about the
time when Seymour Cray developed high-end machines at CDC
and then started his own company Cray Research, with the first
product being the Cray-1 in 1976. At that time supercomputing was
very special, systems were expensive, and access was extremely
limited for researchers and students. But the business case
was obvious in many fields, scientifically as well as commercially.
These Cray machines were vector-computers, initially featuring
only one CPU, as parallelization was far from being mainstream and hardly even a matter of research. And code development was
easier than today: a code of 20,000 lines was already considered
“complex” at that time, and there were not so many standard
codes and algorithms available anyway. For this reason code-de-
velopment could adapt to any kind of architecture to achieve op-
timal performance.
In 1981 NEC started marketing its first vector supercomputer, the
SX-2, a single-CPU-vector-architecture like the Cray-1. The SX-2
was a milestone: it was the first CPU to exceed one GFlops of peak performance.
The Attack of the “Killer-Micros”
In the 1990s the situation changed drastically, and what we see
today is a direct consequence. The “killer-micros” took over, new
parallel, later “massively parallel” systems (MPP) were built based
on micro-processor architectures which were predominantly
targeted at commercial sectors or, in the case of Intel, AMD or Motorola, at PCs – products which were successfully invading daily life, partially more than even experts had anticipated. Clearly one could save a lot of cost by using, or at worst slightly adapting, these CPUs for high-end systems, and by focusing investments on other ingredients, like interconnect technology. At the same time Linux was increasingly adopted and offered a cheap, generally available OS platform that allowed everybody to use HPC for a reasonable price.
These systems were cheaper for two reasons: the development of standard CPUs was sponsored by different businesses, and another expensive part, the memory architecture with many CPUs addressing the same flat memory space, was replaced by systems comprised of many distinct so-called nodes – complete systems by themselves – plus some kind of interconnect technology, which could also partially be taken from standard equipment.
The impact on the user was the necessity to code for these “dis-
tributed memory systems”. Standards for coding evolved, PVM
and later MPI, plus some proprietary paradigms. MPI still dominates today.
In the high-end the special systems were still dominating, be-
cause they had architectural advantages and the codes were run-
ning with very good efficiency on such systems. Most codes were
not parallel at all or at least not well parallelized, let alone on
systems with distributed memory. The skills and experiences to
generate parallel versions of codes were lacking in those compa-
nies selling high-end-machines and also on the customers’ side.
So it took some time for the MPPs to take over. But it was inevi-
table, because this situation, sometimes called “democratization
of HPC”, was positive for both the commercial and the scientific
developments. From that time on researchers and students in
academia and industry got sufficient access to resources they
could not have dreamt of before.
The x86 Dominance
One could dispute whether the x86 architecture is the most advanced or elegant; undoubtedly, it became the dominant one. It was developed for mass markets – this was the strategy – and HPC was a side effect, sponsored by the mass market.
This worked successfully and the proportion of x86 Linux clus-
ters in the Top500 list increased year by year. Because of the
mass-markets it made sense economically to invest a lot into
the development of LSI technology (large-scale integration).
The related progress, which led to more and more transistors
on the chips and increasing frequencies, helped HPC applica-
tions, which got more, faster and cheaper systems with each
generation. Software did not have to adapt a lot between gen-
erations and the software developments rather focused initial-
ly on the parallelization for distributed memory, later on new
functionalities and features. This worked for quite some years.
Systems of sufficient size to produce relevant results for typical
applications are cheap, and most codes do not really scale to a
level of parallelism to use thousands of processes. Some light-
house projects do, after a lot of development which is not easily
affordable, but the average researcher or student is often com-
pletely satisfied with something between a few and perhaps a
few hundred nodes of a standard Linux cluster.
In the meantime, a typical Intel-based dual-socket node of a standard cluster already has 24 cores, soon a bit more. So paralleliza-
tion is increasingly important. There is “Moore’s law”, invented by
Gordon E. Moore, a co-founder of the Intel Corporation, in 1965.
Originally it stated that the number of transistors in an integrated circuit doubles about every 18 months, leading to an exponential increase. This statement was often misinterpreted to mean that the performance of a CPU would double every 18 months, which is misleading. We observe ever increasing core counts; the transistors
are just there, so why not? That applications might not be able to
use these many cores is another question; in principle, the potential performance is there.
The “standard benchmark” in the HPC-scene is the so-called HPL,
High Performance LINPACK, and the famous Top500 list is built
based on this. This HPL benchmark is only one of three parts of
the original LINPACK benchmark – this is often forgotten! But the other parts did not show such good progress in performance,
their scalability is limited, so they could not serve to spawn the kind of competition, even between nations, that brings a lot of prestige.
Moreover, there are at most a few real codes in the world with characteristics similar to the HPL. In other words: it does not tell
a lot. This is a highly political situation and probably quite some
money is wasted which would better be invested in brain-ware.
Anyway it is increasingly doubtful that LSI technology will ad-
vance at the same pace in the future. The lattice constant of silicon is 5.43 Å. LSI technology is now using 14 nm processes, the Intel Broadwell generation, which is about 25 times the lattice constant, with 7 nm LSIs on the long-term horizon. Some scientists assume there is still some room to improve even more, but once the features on the LSI have a thickness of only a few atoms, one cannot gain another factor of ten. Sometimes the impact of some future "disruptive technology" is claimed …
Another obstacle is the cost to build a chip fab for a new genera-
tion of LSI technology. Certainly there are ways to optimize both
production and cost, and there will be bright ideas, but with cur-
rent generations a fab costs at least several billion dollars. There have been ideas of slowing down the innovation cycle on that side merely to be able to recover the investment.
Memory Bandwidth
For many HPC applications it is not the compute power of the
CPU which counts, it is the speed with which the CPU can
exchange data with the memory, the so-called memory band-
width. If it is neglected, the CPUs will wait for data and performance will decrease. To describe the balance between CPU
performance and memory bandwidth one often uses the ratio
between operations and the amount of bytes that can be trans-
ferred while they are running, the “byte-per-flop-ratio”. Tradi-
tionally vector machines were always great in this regard, and
this is to continue as far as current plans are known. And this is
the reason why a vector machine can realize a higher fraction
of its theoretical performance on a real application than a sca-
lar machine.
Power Consumption
Another, but really overwhelming, problem is the increasing
power consumption of systems. Looking at different genera-
tions of Linux clusters one can assume that a dual-socket-node,
depending on how it is equipped with additional features, will
consume between 350 Watt and 400 Watt.
This only slightly varies between different generations, perhaps it
will slightly increase. Even in Europe sites are talking about elec-
tricity budgets in excess of 1 MWatt, even beyond 10 MWatts. One
also has to cool the system, which adds something like 10-30%
to the electricity cost! The cost to run such a system over 5 years
will be in the same order of magnitude as the purchase price!
The power consumption grows with the frequency of the CPU, which is the dominant reason why frequencies are stagnating, if not decreasing, between product generations. There are more cores, but individual cores of an architecture are getting slower, not faster!
This has to be compensated on the application side by increased
scalability, which is not always easy to achieve. In the past, users easily got higher performance for almost every application with each new generation; this is no longer the case.
Architecture Development
So the basic facts clearly indicate limitations of LSI technology, and they have been known for quite some time. Consequently, this leads to attempts to get additional performance by other means.
And the developments in the market clearly indicate that high-
end users are willing to consider the adaptation of codes.
To utilize available transistors CPUs with more complex func-
tionality are designed, utilizing SIMD instructions or providing
more operations per cycle than just an add and a multiply. SIMD stands for "Single Instruction, Multiple Data". Initially, at Intel, the SIMD instructions provided two results in 64-bit precision per cycle rather than one; in the meantime, with AVX it is even four, and on Xeon Phi already eight.
This makes a lot of sense. The electricity consumed by the “ad-
ministration” of the operations, code fetch and decode, configu-
ration of paths on the CPU etc. is a significant portion of the pow-
er consumption. If the energy is used not only for one operation
but for two or even four, the energy per operation is obviously
reduced.
In essence, and in order to overcome the performance bottleneck, the architectures developed in a direction which to a large extent was already realized in the vector supercomputers decades ago!
There is almost no architecture left without such features. But
SIMD operations require application code to be vectorized. Previously, code development and programming paradigms did not care for fine-grain data parallelism, which is the basis of the applicability of SIMD. Thus vector machines could not show their strength; on scalar code they cannot beat a scalar CPU. But strictly speaking, there is no scalar CPU left: standard x86 systems do not achieve their optimal performance on purely scalar code either!
Alternative Architectures?
During recent years, GPUs and many-core architectures have been increasingly used in the HPC market. Although these systems are
somewhat diverse in detail, in any case the degree of parallelism
needed is much higher than in standard CPUs. This points into
the same direction, a low-level parallelism is used to address the
bottlenecks. In the case of Nvidia GPUs the individual so-called
CUDA cores, relatively simple entities, are steered by a more
complex central processor, realizing a high level of SIMD paral-
lelism. In the case of Xeon Phi the individual cores are also less
powerful, but they are used in parallel using coding paradigms
which operate on the level of loop nests, an approach which was
already used on vector supercomputers during the 1980s!
So these alternative architectures not only go in the same direction, the utilization of fine-grain parallelism; they also indicate that coding styles which suit real vector machines will dominate future software developments. Note that at least some of these systems have an economic backing by mass markets – gaming!
History of the SX Series
As already stated NEC started marketing the first commercially
available generation of this series, the SX-2, in 1981. By now the
latest generation is the SX-ACE.
In 1998 Prof. Tadashi Watanabe, the developer of the SX series and the major driver of NEC's HPC business, received the Eckert-Mauchly Award, a renowned US award, for these achievements, and in 2006 also the "Seymour Cray Award". Prof. Watanabe
was a very pragmatic developer, a knowledgeable scientist and a
visionary manager, all at the same time, and a great person to
talk to. From the beginning the SX-architecture was different
from the competing Cray-vectors, it already integrated some of
the features that RISC CPUs at that time were using, but com-
bined them with vector registers and pipelines.
For example the SX could reorder instructions in hardware on
the fly, which Cray machines could not do. In effect this made
the system much easier to use. The single processor SX-2 was the
fastest CPU at that time, the first one to break the 1 GFlops bar-
rier. Naturally the compiler technology for automatic vectoriza-
tion and optimization was important.
With the SX-3 NEC developed a high-end shared-memory system.
It had up to four very strong CPUs. The resolution of memory ac-
cess conflicts by CPUs working in parallel, and the consistency of
data throughout the whole shared-memory system are non-triv-
ial, and the development of such a memory architecture was a
major breakthrough.
With the SX-4 NEC implemented a high-end system based on CMOS. It was the direct competitor to the Cray T90, with a huge advantage
with regard to price and power consumption. The T90 was using
ECL technology and achieved 1.8 GFlops peak performance. The
SX-4 had a lower frequency because of CMOS, but because of the
higher density of the LSIs it featured a higher level of parallelism
on the CPU, leading to 2 GFlops per CPU with up to 32 CPUs.
The memory architecture to support those was a major leap
ahead. On this system one could typically reach an efficiency
of 20-40% of peak performance measured throughout complete
applications, far from what is achieved on standard components
today. With this system NEC also implemented the first genera-
tion of its own proprietary interconnect, the so-called IXS, which
allowed connecting several of these strong nodes into one highly
powerful system. The IXS featured advanced functions like a
global barrier and some hardware fault-tolerance, things which
other companies tackled years later.
With the SX-5 this direction was continued; it was a very successful system pushing the limits of CMOS-based CPUs and multi-node configurations. At that time, the efforts on the users' side to parallelize their codes using "message passing" – PVM and later MPI – slowly started to surface, so the need for a huge shared-memory system slowly disappeared. Still, there were quite some important codes in the scene which were only parallel using a shared-memory paradigm: initially "microtasking", which was vendor-specific, then "autotasking", with which NEC and some other vendors could automatically parallelize code, and later the standard OpenMP. So there was still sufficient need for a huge and highly capable shared-memory system.
But later it became clear that more and more codes could utilize
distributed memory systems, and moreover the memory-archi-
tecture of huge shared-memory systems became increasingly complex and therefore costly. A memory is made of "banks", in-
dividual pieces of memory which are used in parallel. In order
to provide the data at the necessary speed for a CPU a certain
minimum amount of banks per CPU is required, because a bank
needs some cycles to recover after every access. Consequently
the number of banks had to scale with the number of CPUs, so
that the complexity of the network connecting CPUs and banks
was growing with the square of the number of CPUs. And if the
CPUs became stronger they needed more banks, which made it
even worse. So this concept became too expensive.
At the same time NEC had a lot of experience in mixing MPI- and
OpenMP parallelization in applications, and it turned out that for
the very fast CPUs the shared-memory parallelization was typi-
cally not really efficient for more than four or eight threads. So
the concept was adapted accordingly, which led to the SX-6 and the famous Earth Simulator, which kept the #1 spot of the Top500 for some years.
The Earth Simulator was a different machine, but based
on the same technology level and architecture as the SX-6.
These systems were the first implementation ever of a vector CPU on one single chip. The CPU by itself was not the challenge, but rather the interface to the memory to support the bandwidth needed for this chip – basically the "pins". This kind of packaging technology was always a strong point in NEC's technology portfolio.
The SX-7 was a somewhat special machine: it was based on the SX-6 technology level, but featured 32 CPUs on one shared memory. The SX-8 then was again a new implementation, with some advanced features which directly addressed the necessities of certain code structures frequently encountered in applications. The communication between the world-wide organization of application analysts and the hardware and software developers in Japan had proven very fruitful, and the SX-8 was very successful in the market.
With the SX-9 the number of CPUs per node was pushed up again
to 16 because of certain demands in the market. Just as an in-
dication for the complexity of the memory architecture: the
system had 16384 banks! The peak-performance of one node was
1.6 TFlops.
Keep in mind that the efficiency achieved on real applications
was still in the range of 10% in bad cases, up to 20% or even more,
depending on the code. One should not compare this peak performance with that of some standard chip today; this would be misleading.
The Current Product: SX-ACE
In November 2014 NEC announced the next generation of the
SX-vector-product, the SX-ACE. Some systems are already in-
stalled and accepted, also in Europe.
The main design target was the reduction of power-consump-
tion per sustained performance, and measurements have prov-
en this has been achieved. In order to do so and in line with the
observation that the shared-memory parallelization with vector
CPUs will be useful on four cores, perhaps eight, but normally
not beyond that, the individual node consists of a single-chip four-core processor, which integrates the memory controllers and the interface to the communication interconnect, a next generation of the proprietary IXS.
That way a node is a small board, leading to a very compact de-
sign. The complicated network between the memory and the
CPUs, which took about 70% of the power-consumption of an SX-
9, was eliminated. And naturally a new generation of LSI and PCB technology also leads to reductions.
With the memory bandwidth of 256 GB/s this machine realizes
one “byte per flop”, a measure of the balance between the pure
compute power and the memory bandwidth of a chip. In comparison, nowadays typical standard processors have in the order of one eighth of this!
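As a small worked example of this byte-per-flop ratio (the peak figure below is inferred from the 256 GB/s bandwidth and the stated 1 byte/flop balance, not quoted from NEC datasheets):

program byte_per_flop
   implicit none
   real :: peak_gflops = 256.0     ! assumed peak performance of one CPU in GFlop/s
   real :: bandwidth_gbs = 256.0   ! memory bandwidth of one CPU in GB/s (as stated above)
   ! 256 GB/s divided by 256 GFlop/s gives 1 byte per flop; typical x86 CPUs reach roughly 0.125
   print *, 'byte-per-flop ratio:', bandwidth_gbs / peak_gflops
end program byte_per_flop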
Memory bandwidth is increasingly a problem; this is known in the market as the "memory wall" and has been a hot topic for more than a decade. Standard components are built for a different
market, where memory bandwidth is not as important as it is for
HPC applications. Two of these node boards are put into a mod-
ule, eight such modules are combined in one cage, and four of
the cages, i.e. 64 nodes, are contained in one rack. The standard
configuration of 512 nodes consists of eight racks.
NEC has already installed several SX-ACE systems in Europe, and more installations are ongoing. The advantages of the system – the superior memory bandwidth and the strong CPU with good efficiency – will provide scientists with performance levels on certain codes which otherwise could not be achieved. Moreover, the
system has proven to be superior in terms of power consumption
per sustained application performance, and this is a real hot top-
ic nowadays.
LSI technologies have improved drastically during the last
two decades, but the ever increasing power consumption of
systems now poses serious limitations. Tasks like fetching in-
structions, decoding them or configuring paths on the chip
consume energy, so it makes sense to execute these activities
not for one operation only, but for a set of such operations. This is what is meant by SIMD.
Data Parallelism
The underlying structure that enables the use of SIMD instructions is called "data parallelism". For example, the following loop is data-parallel: the individual loop iterations could be executed in any arbitrary order, and the results would stay the same:

do i = 1, n
a(i) = b(i) + c(i)
end do
The complete independence of the loop iterations allows executing them in parallel, and this is mapped to hardware with SIMD support.
Naturally one could also use other means to parallelize such a
loop, e.g. for a very long loop one could execute chunks of loop
iterations on different processors of a shared-memory machine
using OpenMP. But this will cause overhead, so it will not be effi-
cient for small trip counts, while a SIMD feature in hardware can
deal with such a fine granularity.
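As a brief sketch of how this fine-grain data parallelism is expressed in code today (assuming a compiler that understands OpenMP 4.0 SIMD directives; many compilers also auto-vectorize such loops without any directive):

program simd_example
   implicit none
   integer, parameter :: n = 1024
   real(8) :: a(n), b(n), c(n)
   integer :: i
   b = 1.0d0
   c = 2.0d0
!$omp simd                         ! explicitly ask the compiler to vectorize this loop
   do i = 1, n
      a(i) = b(i) + c(i)           ! independent iterations map directly onto SIMD lanes
   end do
!$omp end simd
   print *, a(1)
end program simd_example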
As opposed to the example above the following loop, a recursion,
is not data parallel:
do i = 1, n-1
a(i+1) = a(i) + c(i)
end do
Obviously a change of the order of the loop iterations, for example the special case of starting with a loop index of n-1 and decreasing it afterwards, will lead to different results. Therefore these iterations cannot be executed in parallel.

Pipelining
The following description is very much simplified, but explains the basic principles: operations like add and multiply consist of so-called segments, which are executed one after the other in a so-called pipeline. While each segment acts on the data for some kind of intermediate step, only after the execution of the whole pipeline is the final result available.
It is similar to an assembly line in the automotive industry. Now, the segments do not need to deal with the same operation at a given point in time; they could be kept active if sufficient inputs are fed into the pipeline. Again, in the automotive industry the different stages of the assembly line are working on different cars, and nobody would wait for a car to be completed before starting the next one in the first stage of the production, that way keeping almost all stages inactive. Such a situation would be resembled by the following picture:
[Figure: pipeline with only one operation in flight] The time or number of cycles extends to the right hand side; assuming five stages (vertical direction), the red squares indicate which stage is busy at the given cycle. Obviously this is not efficient, because white squares indicate transistors which consume power without actively producing results. And the compute performance is not optimal: it takes five cycles to compute one result.
Similarly to the assembly line in the automotive industry, this is what we want to achieve:
[Figure: fully overlapped pipeline] After the "latency", five cycles, just the length of the pipeline, we get a result every cycle. Obviously it is easy to feed the pipeline with sufficient independent input data in the case of different loop iterations of a data-parallel loop.
If this is not possible one could also use different operations within the execution of a complicated loop body within one
given loop iteration. But typically operations in such a loop body
need results from previous operations of the same iteration, so
one would rather arrive at a utilization of the pipeline like this:
Some operations can be calculated almost in parallel, others
have to wait for previous calculations to finish because they
need input. In total the utilization of the pipeline is better but
not optimal.
This was the situation before the trend to SIMD computing was
revitalized. Before that the advances of LSI technology always
provided an increased performance with every new generation
of product. But then the race for increasing frequency stopped,
and now architectural enhancements are important again.
Single Instruction Multiple Data (SIMD)
Hardware realizations of SIMD have been present since the first
Cray-1, but this was a vector architecture. Nowadays the instruc-
tion sets SSE and AVX in their different generations and flavors
use it, and similarly for example CUDA coding is also a way to uti-
lize data parallelism.
The “width” of a SIMD instruction, i.e. the number of loop iterations executed in parallel by one instruction, is kept relatively small in the case of SSE (two for double precision) and AVX (four
for double precision). The implementations of SIMD in stand-
ard “scalar” processors typically involve add-and-multiply units
which can produce several results per clock-cycle. They also fea-
ture registers of the same “width” to keep several input parame-
ters or several results.
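The factors two and four follow directly from the register widths: 128-bit SSE registers hold 128/64 = 2 double-precision values, and 256-bit AVX registers hold 256/64 = 4. As an illustration of how such data parallelism is typically exposed to a compiler, the following sketch uses the portable OpenMP 4.0 “simd” directive; the subroutine and array names are our own illustrative choices, not taken from the text:
! Ask the compiler to vectorize this data-parallel loop; with AVX in
! double precision, groups of four iterations are then executed by a
! single SIMD instruction.
subroutine axpy_like(n, a, b, c)
implicit none
integer, intent(in) :: n
double precision, intent(in) :: b(n), c(n)
double precision, intent(out) :: a(n)
integer :: i
!$omp simd
do i = 1, n
a(i) = b(i) + 2.0d0 * c(i)
end do
end subroutine axpy_like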
In a way the pipelines are “doubled” for SSE, but both are activated by the same single instruction:
Vector Supercomputing
History of the SX Series
Vector Computing
The difference between those SIMD architectures and SIMD with
a real vector architecture like the NEC SX-ACE, the only remaining
vector system, lies in the existence of vector registers.
In order to provide sufficient input to the functional units the
vector registers have proven to be very efficient. In a way they are
a very natural match to the idea of a pipeline – or assembly line.
Again thinking about the automotive industry, there one would
have a stock of components to constantly feed the assembly line.
The following viewgraph resembles this situation:
When an instruction is issued it takes some cycles for the first result to appear, but after that one would get a new result every cycle, until the vector register, which has a certain length, is completely processed. Naturally this can only work if the loop iterations are independent, that is, if the loop is data parallel. It would not work if one computation needed a result of another computation as input while that is just being processed.
In fact, the SX vector machines have always combined both ideas, the vector register and parallel functional units, so-called “pipe sets”, four in this picture:
There is another advantage of the vector architecture in the SX, and it proves to be very important. The scalar portion of the CPU acts independently, and while the pipelines are kept busy with input from the vector registers, the inevitable scalar instructions which come with every loop can be executed. This involves checking of loop counters, updating of address pointers and the like. That way these instructions essentially do not consume time. The number of cycles available to “hide” them is the length of the vector registers divided by the number of pipe sets.
In fact the white space in the picture above can partially be hidden as well, because new vector instructions can be issued in advance to immediately take over once the vector register with the input data for the previous instruction is completely processed. The instruction issue unit has the complete status information of all available resources, specifically pipes and vector registers.
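As a purely illustrative worked example (the concrete register length is not given in the text above): assuming vector registers of 256 elements and four pipe sets, one vector instruction keeps the functional units busy for 256 / 4 = 64 cycles, which is the window in which the scalar bookkeeping instructions of the loop can be executed essentially for free.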
High Performance Computing (HPC) clusters serve a crucial role at research-intensive
organizations in industries such as aerospace, meteorology, pharmaceuticals, and oil
and gas exploration.
The thing they have in common is a requirement for large-scale computations. Bright’s
solution for HPC puts the power of supercomputing within the reach of just about any
organization.
Advanced Cluster
Management Made Easy
The Bright Advantage
Bright Cluster Manager for HPC makes it easy to build and oper-
ate HPC clusters using the server and networking equipment of
your choice, tying them together into a comprehensive, easy to
manage solution.
Bright Cluster Manager for HPC lets customers deploy complete
clusters over bare metal and manage them effectively. It pro-
vides single-pane-of-glass management for the hardware, the
operating system, HPC software, and users. With Bright Cluster
Manager for HPC, system administrators can get clusters up and
running quickly and keep them running reliably throughout their
life cycle – all with the ease and elegance of a fully featured, en-
terprise-grade cluster manager.
Bright Cluster Manager was designed by a group of highly expe-
rienced cluster specialists who perceived the need for a funda-
mental approach to the technical challenges posed by cluster
management. The result is a scalable and flexible architecture,
forming the basis for a cluster management solution that is ex-
tremely easy to install and use, yet is suitable for the largest and
most complex clusters.
Scale clusters to thousands of nodes
Bright Cluster Manager was designed to scale to thousands of
nodes. It is not dependent on third-party (open source) software
that was not designed for this level of scalability. Many advanced
features for handling scalability and complexity are included:
Management Daemon with Low CPU Overhead
The cluster management daemon (CMDaemon) runs on every
node in the cluster, yet has a very low CPU load – it will not cause
any noticeable slow-down of the applications running on your
cluster.
Because the CMDaemon provides all cluster management func-
tionality, the only additional daemon required is the workload
manager (or queuing system) daemon. For example, no daemons for monitoring tools such as Ganglia or Nagios are required. This minimizes the overall load of cluster management software on your cluster.
The cluster installer takes you through the installation process and offers advanced options such as “Express” and “Remote”.
Multiple Load-Balancing Provisioning Nodes
Node provisioning – the distribution of software from the head
node to the regular nodes – can be a significant bottleneck in
large clusters. With Bright Cluster Manager, provisioning capabil-
ity can be off-loaded to regular nodes, ensuring scalability to a
virtually unlimited number of nodes.
When a node requests its software image from the head node,
the head node checks which of the nodes with provisioning ca-
pability has the lowest load and instructs the node to download
or update its image from that provisioning node.
Synchronized Cluster Management Daemons
The CMDaemons are synchronized in time to execute tasks in
exact unison. This minimizes the effect of operating system jit-
ter (OS jitter) on parallel applications. OS jitter is the “noise” or
“jitter” caused by daemon processes and asynchronous events
such as interrupts. OS jitter cannot be totally prevented, but for
parallel applications it is important that any OS jitter occurs at
the same moment in time on every compute node.
Built-In Redundancy
Bright Cluster Manager supports redundant head nodes and re-
dundant provisioning nodes that can take over from each other
in case of failure. Both automatic and manual failover configura-
tions are supported.
Other redundancy features include support for hardware and
software RAID on all types of nodes in the cluster.
Diskless Nodes
Bright Cluster Manager supports diskless nodes, which can be
useful to increase overall Mean Time Between Failure (MTBF),
particularly on large clusters. Nodes with only InfiniBand and no
Ethernet connection are possible too.
Bright Cluster Manager 7 for HPC
For organizations that need to easily deploy and manage HPC
clusters Bright Cluster Manager for HPC lets customers deploy
complete clusters over bare metal and manage them effectively.
It provides single-pane-of-glass management for the hardware,
the operating system, the HPC software, and users.
With Bright Cluster Manager for HPC, system administrators can
quickly get clusters up and running and keep them running relia-
bly throughout their life cycle – all with the ease and elegance of
a fully featured, enterprise-grade cluster manager.
Features
 Simple Deployment Process – Just answer a few questions
about your cluster and Bright takes care of the rest. It installs
all of the software you need and creates the necessary config-
uration files for you, so you can relax and get your new cluster
up and running right – first time, every time.
Head Nodes & Regular Nodes
A cluster can have different types of nodes, but it will always
have at least one head node and one or more “regular” nodes.
A regular node is a node which is controlled by the head node
and which receives its software image from the head node or
from a dedicated provisioning node.
Regular nodes come in different flavors.
Some examples include:
 Failover Node – A node that can take over all functionalities
of the head node when the head node becomes unavailable.
 Compute Node – A node which is primarily used for compu-
tations.
 Login Node – A node which provides login access to users of
the cluster. It is often also used for compilation and job sub-
mission.
 I/O Node – A node which provides access to disk storage.
 Provisioning Node – A node from which other regular nodes
can download their software image.
 Workload Management Node – A node that runs the central
workload manager, also known as queuing system.
 Subnet Management Node – A node that runs an InfiniBand
subnet manager.
 More types of nodes can easily be defined and configured
with Bright Cluster Manager. In simple clusters there is only
one type of regular node – the compute node – and the head
node fulfills all cluster management roles that are described
in the regular node types mentioned above.
All types of nodes (head nodes and regular nodes) run the same
CMDaemon. However, depending on the role that has been as-
signed to the node, the CMDaemon fulfills different tasks. This
node has a ‘Login Role’, ‘Storage Role’ and ‘PBS Client Role’.
The architecture of Bright Cluster Manager is implemented by
the following key elements:
 The Cluster Management Daemon (CMDaemon)
 The Cluster Management Shell (CMSH)
 The Cluster Management GUI (CMGUI)
Connection Diagram
A Bright cluster is managed by the CMDaemon on the head node,
which communicates with the CMDaemons on the other nodes
over encrypted connections.
The CMDaemon on the head node exposes an API which can also
be accessed from outside the cluster. This is used by the cluster
management Shell and the cluster management GUI, but it can
also be used by other applications if they are programmed to ac-
cess the API. The API documentation is available on request.
The diagram below shows a cluster with one head node, three
regular nodes and the CMDaemon running on each node.
Elements of the Architecture
 Can Install on Bare Metal – With Bright Cluster Manager there
is nothing to pre-install. You can start with bare metal serv-
ers, and we will install and configure everything you need. Go
from pallet to production in less time than ever.
 Comprehensive Metrics and Monitoring – Bright Cluster Man-
ager really shines when it comes to showing you what’s going
on in your cluster. Its beautiful graphical user interface lets
you monitor resources and services across the entire cluster
in real time.
 Powerful User Interfaces – With Bright Cluster Manager you
don’t just get one great user interface, we give you two. That
way, you can choose to provision, monitor, and manage your
HPC clusters with a traditional command line interface or our
beautiful, intuitive graphical user interface.
 Optimize the Use of IT Resources by HPC Applications – Bright
ensures HPC applications get the resources they require
according to the policies of your organization. Your cluster
will prioritize workloads according to your business priorities.
 HPC Tools and Libraries Included – Every copy of Bright Cluster
Manager comes with a complete set of HPC tools and libraries
so you will be ready to develop, debug, and deploy your HPC
code right away.
Advanced Cluster
Management Made Easy
Easy-to-use, complete and scalable
“The building blocks for transtec HPC solutions must be chosen
according to our goals ease-of-management and ease-of-use. With
Bright Cluster Manager, we are happy to have the technology leader
at hand, meeting these requirements, and our customers value that.”
Jörg Walter
HPC Solution Engineer
Monitor your entire cluster at a glance with Bright Cluster Manager
Quickly build and deploy an HPC cluster
When you need to get an HPC cluster up and running, don’t waste
time cobbling together a solution from various open source
tools. Bright’s solution comes with everything you need to set up
a complete cluster from bare metal and manage it with a beauti-
ful, powerful user interface.
Whether your cluster is on-premise or in the cloud, Bright is a
virtual one-stop-shop for your HPC project.
Build a cluster in the cloud
Bright’s dynamic cloud provisioning capability lets you build an
entire cluster in the cloud or expand your physical cluster into
the cloud for extra capacity. Bright allocates the compute re-
sources in Amazon Web Services automatically, and on demand.
Build a hybrid HPC/Hadoop cluster
Does your organization need HPC clusters for technical comput-
ing and Hadoop clusters for Big Data? The Bright Cluster Man-
ager for Apache Hadoop add-on enables you to easily build and
manage both types of clusters from a single pane of glass. Your
system administrators can monitor all cluster operations, man-
age users, and repurpose servers with ease.
Driven by research
Bright has been building enterprise-grade cluster management
software for more than a decade. Our solutions are deployed in
thousands of locations around the globe. Why reinvent the wheel when you can rely on our experts to help you implement the ideal cluster management solution for your organization?
Clusters in the cloud
Bright Cluster Manager can provision and manage clusters that
are running on virtual servers inside the AWS cloud as if they
were local machines. Use this feature to build an entire cluster
in AWS from scratch or extend a physical cluster into the cloud
when you need extra capacity.
Use native services for storage in AWS – Bright Cluster Manager
lets you take advantage of inexpensive, secure, durable, flexible
and simple storage services for data use, archiving, and backup
in the AWS cloud.
Make smart use of AWS for HPC – Bright Cluster Manager can
save you money by instantiating compute resources in AWS only
when they are needed. It uses built-in intelligence to create in-
stances only after the data is ready for processing and the back-
log in on-site workloads requires it.
You decide which metrics to monitor. Just drag a component into the display area and Bright Cluster Manager instantly creates a graph of the data you need.
High performance meets efficiency
Initially, massively parallel systems constitute a challenge to
both administrators and users. They are complex beasts. Anyone
building HPC clusters will need to tame the beast, master the
complexity and present users and administrators with an easy-
to-use, easy-to-manage system landscape.
Leading HPC solution providers such as transtec achieve this
goal. They hide the complexity of HPC under the hood and match
high performance with efficiency and ease-of-use for both users
and administrators. The “P” in “HPC” gains a double meaning:
“Performance” plus “Productivity”.
Cluster and workload management software like Moab HPC
Suite, Bright Cluster Manager or Intel Fabric Suite provide the
means to master and hide the inherent complexity of HPC sys-
tems. For administrators and users, HPC clusters are presented
as single, large machines, with many different tuning parame-
ters. The software also provides a unified view of existing clus-
ters whenever unified management is added as a requirement by
the customer at any point in time after the first installation. Thus,
daily routine tasks such as job management, user management,
queue partitioning and management, can be performed easily
with either graphical or web-based tools, without any advanced
scripting skills or technical expertise required from the adminis-
trator or user.
Bright Cluster Manager unleashes and manages
the unlimited power of the cloud
Create a complete cluster in Amazon EC2, or easily extend your
onsite cluster into the cloud. Bright Cluster Manager provides a
“single pane of glass” to both your on-premise and cloud resourc-
es, enabling you to dynamically add capacity and manage cloud
nodes as part of your onsite cluster.
Cloud utilization can be achieved in just a few mouse clicks, with-
out the need for expert knowledge of Linux or cloud computing.
With Bright Cluster Manager, every cluster is cloud-ready, at no
extra cost. The same powerful cluster provisioning, monitoring,
scheduling and management capabilities that Bright Cluster
Manager provides to onsite clusters extend into the cloud.
The Bright advantage for cloud utilization
 Ease of use: Intuitive GUI virtually eliminates user learning
curve; no need to understand Linux or EC2 to manage system.
Alternatively, cluster management shell provides powerful
scripting capabilities to automate tasks.
 Complete management solution: Installation/initialization,
provisioning, monitoring, scheduling and management in
one integrated environment.
 Integrated workload management: Wide selection of work-
load managers included and automatically configured with
local, cloud and mixed queues.
 Single pane of glass; complete visibility and control: Cloud compute nodes managed as elements of the on-site cluster; visible from a single console with drill-downs and historic data.
 Efficient data management via data-aware scheduling: Auto-
matically ensures data is in position at start of computation;
delivers results back when complete.
 Secure, automatic gateway: Mirrored LDAP and DNS services
inside Amazon VPC, connecting local and cloud-based nodes
for secure communication over VPN.
 Cost savings: More efficient use of cloud resources; support
for spot instances, minimal user intervention.
Two cloud scenarios
Bright Cluster Manager supports two cloud utilization scenarios:
“Cluster-on-Demand” and “Cluster Extension”.
Scenario 1: Cluster-on-Demand
The Cluster-on-Demand scenario is ideal if you do not have a clus-
ter onsite, or need to set up a totally separate cluster. With just a
few mouse clicks you can instantly create a complete cluster in
the public cloud, for any duration of time.
Cloud Bursting With Bright
Scenario 2: Cluster Extension
The Cluster Extension scenario is ideal if you have a cluster onsite
but you need more compute power, including GPUs. With Bright
Cluster Manager, you can instantly add EC2-based resources to
your onsite cluster, for any duration of time. Extending into the
cloud is as easy as adding nodes to an onsite cluster – there are
only a few additional, one-time steps after providing the public
cloud account information to Bright Cluster Manager.
The Bright approach to managing and monitoring a cluster in
the cloud provides complete uniformity, as cloud nodes are
managed and monitored the same way as local nodes:
 Load-balanced provisioning
 Software image management
 Integrated workload management
 Interfaces – GUI, shell, user portal
 Monitoring and health checking
 Compilers, debuggers, MPI libraries, mathematical libraries
and environment modules
Bright Cluster Manager also provides additional features that are
unique to the cloud.
Amazon spot instance support – Bright Cluster Manager enables
users to take advantage of the cost savings offered by Amazon’s
spot instances. Users can specify the use of spot instances, and
Bright will automatically schedule as available, reducing the cost
to compute without the need to monitor spot prices and sched-
ule manually.
Hardware Virtual Machine (HVM) – Bright Cluster Manager auto-
matically initializes all Amazon instance types, including Cluster
Compute and Cluster GPU instances that rely on HVM virtualization.
Amazon VPC – Bright supports Amazon VPC setups, which allow
compute nodes in EC2 to be placed in an isolated network, there-
by separating them from the outside world. It is even possible to
route part of a local corporate IP network to a VPC subnet in EC2,
so that local nodes and nodes in EC2 can communicate easily.
Data-aware scheduling – Data-aware scheduling ensures that
input data is automatically transferred to the cloud and made
accessible just prior to the job starting, and that the output data
is automatically transferred back. There is no need to wait for the
data to load prior to submitting jobs (delaying entry into the job
queue), nor any risk of starting the job before the data transfer is
complete (crashing the job). Users submit their jobs, and Bright’s
data-aware scheduling does the rest.
Bright Cluster Manager provides the choice:
“To Cloud, or Not to Cloud”
Not all workloads are suitable for the cloud. The ideal situation
for most organizations is to have the ability to choose between
onsite clusters for jobs that require low latency communication,
complex I/O or sensitive data; and cloud clusters for many other
types of jobs.
Bright Cluster Manager delivers the best of both worlds: a pow-
erful management solution for local and cloud clusters, with
the ability to easily extend local clusters into the cloud without
compromising provisioning, monitoring or managing the cloud
resources.
One or more cloud nodes can be configured under the Cloud Settings tab.
While all HPC systems face challenges in workload demand, workflow constraints,
resource complexity, and system scale, enterprise HPC systems face more stringent
challenges and expectations.
Enterprise HPC systems must meet mission-critical and priority HPC workload
demands for commercial businesses, business-oriented research, and academic
organizations.
Intelligent HPC
Workload Management
Solutions to Enterprise
High-Performance Computing Challenges
The Enterprise has requirements for dynamic scheduling, provi-
sioning and management of multi-step/multi-application servic-
es across HPC, cloud, and big data environments. Their workloads
directly impact revenue, product delivery, and organizational ob-
jectives.
Enterprise HPC systems must process intensive simulation and
data analysis more rapidly, accurately and cost-effectively to
accelerate insights. By improving workflow and eliminating job
delays and failures, the business can more efficiently leverage
data to make data-driven decisions and achieve a competitive
advantage. The Enterprise is also seeking to improve resource
utilization and manage efficiency across multiple heterogene-
ous systems. User productivity must increase to speed the time
to discovery, making it easier to access and use HPC resources
and expand to other resources as workloads demand.
Enterprise-Ready HPC Workload Management
Moab HPC Suite – Enterprise Edition accelerates insights by uni-
fying data center resources, optimizing the analysis process,
and guaranteeing services to the business. Moab 8.1 for HPC
systems and HPC cloud continually meets enterprise priorities
through increased productivity, automated workload uptime,
and consistent SLAs. It uses the battle-tested and patented Moab
intelligence engine to automatically balance the complex, mis-
sion-critical workload priorities of enterprise HPC systems. En-
terprise customers benefit from a single integrated product that
brings together key Enterprise HPC capabilities, implementation
services, and 24/7 support.
It is imperative to automate workload workflows to speed time
to discovery. The following new use cases play a key role in im-
proving overall system performance:
Moab HPC Suite – Enterprise Edition
Moab is the “brain” of an HPC system, intelligently optimizing workload
throughput while balancing service levels and priorities.
Intelligent HPC Workload Management Across
Infrastructure and Organizational Complexity
Moab HPC Suite – Enterprise Edition acts as the “brain” of a high
performance computing (HPC) system to accelerate and auto-
mate complex decision making processes. The patented Moab in-
telligence engine provides multi-dimensional policies that mimic
real-world decision making.
These multi-dimensional, real-world policies are essential in
scheduling workload to optimize job speed, job success and resource utilization. Moab HPC Suite – Enterprise Edition integrates
decision-making data from and automates actions through your
system’s existing mix of resource managers.
This enables all the dimensions of real-time granular resource
attributes, resource state, as well as current and future resource
commitments to be modeled against workload/application re-
quirements. All of these dimensions are then factored into the
most efficient scheduling decisions. Moab also dramatically sim-
plifies the management tasks and processes across these com-
plex, heterogeneous environments. Moab works with many of
the major resource management and industry standard resource
monitoring tools and covers mixed hardware, network, storage
and licenses.
Moab HPC Suite – Enterprise Edition multi-dimensional policies
are also able to factor in organizational priorities and complex-
ities when scheduling workload. Moab ensures workload is
processed according to organizational priorities and commit-
ments and that resources are shared fairly across users, groups
and even multiple organizations. This enables organizations to
automatically enforce service guarantees and effectively manage
organizational complexities with simple policy-based settings.
Moab HPC Suite – Enterprise Edition is architected to integrate
on top of your existing resource managers and other types of
management tools in your environment to provide policy-based
scheduling and management of workloads. It makes the optimal
complex decisions based on all of the data it integrates from the
various resource managers and then orchestrates the job and
management actions through those resource managers.
Moab’s flexible management integration makes it the ideal
choice to integrate with existing or new systems as well as to
manage your high performance computing system as it grows
and expands in the future.
 Elastic Computing – unifies data center resources by assisting
the business management resource expansion through burst-
ing to private/public clouds and other data center resources
utilizing OpenStack as a common platform
 Performance and Scale – optimizes the analysis process by
dramatically increasing TORQUE and Moab throughput and
scalability across the board
 Tighter cooperation between Moab and Torque – harmoniz-
ing these key structures to reduce overhead and improve
communication between the HPC scheduler and the re-
source manager(s)
 More Parallelization – increasing the decoupling of Moab
and TORQUE’s network communication so Moab is less
dependent on TORQUE's responsiveness, resulting in a 2x speed improvement and cutting the duration of the average scheduling iteration in half
 Accounting Improvements – allowing toggling between
multiple modes of accounting with varying levels of en-
forcement, providing greater flexibility beyond strict allo-
cation options. Additionally, new up-to-date accounting
balances provide real-time insights into usage tracking.
 Viewpoint Admin Portal – guarantees services to the busi-
ness by allowing admins to simplify administrative reporting,
workload status tracking, and job resource viewing
Accelerate Insights
Moab HPC Suite – Enterprise Edition accelerates insights by in-
creasing overall system, user and administrator productivity,
achieving more accurate results that are delivered faster and at
a lower cost from HPC resources. Moab provides the scalability,
90-99 percent utilization, and simple job submission required to
maximize productivity. This ultimately speeds the time to discov-
ery, aiding the enterprise in achieving a competitive advantage
due to its HPC system. Enterprise use cases and capabilities in-
clude the following:
“With Moab HPC Suite, we can meet very demanding customers’
requirements as regards unified management of heterogeneous
cluster environments, grid management, and provide them with flex-
ible and powerful configuration and reporting options. Our custom-
ers value that highly.”
Thomas Gebert
HPC Solution Architect
 OpenStack integration to offer virtual and physical resource
provisioning for IaaS and PaaS
 Performance boost to achieve 3x improvement in overall opti-
mization performance
 Advanced workflow data staging to enable improved cluster
utilization, multiple transfer methods, and new transfer types
 Advanced power management with clock frequency control
and additional power state options, reducing energy costs by
15-30 percent
 Workload-optimized allocation policies and provisioning to
get more results out of existing heterogeneous resources and
reduce costs, including topology-based allocation
 Unify workload management across heterogeneous clusters
by managing them as one cluster, maximizing resource availa-
bility and administration efficiency
 Optimized, intelligent scheduling packs workloads and back-
fills around priority jobs and reservations while balancing
SLAs to efficiently use all available resources
 Optimized scheduling and management of accelerators, both
Intel Xeon Phi and GPGPUs, for jobs to maximize their utiliza-
tion and effectiveness
 Simplified job submission and management with advanced
job arrays and templates
 Showback or chargeback for pay-for-use so actual resource
usage is tracked with flexible chargeback rates and reporting
by user, department, cost center, or cluster
 Multi-cluster grid capabilities manage and share workload
across multiple remote clusters to meet growing workload
demand or surges
Guarantee Services to the Business
Job and resource failures in enterprise HPC systems lead to de-
layed results, missed organizational opportunities, and missed
objectives. Moab HPC Suite – Enterprise Edition intelligently
automates workload and resource uptime in the HPC system to
ensure that workloads complete successfully and reliably, avoid
failures, and guarantee services are delivered to the business.
The Enterprise benefits from these features:
 Intelligent resource placement prevents job failures with
granular resource modeling, meeting workload requirements
and avoiding at-risk resources
 Auto-response to failures and events with configurable ac-
tions to pre-failure conditions, amber alerts, or other metrics
and monitors
 Workload-aware future maintenance scheduling that helps
maintain a stable HPC system without disrupting workload
productivity
 Real-world expertise for fast time-to-value and system uptime
with included implementation, training and 24/7 support re-
mote services
Auto SLA Enforcement
Moab HPC Suite – Enterprise Edition uses the powerful Moab in-
telligence engine to optimally schedule and dynamically adjust
workload to consistently meet service level agreements (SLAs),
guarantees, or business priorities. This automatically ensures
that the right workloads are completed at the optimal times,
taking into account the complex number of using departments,
priorities, and SLAs to be balanced. Moab provides the following
benefits:
 Usage accounting and budget enforcement that schedules
resources and reports on usage in line with resource sharing
agreements and precise budgets (includes usage limits, usage
reports, auto budget management, and dynamic fair share
policies)
 SLA and priority policies that make sure the highest priority
workloads are processed first (includes Quality of Service and
hierarchical priority weighting)
 Continuous plus future scheduling that ensures priorities and
guarantees are proactively met as conditions and workload
changes (i.e. future reservations, pre-emption)
Moab HPC Suite – Grid Option
Grid- and Cloud-Ready Workload Management
The benefits of a traditional HPC environment can be extended
to more efficiently manage and meet workload demand with
the multi-cluster grid and HPC cloud management capabilities in
Moab HPC Suite – Enterprise Edition. Key capabilities include:
 Self-service job submission portal enables users to easily sub-
mit and track HPC cloud jobs at any time from any location
with little training required
 Pay-for-use show back or chargeback so actual resource usage
is tracked with flexible chargeback rates reporting by user, de-
partment, cost center, or cluster
 On-demand workload-optimized node provisioning dynami-
cally provisions nodes based on policies, to better meet work-
load needs.
 Manage and share workload across multiple remote clusters
to meet growing workload demand or surges by adding on the
Moab HPC Suite – Grid Option.
Choose the Moab HPC Suite That’s Right for You
The following chart provides a high-level comparison of the dif-
ference in key use cases and capabilities that are enabled be-
tween the different editions of Moab HPC Suite to intelligently
manage and optimize HPC workload efficiency. Customers can
use this chart to help determine which edition will best address
their HPC workload management challenges, maximize their
workload and resource productivity, and help them meet their
service level performance goals. To explore any use cases or ca-
pability compared in the chart further, see the HPC Workload
Management Use Cases.
 Basic Edition is best for static, non-dynamic basic clusters of all sizes that want to optimize workload throughput and utilization while balancing priorities
 Enterprise Edition is best for large clusters with dynamic,
mission-critical workloads and complex service levels and
priorities to balance. It is also for organizations who want to
integrate or who require HPC Cloud capabilities for their HPC
cluster systems. Enterprise Edition provides additional use
cases these organizations need, such as: Accounting for us-
age of cluster resources for usage budgeting and pay-for-use
showback or chargeback
 Workload optimized node provisioning
 Heterogeneous, multi-cluster workload management
 Workload-aware power management
 Application Portal Edition is suited for customers who need an
application-centric job submission experience to simplify and
extend high performance computing to more users without re-
quiring specialized HPC training. It provides all the benefits of the Enterprise Edition and meets these requirements with a technical computing application portal that supports the most common applications in manufacturing, oil & gas, life sciences, academic, government and other industries.
 Remote Visualization Edition is suited for customers with
many users, internally or external to their organization, of
3D/CAE/CAD and other visualization applications who want
to reduce the hardware, networking and management costs
of running their visualization workloads. This edition also
provides the added benefits of improved workforce produc-
tivity and collaboration while optimizing the utilization and
guaranteeing shared resource access for users moving from
technical workstations to more efficient shared resources in
a technical compute cloud. It provides all the benefits of the Enterprise Edition plus these additional capabilities and benefits.
transtec HPC solutions are designed for maximum flexibility
and ease of management. We not only offer our customers
the most powerful and flexible cluster management solution
out there, but also provide them with customized setup and
site-specific configuration. Whether a customer needs a dynamic Linux-Windows dual-boot solution, unified management of different clusters at different sites, or the fine-tuning of the Moab scheduler for implementing fine-grained policy configuration – transtec not only gives you the framework at hand,
but also helps you adapt the system according to your special
needs. Needless to say, when customers are in need of special
trainings, transtec will be there to provide customers, adminis-
trators, or users with specially adjusted Educational Services.
Having many years of experience in High Performance Computing has enabled us to develop efficient concepts for installing and deploying HPC clusters. For this, we leverage well-proven 3rd-party tools, which we assemble into a total solution and adapt and configure according to the customer’s requirements.
We manage everything from the complete installation and
configuration of the operating system, necessary middleware
components like job and cluster management systems up to the
customer’s applications.
Moab HPC Suite Use Case Basic Enterprise Application Portal Remote Visualization
Productivity Acceleration
Massive multifaceted scalability x x x x
Workload optimized resource allocation x x x x
Optimized, intelligent scheduling of work-
load and resources
x x x x
Optimized accelerator scheduling & manage-
ment (Intel MIC & GPGPU)
x x x x
Simplified user job submission & mgmt.
- Application Portal
x x x
x
x
Administrator dashboard and management
reporting
x x x x
Unified workload management across het-
erogeneous cluster(s)
x
Single cluster only
x x x
Workload optimized node provisioning x x x
Visualization Workload Optimization x
Workload-aware auto-power management x x x
Uptime Automation
Intelligent resource placement for failure
prevention
x x x x
Workload-aware maintenance scheduling x x x x
Basic deployment, training and premium
(24x7) support service
Standard
support only
x x x
Auto response to failures and events x
Limited no
re-boot/provision
x x x
Auto SLA enforcement
Resource sharing and usage policies x x x x
SLA and priority policies x x x x
Continuous and future reservations scheduling x x x x
Usage accounting and budget enforcement x x x
Grid- and Cloud-Ready
Manage and share workload across multiple
clusters in a wide-area grid
x
w/grid Option
x
w/grid Option
x
w/grid Option
x
w/grid Option
User self-service: job request & management
portal
x x x x
Pay-for-use showback and chargeback x x x
Workload optimized node provisioning &
re-purposing
x x x
Multiple clusters in
single domain
Moab HPC Suite – Application Portal Edition
Moab HPC Suite – Application Portal Edition provides easy single-point access to common technical applications, data, resources, and job submissions to accelerate project completion and collaboration for end users while reducing computing costs. It enables more of your internal staff and partners to collaborate from anywhere using HPC applications and resources without any specialized HPC training, while your intellectual property stays secure.
The optimization policies Moab provides significantly reduce the costs and complexity of keeping up with compute demand by optimizing resource sharing and utilization and driving higher throughput, while still balancing different service levels and priorities.
Remove the Barriers to Harnessing HPC
Organizations of all sizes and types are looking to harness the
potential of high performance computing (HPC) to enable and
accelerate more competitive product design and research. To do
this, you must find a way to extend the power of HPC resources to
designers, researchers and engineers not familiar with or trained
on using the specialized HPC technologies. Simplifying the user
access to and interaction with applications running on more
efficient HPC clusters removes the barriers to discovery and in-
novation for all types of departments and organizations. Moab
HPC Suite – Application Portal Edition provides easy single-point
access to common technical applications, data, HPC resources,
and job submissions to accelerate research analysis and collab-
oration process for end users while reducing computing costs.
HPC Ease of Use Drives Productivity
Moab HPC Suite – Application Portal Edition improves ease of
use and productivity for end users using HPC resources with sin-
gle-point access to their common technical applications, data,
resources, and job submissions across the design and research
process. It simplifies the specification and running of jobs on HPC
resources to a few simple clicks on an application specific tem-
plate with key capabilities including:
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

HPC Kompass 2015

Have fun reading the transtec HPC Compass 2015/16!
  • 4. High Performance Computing for Everyone
Interview with Dr. Oliver Tennert, Director of HPC Solutions at transtec AG, discussing the wide range of uses of High Performance Computing (HPC), the benefits of turnkey solutions, and the increasing popularity of Big Data Analytics.
  • 5. Dr. Tennert, in which industries have you seen an increase in demand for high performance computing?
From our perspective, it is difficult to differentiate between specific segments. In general, the high performance computing market is a globally expanding sector. The US-American IT analyst IDC predicts an annual average growth of approximately 7% over the next 5 years across all sectors, even in traditional HPC-dominated industries such as CAE (Computer-Aided Engineering) and Life Sciences.

For which type of applications is HPC usually deployed? Could you give us a concrete example of its application?
High performance computing is basically divided into two areas of application: simulation on the one hand and data analysis on the other. Take the automotive industry for example. During the vehicle design and development stage, a great deal of effort is invested in crash simulations, while the number of material-intensive crash tests, which used to be performed regularly, has been minimized; such tests are now only conducted for verification purposes. The analysis of sequenced genome data for hereditary diseases is a prime example of how data analysis is used effectively – or, from the field of molecular genetics, the measurement of evolutionary relations between different species or the effect of viruses on the germline.

High performance computing is often criticized on the grounds that an HPC installation is far too expensive and complex for “ordinary” organisations. Which arguments would you use against these accusations?
[Grinning:] In the past we have sold HPC solutions to many customers who would describe themselves as “ordinary” and who have been extremely satisfied with the results. But you are in fact right: operating a complex HPC system involves hiding this complexity away from users and administrators. When you take a look under the hood of a car, you will see that it is also a complex machine. However, drivers do not have to concern themselves with features such as transmission technology and ABS electronics when driving the vehicle. It is therefore important to implement a straightforward and enterprise-ready management and service concept. transtec offers an intelligent portfolio of service and management concepts which are compatible with the paradigms of “easy to manage” and “easy to use”.

What are the challenges when launching, installing and operating an HPC system? What should administrators bear in mind to ensure that the HPC system remains manageable?
When looking for an HPC systems provider, customers should make sure that the provider has precisely understood their needs and requirements. This is what differentiates a solutions provider
  • 6. such as transtec from a product-centric hardware supplier or manufacturer. The optimum sizing of the entire solution and the above-mentioned easy manageability have a direct impact on the customer's or user's productivity. Once the system has been delivered, customers want to start productive operation as soon as possible. transtec provides customers with turnkey HPC solutions, which means: depending on the customer's requirements, our specialists can manage the complete installation and integration into the customer's environment.

What is the role of HPC when deploying cloud computing and big data analyses in large organisations?
When deploying cloud computing, it is important to make a distinction: what we today call a “private cloud” is essentially – leaving technological details to one side – a tried-and-tested concept in which companies can consolidate and provide access to their computing capacities from one central location. The use of “public cloud” providers such as Amazon has significantly dropped in light of previous NSA reports. Security aspects have always been a serious obstacle for cloud computing and, in terms of HPC, the situation becomes all the more complicated for an industrial organisation, as high performance computing is an essential element in the organisation's value chain. The data the organisation uses for work is a vital asset and critical to the company.

Big Data Analytics, in other words the ability to process and analyse large data volumes, is essentially nothing new. Recently, however, the amount of data has increased at a significantly faster pace than the computing capacities available.
  • 7. For example, with the use of biochemical methods in “next generation sequencing”, complete genome sequences can today be generated and prepared for further analysis at a far faster rate than ever before. The need for HPC solutions has recently become increasingly evident.

While we are on the topic of big data: to what extent are high-performance, modern in-memory computing technologies competing with HPC systems?
This depends on whether you see these technologies in direct competition with each other! Shared in-memory databases and the associated computing technologies are innovative concepts for particular applications, for example real-time analytics or data mining. We believe that both are contemporary technologies, like many others, that enable the analysis of large data volumes in real time. From transtec's point of view, these are HPC technologies, and some of our customers are already using them on HPC solutions we have provided.

What is expected of today's supercomputers? What does the future hold for the HPC segment – and what follows on from Petaflop and Petabyte?
When we talk of supercomputers, we are referring to the Top500 list of the fastest computers in the world. “Flops” is used as a measure of performance, in other words the number of floating point operations per second. “Tianhe-2” from China currently holds the number one position with an amazing 34 Petaflops, in other words 34 quadrillion floating point operations per second. However, rank 500, scoring 133 Teraflops (133 trillion floating point operations per second), is still an impressive figure. That is roughly the equivalent of the consolidated performance of 2,000 well-configured PCs. Exabyte follows Petabyte. Over the last few years it has been widely discussed in the HPC market what kind of system would be needed to manage these data volumes, particularly in terms of capacity, power supply and cooling. Significant technology advancements will be required before this type of supercomputer can be produced. Sheer mass alone will not be able to manage these volumes.

Keyword security: what is the best way to protect HPC systems in an age of the NSA scandal and ever increasing cybercrime?
Basically the same applies to HPC systems as it does to company notebooks and workstations where IT security is concerned: access by unauthorised parties must be prevented by deploying secure IT structures. As described above, enormous volumes of critical organisation data are stored on an HPC system, which is why efficient data management with effective access control is essential in HPC environments. The network connections between the data centre and user workstations must also be secured on a physical level. This especially applies when the above-mentioned “private cloud” scenarios are set up across various locations to allow them to work together regardless of the particular country or time zone.

“Petabyte is followed by Exabyte, although before that, several technological jumps need to take place.”
  • 8. Technology Compass – Table of Contents and Introduction
High Performance Computing ... 10
  Performance Turns Into Productivity ... 12
  Flexible Deployment With xCAT ... 14
  Service and Customer Care From A to Z ... 16
Vector Supercomputing ... 18
  The NEC SX Architecture ... 20
  History of the SX Series ... 25
Advanced Cluster Management Made Easy ... 34
  Easy-to-use, Complete and Scalable ... 36
  Cloud Bursting With Bright ... 44
Intelligent HPC Workload Management ... 46
  Moab HPC Suite – Enterprise Edition ... 48
  Moab HPC Suite – Grid Option ... 52
  Optimizing Accelerators with Moab HPC Suite ... 54
  Moab HPC Suite – Application Portal Edition ... 58
  Moab HPC Suite – Remote Visualization Edition ... 60
Remote Visualization and Workflow Optimization ... 68
  NICE EnginFrame: A Technical Computing Portal ... 70
  Desktop Cloud Virtualization ... 74
  Cloud Computing ... 78
  NVIDIA GRID ... 80
  Virtual GPU Technology ... 82
  • 9. Intel Cluster Ready ... 84
  A Quality Standard for HPC Clusters ... 86
  Intel Cluster Ready builds HPC Momentum ... 90
  The transtec Benchmarking Center ... 94
Parallel NFS ... 96
  The New Standard for HPC Storage ... 98
  What's New in NFS 4.1? ... 100
  Panasas HPC Storage ... 102
BeeGFS – A Parallel File System ... 116
  A Parallel File System ... 118
NVIDIA GPU Computing ... 126
  What is GPU Computing? ... 128
  Kepler GK110/210 GPU Computing Architecture ... 130
  A Quick Refresher on CUDA ... 140
Intel Xeon Phi Coprocessor ... 142
  The Architecture ... 144
  An Outlook on Knights Landing and Omni Path ... 152
InfiniBand High-Speed Interconnect ... 156
  The Architecture ... 158
  Intel MPI Library 4.0 Performance ... 160
  Intel Fabric Suite 7 ... 164
Numascale ... 168
  NumaConnect Background ... 170
  Technology ... 172
  Redefining Scalable OpenMP and MPI ... 178
  Big Data ... 182
  Cache Coherence ... 184
  NUMA and ccNUMA ... 186
IBM Platform HPC ... 188
  Enterprise-Ready Cluster & Workload Management ... 190
  What's New in IBM Platform LSF 8 ... 194
  IBM Platform MPI 8.1 ... 202
General Parallel File System (GPFS) ... 204
  What is GPFS? ... 206
  What's New in GPFS Version 3.5 ... 220
Glossary ... 222
  • 10.
  • 11. High Performance Computing (HPC) has been with us from the very beginning of the computer era. High-performance computers were built to solve numerous problems which the “human computers” could not handle. The term HPC just hadn't been coined yet. More importantly, some of the early principles have changed fundamentally.
High Performance Computing – Performance Turns Into Productivity
  • 12. Performance Turns Into Productivity
HPC systems in the early days were much different from those we see today. First, we saw enormous mainframes from large computer manufacturers, including a proprietary operating system and job management system. Second, at universities and research institutes, workstations made inroads, and scientists carried out calculations on their dedicated Unix or VMS workstations. In either case, if you needed more computing power, you scaled up, i.e. you bought a bigger machine.
Today the term High-Performance Computing has gained a fundamentally new meaning. HPC is now perceived as a way to tackle complex mathematical, scientific or engineering problems. The integration of industry-standard, “off-the-shelf” server hardware into HPC clusters facilitates the construction of computer networks of a power that one single system alone could never achieve. The new paradigm for parallelization is scaling out.
Computer-supported simulation of realistic processes (so-called Computer-Aided Engineering – CAE) has established itself as a third key pillar in the field of science and research, alongside theory and experimentation. It is nowadays inconceivable that an aircraft manufacturer or a Formula One racing team would operate without using simulation software. And scientific calculations, such as in the fields of astrophysics, medicine, pharmaceuticals and bio-informatics, will to a large extent be dependent on supercomputers in the future. Software manufacturers long ago recognized the benefit of high-performance computers based on powerful standard servers and ported their programs to them accordingly.
The main advantage of scale-out supercomputers is just that: they are infinitely scalable, at least in principle. Since they are based on standard hardware components, such a supercomputer can be given more power whenever the computational capacity of the system is no longer sufficient, simply by adding additional nodes of the same kind. A cumbersome switch to a different technology can be avoided in most cases.
“transtec HPC solutions are meant to provide customers with unparalleled ease-of-management and ease-of-use. Apart from that, deciding for a transtec HPC solution means deciding for the most intensive customer care and the best service imaginable.”
Dr. Oliver Tennert, Director Technology Management & HPC Solutions
  • 13. Variations on the theme: MPP and SMP
Parallel computations exist in two major variants today. Applications running in parallel on multiple compute nodes are frequently so-called Massively Parallel Processing (MPP) applications. MPP indicates that the individual processes can each utilize exclusive memory areas. This means that such jobs are predestined to be computed in parallel, distributed across the nodes in a cluster. The individual processes can thus utilize the separate units of the respective node – especially the RAM, the CPU power and the disk I/O.
Communication between the individual processes is implemented in a standardized way through the MPI software interface (Message Passing Interface), which abstracts the underlying network connections between the nodes from the processes. However, the MPI standard (current version 2.0) merely requires source code compatibility, not binary compatibility, so an off-the-shelf application usually needs specific versions of MPI libraries in order to run. Examples of MPI implementations are OpenMPI, MPICH2, MVAPICH2, Intel MPI or – for Windows clusters – MS-MPI.
If the individual processes engage in a large amount of communication, the response time of the network (latency) becomes important. Latency in a Gigabit Ethernet or a 10GE network is typically around 10 µs. High-speed interconnects such as InfiniBand reduce latency by a factor of 10, down to as low as 1 µs. Therefore, high-speed interconnects can greatly speed up total processing.
The other frequently used variant is SMP applications. SMP, in this HPC context, stands for Shared Memory Processing. It involves the use of shared memory areas, the specific implementation of which depends on the underlying operating system. Consequently, SMP jobs generally run on a single node only, where they can in turn be multi-threaded and thus be parallelized across the number of CPUs per node. For many HPC applications, both the MPP and the SMP variant can be chosen.
Many applications are not inherently suitable for parallel execution. In such a case, there is no communication between the individual compute nodes, and therefore no need for a high-speed network between them; nevertheless, multiple computing jobs can be run simultaneously and sequentially on each individual node, depending on the number of CPUs. In order to ensure optimum computing performance for these applications, it must be examined how many CPUs and cores deliver the optimum performance. We typically find this sequential type of work in the fields of data analysis or Monte Carlo simulations.
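A minimal sketch of what the MPP model described above looks like in code (program and variable names are illustrative, and an MPI installation with an mpif90-style wrapper compiler is assumed): every MPI process computes on its own slice of the index range in its private memory, and partial results are combined only through an explicit message-passing call.

  program mpp_sum
    use mpi
    implicit none
    integer, parameter :: n = 1000000
    integer :: ierr, rank, nprocs, i
    real(8) :: partial, total

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

    ! each process works on its own share of the iterations,
    ! using only its private memory (the MPP model)
    partial = 0.0d0
    do i = rank + 1, n, nprocs
       partial = partial + 1.0d0 / dble(i)
    end do

    ! processes communicate only through explicit MPI calls
    call MPI_Reduce(partial, total, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, &
                    MPI_COMM_WORLD, ierr)
    if (rank == 0) print *, 'result: ', total

    call MPI_Finalize(ierr)
  end program mpp_sum

An SMP variant of the same computation would instead run multi-threaded within one node, for example with an OpenMP sum reduction, without any message passing between nodes.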
  • 14. Flexible Deployment With xCAT
xCAT as a Powerful and Flexible Deployment Tool
xCAT (Extreme Cluster Administration Tool) is an open source toolkit for the deployment and low-level administration of HPC cluster environments, small as well as large ones.
xCAT provides simple commands for hardware control, node discovery, the collection of MAC addresses, and node deployment with (diskful) or without (diskless) local installation. The cluster configuration is stored in a relational database. Node groups for different operating system images can be defined. Also, user-specific scripts can be executed automatically at installation time.
xCAT Provides the Following Low-Level Administrative Features
- Remote console support
- Parallel remote shell and remote copy commands
- Plugins for various monitoring tools like Ganglia or Nagios
- Hardware control commands for node discovery, collecting MAC addresses, remote power switching and resetting of nodes
- Automatic configuration of syslog, remote shell, DNS, DHCP, and ntp within the cluster
- Extensive documentation and man pages
For cluster monitoring, we install and configure the open source tool Ganglia or the even more powerful open source solution Nagios, according to the customer's preferences and requirements.
  • 15. Local Installation or Diskless Installation
We offer a diskful or a diskless installation of the cluster nodes. A diskless installation means the operating system is hosted partially within the main memory; larger parts may or may not be included via NFS or other means. This approach allows for deploying large numbers of nodes very efficiently, and the cluster is up and running within a very small timescale. Also, updating the cluster can be done in a very efficient way: only the boot image has to be updated, and the nodes have to be rebooted. After this, the nodes run either a new kernel or even a new operating system. Moreover, with this approach, partitioning the cluster can also be done very efficiently, either for testing purposes, or for allocating different cluster partitions to different users or applications.
Development Tools, Middleware, and Applications
Depending on the application, optimization strategy, or underlying architecture, different compilers lead to code of very different performance. Moreover, different, mainly commercial, applications require different MPI implementations. And even when the code is self-developed, developers often prefer one MPI implementation over another. According to the customer's wishes, we install various compilers and MPI middleware, as well as job management systems like ParaStation, Grid Engine, Torque/Maui, or the very powerful Moab HPC Suite for high-level cluster management.
  • 16. Services and Customer Care from A to Z
transtec service chain: individual presales consulting; application-, customer-, and site-specific sizing of the HPC solution; burn-in tests of systems; benchmarking of different systems; continual improvement; software and OS installation; application installation; onsite hardware assembly; integration into the customer's environment; customer training; maintenance, support and managed services.
transtec HPC as a Service
You will get a range of applications like LS-Dyna, ANSYS, Gromacs, NAMD etc. from all kinds of areas pre-installed, integrated into an enterprise-ready cloud and workload management system, and ready to run. Do you miss your application? Ask us: HPC@transtec.de
transtec Platform as a Service
You will be provided with dynamically provisioned compute nodes for running your individual code. The operating system will be pre-installed according to your requirements. Common Linux distributions like RedHat, CentOS, or SLES are the standard. Do you need another distribution? Ask us: HPC@transtec.de
transtec Hosting as a Service
You will be provided with hosting space inside a professionally managed and secured datacenter where you can have your machines hosted, managed, and maintained according to your requirements. Thus, you can build up your own private cloud. What range of hosting and maintenance services do you need? Tell us: HPC@transtec.de
HPC @ transtec: Services and Customer Care from A to Z
transtec AG has over 30 years of experience in scientific computing and is one of the earliest manufacturers of HPC clusters. For nearly a decade, transtec has delivered highly customized High Performance clusters based on standard components to academic and industry customers across Europe, with all the high quality standards and the customer-centric approach that transtec is well known for.
Every transtec HPC solution is more than just a rack full of hardware – it is a comprehensive solution with everything the HPC user, owner, and operator need. In the early stages of any customer's HPC project, transtec experts provide extensive and detailed consulting to the customer – they benefit from expertise and experience. Consulting is followed by benchmarking of different systems with either specifically crafted customer code or generally accepted benchmarking routines; this aids customers in sizing and devising the optimal and detailed HPC configuration. Each and every piece of HPC hardware that leaves our factory undergoes a burn-in procedure of 24 hours or
  • 17. more if necessary. We make sure that any hardware shipped meets our and our customers' quality requirements. transtec HPC solutions are turnkey solutions. By default, a transtec HPC cluster has everything installed and configured – from hardware and operating system to important middleware components like cluster management or developer tools and the customer's production applications. Onsite delivery means onsite integration into the customer's production environment, be it establishing network connectivity to the corporate network, or setting up software and configuration parts. transtec HPC clusters are ready-to-run systems – we deliver, you turn the key, the system delivers high performance. Every HPC project entails transfer to production: IT operation processes and policies apply to the new HPC system. Effectively, IT personnel is trained hands-on, introduced to hardware components and software, with all operational aspects of configuration management.
transtec services do not stop when the implementation project ends. Beyond transfer to production, transtec takes care. transtec offers a variety of support and service options, tailored to the customer's needs. When you are in need of a new installation, a major reconfiguration or an update of your solution, transtec is able to support your staff and, if you lack the resources for maintaining the cluster yourself, maintain the HPC solution for you.
From Professional Services to Managed Services for daily operations and required service levels, transtec will be your complete HPC service and solution provider. transtec's high standards of performance, reliability and dependability assure your productivity and complete satisfaction. transtec's HPC Managed Services offer customers the possibility of having the complete management and administration of the HPC cluster handled by transtec service specialists, in an ITIL-compliant way. Moreover, transtec's HPC on Demand services provide access to HPC resources whenever customers need them, for example because they do not have the possibility of owning and running an HPC cluster themselves, due to lacking infrastructure, know-how, or admin staff.
transtec HPC Cloud Services
Last but not least, transtec's services portfolio evolves as customers' demands change. Starting this year, transtec is able to provide HPC Cloud Services. transtec uses a dedicated datacenter to provide computing power to customers who are in need of more capacity than they own, which is why this workflow model is sometimes called computing-on-demand. With these dynamically provided resources, customers have the possibility to have their jobs run on HPC nodes in a dedicated datacenter, professionally managed and secured, and individually customizable. Numerous standard applications like ANSYS, LS-Dyna, OpenFOAM, as well as lots of codes like Gromacs, NAMD, VMD, and others are pre-installed, integrated into an enterprise-ready cloud and workload management environment, and ready to run.
Alternatively, whenever customers are in need of space for hosting their own HPC equipment because they do not have the space capacity or cooling and power infrastructure themselves, transtec is also able to provide Hosting Services to those customers who'd like to have their equipment professionally hosted, maintained, and managed. Customers can thus build up their own private cloud!
Are you interested in any of transtec's broad range of HPC related services? Write us an email to HPC@transtec.de. We'll be happy to hear from you!
  • 18.
  • 19. Initially, in the 80s and 90s, the notion of supercomputing was almost a synonym for vector computing. And this concept is back now in different flavors: almost every architecture today is using SIMD, the acronym for “single instruction multiple data”.
Vector Supercomputing – The NEC SX Architecture
  • 20. History of HPC, and what we learn
During the last years systems based on x86 architecture dominated the HPC market, with GPUs and many-core systems recently making their inroads. But this period is coming to an end, and it is again time for a differentiated complementary HPC-targeted system. NEC will develop such a system.
The Early Days
When did HPC really start? Probably one would think about the time when Seymour Cray developed high-end machines at CDC and then started his own company Cray Research, with the first product being the Cray-1 in 1976. At that time supercomputing was very special, systems were expensive, and access was extremely limited for researchers and students. But the business case was obvious in many fields, scientifically as well as commercially.
These Cray machines were vector-computers, initially featuring only one CPU, as parallelization was far from being mainstream, even hardly a matter of research. And code development was easier than today: a code of 20,000 lines was already considered “complex” at that time, and there were not so many standard codes and algorithms available anyway. For this reason code-development could adapt to any kind of architecture to achieve optimal performance.
In 1981 NEC started marketing its first vector supercomputer, the SX-2, a single-CPU vector architecture like the Cray-1. The SX-2 was a milestone: it was the first CPU to exceed one GFlops of peak performance.
The Attack of the “Killer-Micros”
In the 1990s the situation changed drastically, and what we see today is a direct consequence. The “killer-micros” took over: new parallel, later “massively parallel” systems (MPP) were built based on micro-processor architectures which were predominantly
  • 21. targeted to commercial sectors, or in the case of Intel, AMD or Motorola to PCs, to products which were successfully invading daily life, partially more than even experts had anticipated. Clearly one could save a lot of cost by using, or in the worst case slightly adapting, these CPUs for high-end systems, and by focusing investments on other ingredients, like interconnect technology. At the same time Linux was increasingly adopted and offered a cheap, generally available OS platform to allow everybody to use HPC for a reasonable price.
These systems were cheaper for two reasons: the development of standard CPUs was sponsored by different businesses, and another expensive part, the memory architecture with many CPUs addressing the same flat memory space, was replaced by systems comprised of many distinct so-called nodes, complete systems by themselves, plus some kind of interconnect technology, which could also partially be taken from standard equipment.
The impact on the user was the necessity to code for these “distributed memory systems”. Standards for coding evolved, PVM and later MPI, plus some proprietary paradigms. MPI is still dominating today.
In the high end the special systems were still dominating, because they had architectural advantages and the codes were running with very good efficiency on such systems. Most codes were not parallel at all or at least not well parallelized, let alone on systems with distributed memory. The skills and experience to generate parallel versions of codes were lacking in those companies selling high-end machines and also on the customers' side. So it took some time for the MPPs to take over. But it was inevitable, because this situation, sometimes called the “democratization of HPC”, was positive for both the commercial and the scientific developments. From that time on researchers and students in academia and industry got sufficient access to resources they could not have dreamt of before.
The x86 Dominance
One could dispute whether the x86 architecture is the most advanced or elegant; undoubtedly it became the dominant one. It was developed for mass markets, this was the strategy, and HPC was a side effect, sponsored by the mass market. This worked successfully, and the proportion of x86 Linux clusters in the Top500 list increased year by year. Because of the mass markets it made sense economically to invest a lot into the development of LSI technology (large-scale integration). The related progress, which led to more and more transistors on the chips and increasing frequencies, helped HPC applications, which got more, faster and cheaper systems with each generation. Software did not have to adapt a lot between generations, and software developments rather focused initially on the parallelization for distributed memory, later on new functionalities and features.
This worked for quite some years. Systems of sufficient size to produce relevant results for typical applications are cheap, and most codes do not really scale to a level of parallelism that uses thousands of processes. Some lighthouse projects do, after a lot of development which is not easily affordable, but the average researcher or student is often completely satisfied with something between a few and perhaps a few hundred nodes of a standard Linux cluster.
In the meantime a typical Intel-based dual-socket node of a standard cluster already has 24 cores, soon a bit more. So parallelization is increasingly important.
There is “Moore's law”, invented by Gordon E. Moore, a co-founder of the Intel Corporation, in 1965. Originally it stated that the number of transistors in an integrated circuit doubles about every 18 months, leading to an exponential increase. This statement was often misinterpreted to mean that the performance of a CPU would double every 18 months, which is misleading. We observe ever increasing core counts; the transistors are just there, so why not? That applications might not be able to
  • 22. use these many cores is another question; in principle the potential performance is there.
The “standard benchmark” in the HPC scene is the so-called HPL, High Performance LINPACK, and the famous Top500 list is built based on it. This HPL benchmark is only one of three parts of the original LINPACK benchmark, which is often forgotten! But the other parts did not show such good progress in performance, their scalability is limited, so they could not serve to spawn competition even between nations, which leads to a lot of prestige. Moreover there are at most a few real codes in the world with similar characteristics as the HPL. In other words: it does not tell a lot. This is a highly political situation, and probably quite some money is wasted which would better be invested in brain-ware.
Anyway it is increasingly doubtful that LSI technology will advance at the same pace in the future. The lattice constant of silicon is 5.43 Å. LSI technology is now using 14 nm processes, the Intel Broadwell generation, which is about 25 times the lattice constant, with 7 nm LSIs at the long-term horizon. Some scientists assume there is still some room to improve even more, but once the features on the LSI have a thickness of only a few atoms one cannot get another factor of ten. Sometimes the impact of some future “disruptive technology” is claimed …
Another obstacle is the cost to build a chip fab for a new generation of LSI technology. Certainly there are ways to optimize both production and cost, and there will be bright ideas, but with current generations a fab costs at least many billion dollars. There have been ideas of slowing down the innovation cycle on that side merely to be able to recover the investment.
Memory Bandwidth
For many HPC applications it is not the compute power of the
  • 23. CPU which counts, it is the speed with which the CPU can exchange data with the memory, the so-called memory bandwidth. If it is neglected, the CPUs will wait for data and the performance will decrease. To describe the balance between CPU performance and memory bandwidth one often uses the ratio between operations and the amount of bytes that can be transferred while they are running, the “byte-per-flop ratio”. Traditionally vector machines were always great in this regard, and this is to continue as far as current plans are known. And this is the reason why a vector machine can realize a higher fraction of its theoretical performance on a real application than a scalar machine.
Power Consumption
One other but really overwhelming problem is the increasing power consumption of systems. Looking at different generations of Linux clusters one can assume that a dual-socket node, depending on how it is equipped with additional features, will consume between 350 Watt and 400 Watt. This only slightly varies between different generations, perhaps it will slightly increase. Even in Europe sites are talking about electricity budgets in excess of 1 MWatt, even beyond 10 MWatt. One also has to cool the system, which adds something like 10-30% to the electricity cost! The cost to run such a system over 5 years will be in the same order of magnitude as the purchase price!
The power consumption grows with the frequency of the CPU, which is the dominant reason why frequencies are stagnating, if not decreasing, between product generations. There are more cores, but individual cores of an architecture are getting slower, not faster! This has to be compensated on the application side by increased scalability, which is not always easy to achieve. In the past users easily got higher performance for almost every application with each new generation; this is no longer the case.
Architecture Development
So the basic facts clearly indicate limitations of the LSI technology, and they have been known for quite some time. Consequently this leads to attempts to get additional performance by other means. And the developments in the market clearly indicate that high-end users are willing to consider the adaptation of codes.
To utilize the available transistors, CPUs with more complex functionality are designed, utilizing SIMD instructions or providing more operations per cycle than just an add and a multiply. SIMD is “Single Instruction Multiple Data”. Initially at Intel the SIMD instructions provided two results in 64-bit precision per cycle rather than one; in the meantime with AVX it is even four, on Xeon Phi already eight.
This makes a lot of sense. The electricity consumed by the “administration” of the operations, code fetch and decode, configuration of paths on the CPU etc., is a significant portion of the power consumption. If the energy is used not only for one operation but for two or even four, the energy per operation is obviously reduced.
In essence, and in order to overcome the performance bottleneck, the architectures developed into the direction which to a larger extent was already realized in the vector supercomputers decades ago! There is almost no architecture left without such features.
But SIMD operations require application code to be vectorized. Before, code developments and programming paradigms did not care for fine-grain data parallelism, which is the basis of applicability of SIMD. Thus vector machines could not show their strength; on scalar code they cannot beat a scalar CPU. But strictly speaking, there is no scalar CPU left: standard x86 systems do not achieve their optimal performance on purely scalar code either!
  • 24. Alternative Architectures?
During recent years GPUs and many-core architectures are increasingly used in the HPC market. Although these systems are somewhat diverse in detail, in any case the degree of parallelism needed is much higher than in standard CPUs. This points into the same direction: a low-level parallelism is used to address the bottlenecks. In the case of Nvidia GPUs the individual so-called CUDA cores, relatively simple entities, are steered by a more complex central processor, realizing a high level of SIMD parallelism. In the case of Xeon Phi the individual cores are also less powerful, but they are used in parallel using coding paradigms which operate on the level of loop nests, an approach which was already used on vector supercomputers during the 1980s!
So these alternative architectures not only go into the same direction, the utilization of fine-grain parallelism, they also indicate that coding styles which suit real vector machines will dominate future software developments. Note that at least some of these systems have an economic backing by mass markets: gaming!
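A minimal sketch of what “coding paradigms which operate on the level of loop nests” means in practice (the subroutine and array names are made up, and a compiler with OpenMP 4.0 support is assumed): the outer loop is spread across cores or threads, while the inner loop is left to the SIMD or vector units.

  subroutine add2d(n, m, a, b, c)
    implicit none
    integer, intent(in) :: n, m
    real(8), intent(in)  :: b(n, m), c(n, m)
    real(8), intent(out) :: a(n, m)
    integer :: i, j

    ! coarse-grain parallelism across the outer loop (cores/threads),
    ! fine-grain data parallelism across the inner loop (SIMD/vector units)
    !$omp parallel do private(i, j)
    do j = 1, m
       !$omp simd
       do i = 1, n
          a(i, j) = b(i, j) + c(i, j)
       end do
    end do
    !$omp end parallel do
  end subroutine add2d

The same two-level structure is what a vectorizing compiler exploits on a vector CPU, and what CUDA expresses with blocks and threads on a GPU.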
  • 25. History of the SX Series
As already stated, NEC started marketing the first commercially available generation of this series, the SX-2, in 1981. By now the latest generation is the SX-ACE. In 1998 Prof. Tadashi Watanabe, the developer of the SX series and the major driver of NEC's HPC business, received the Eckert-Mauchly Award, a renowned US award, for these achievements, and in 2006 also the “Seymour Cray Award”. Prof. Watanabe was a very pragmatic developer, a knowledgeable scientist and a visionary manager, all at the same time, and a great person to talk to.
From the beginning the SX architecture was different from the competing Cray vectors: it already integrated some of the features that RISC CPUs at that time were using, but combined them with vector registers and pipelines. For example the SX could reorder instructions in hardware on the fly, which Cray machines could not do. In effect this made the system much easier to use. The single-processor SX-2 was the fastest CPU at that time, the first one to break the 1 GFlops barrier. Naturally the compiler technology for automatic vectorization and optimization was important.
With the SX-3 NEC developed a high-end shared-memory system. It had up to four very strong CPUs. The resolution of memory access conflicts by CPUs working in parallel, and the consistency of data throughout the whole shared-memory system, are non-trivial, and the development of such a memory architecture was a major breakthrough.
With the SX-4 NEC implemented a high-end system based on CMOS. It was the direct competitor to the Cray T90, with a huge advantage with regard to price and power consumption. The T90 was using ECL technology and achieved 1.8 GFlops peak performance. The SX-4 had a lower frequency because of CMOS, but because of the higher density of the LSIs it featured a higher level of parallelism on the CPU, leading to 2 GFlops per CPU with up to 32 CPUs. The memory architecture to support those was a major leap ahead. On this system one could typically reach an efficiency of 20-40% of peak performance measured throughout complete applications, far from what is achieved on standard components today. With this system NEC also implemented the first generation of its own proprietary interconnect, the so-called IXS, which
  • 26. allowed connecting several of these strong nodes into one highly powerful system. The IXS featured advanced functions like a global barrier and some hardware fault tolerance, things which other companies tackled years later.
With the SX-5 this direction was continued; it was a very successful system pushing the limits of CMOS-based CPUs and multi-node configurations. At that time the efforts on the users' side to parallelize their codes using “message passing”, PVM and MPI at that time, slowly started to surface, so the need for a huge shared-memory system slowly disappeared. Still there were quite some important codes in the scene which were only parallel using a shared-memory paradigm: initially “microtasking”, which was vendor-specific, then “autotasking”, with which NEC and some other vendors could automatically parallelize code, and later the standard OpenMP. So there was still sufficient need for a huge and highly capable shared-memory system.
But later it became clear that more and more codes could utilize distributed-memory systems, and moreover the memory architecture of huge shared-memory systems became increasingly complex and thus costly. A memory is made of “banks”, individual pieces of memory which are used in parallel. In order to provide the data at the necessary speed for a CPU, a certain minimum number of banks per CPU is required, because a bank needs some cycles to recover after every access. Consequently the number of banks had to scale with the number of CPUs, so that the complexity of the network connecting CPUs and banks was growing with the square of the number of CPUs. And if the CPUs became stronger they needed more banks, which made it even worse. So this concept became too expensive.
  • 27. At the same time NEC had a lot of experience in mixing MPI and OpenMP parallelization in applications, and it turned out that for the very fast CPUs the shared-memory parallelization was typically not really efficient for more than four or eight threads. So the concept was adapted accordingly, which led to the SX-6 and the famous Earth Simulator, which kept the #1 spot of the Top500 for some years. The Earth Simulator was a different machine, but based on the same technology level and architecture as the SX-6.
These systems were the first implementation ever of a vector CPU on one single chip. The CPU by itself was not the challenge, but the interface to the memory to support the bandwidth needed for this chip, basically the “pins”. This kind of packaging technology was always a strong point in NEC's technology portfolio.
The SX-7 was a somewhat special machine: it was based on the SX-6 technology level, but featured 32 CPUs on one shared memory.
The SX-8 then was a new implementation again, with some advanced features which directly addressed the necessities of certain code structures frequently encountered in applications. The communication between the world-wide organization of application analysts and the hard- and software developers in Japan had proven very fruitful, and the SX-8 was very successful in the market.
With the SX-9 the number of CPUs per node was pushed up again to 16 because of certain demands in the market. Just as an indication of the complexity of the memory architecture: the system had 16384 banks! The peak performance of one node was 1.6 TFlops. Keep in mind that the efficiency achieved on real applications was still in the range of 10% in bad cases, up to 20% or even more, depending on the code. One should not compare this peak performance with some standard chip today; this would be misleading.
The Current Product: SX-ACE
In November 2014 NEC announced the next generation of the SX vector product, the SX-ACE. Some systems are already installed and accepted, also in Europe.
The main design target was the reduction of power consumption per sustained performance, and measurements have proven this has been achieved. In order to do so, and in line with the observation that shared-memory parallelization with vector CPUs will be useful on four cores, perhaps eight, but normally not beyond that, the individual node consists of a single-chip four-core processor, which integrates the memory controllers and the interface to the communication interconnect, a next generation of the proprietary IXS.
That way a node is a small board, leading to a very compact design. The complicated network between the memory and the CPUs, which took about 70% of the power consumption of an SX-9, was eliminated. And naturally a new generation of LSI and PCB technology also leads to reductions.
With the memory bandwidth of 256 GB/s this machine realizes one “byte per flop”, a measure of the balance between the pure compute power and the memory bandwidth of a chip. In comparison, nowadays typical standard processors have in the order of one eighth of this!
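To make the “byte per flop” figure concrete, a back-of-the-envelope view (the node peak of 256 GFlops is not stated above, it is simply what the two quoted figures imply):

  256 GB/s / 256 GFlop/s = 1 byte per flop

The “one eighth” quoted for typical standard processors then corresponds to roughly 0.125 bytes per flop. Why such a ratio matters can be seen from a simple memory-bound kernel of the kind that dominates many HPC codes; a minimal sketch, assuming 8-byte reals and an illustrative subroutine name:

  subroutine triad(n, a, b, c, s)
    implicit none
    integer, intent(in) :: n
    real(8), intent(in)  :: b(n), c(n), s
    real(8), intent(out) :: a(n)
    integer :: i

    ! per iteration: 2 floating-point operations (one multiply, one add),
    ! but 24 bytes of memory traffic (load b(i), load c(i), store a(i)),
    ! i.e. the loop demands about 12 bytes per flop from the memory system
    do i = 1, n
       a(i) = b(i) + s * c(i)
    end do
  end subroutine triad

For large n such a loop runs at memory speed on every architecture; the higher the machine's byte-per-flop ratio, the larger the fraction of its peak performance it can sustain on kernels like this.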
  • 28. 28 Memory bandwidth is increasingly a problem; known in the market as the "memory wall", it has been a hot topic for more than a decade. Standard components are built for a different market, in which memory bandwidth is not as important as it is for HPC applications. Two of these node boards are put into a module, eight such modules are combined in one cage, and four of the cages, i.e. 64 nodes, are contained in one rack. The standard configuration of 512 nodes consists of eight racks.
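  The packaging numbers above multiply out as follows: 2 nodes per module × 8 modules per cage = 16 nodes per cage, 16 × 4 cages = 64 nodes per rack, and 512 nodes ÷ 64 nodes per rack = 8 racks.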
  • 29. 29 NEC has already installed several SX-ACE systems in Europe, and more installations are ongoing. The advantages of the system – the superior memory bandwidth and the strong CPU with good efficiency – provide scientists with performance levels on certain codes which otherwise could not be achieved. Moreover, the system has proven to be superior in terms of power consumption per sustained application performance, and this is a real hot topic nowadays.
  • 30. 30 tech BOX
  LSI technologies have improved drastically during the last two decades, but the ever increasing power consumption of systems now poses serious limitations. Tasks like fetching instructions, decoding them or configuring paths on the chip consume energy, so it makes sense to execute these activities not for one operation only, but for a whole set of such operations. This is what SIMD means.
  Data Parallelism
  The underlying structure that enables the use of SIMD instructions is called "data parallelism". For example, the following loop is data-parallel: the individual loop iterations could be executed in any arbitrary order, and the results would stay the same:
  do i = 1, n
     a(i) = b(i) + c(i)
  end do
  The complete independence of the loop iterations allows executing them in parallel, and this is mapped to hardware with SIMD support. Naturally one could also use other means to parallelize such a loop; for a very long loop, for example, one could execute chunks of loop iterations on different processors of a shared-memory machine using OpenMP. But this causes overhead, so it will not be efficient for small trip counts, whereas a SIMD feature in hardware can deal with such fine granularity. As opposed to the example above, the following loop, a recursion, is not data parallel:
  • 31. 31
  do i = 1, n-1
     a(i+1) = a(i) + c(i)
  end do
  Obviously a change of the order of the loop iterations – for example the special case of starting with a loop index of n-1 and decreasing it afterwards – will lead to different results. Therefore these iterations cannot be executed in parallel.
  Pipelining
  The following description is very much simplified, but it explains the basic principles: operations like add and multiply consist of so-called segments, which are executed one after the other in a so-called pipeline. Each segment acts on the data for some kind of intermediate step, and only after the execution of the whole pipeline is the final result available. It is similar to an assembly line in the automotive industry. Now, the segments do not all need to deal with the same operation at a given point in time; they can be kept active if sufficient inputs are fed into the pipeline. Again, in the automotive industry the different stages of the assembly line work on different cars, and nobody would wait for a car to be completed before starting the next one in the first stage of production, as that would keep almost all stages idle. Such a situation is resembled by the following picture:
  The time, or number of cycles, extends to the right-hand side; assuming five stages (vertical direction), the red squares indicate which stage is busy at a given cycle. Obviously this is not efficient, because white squares indicate transistors which consume power without actively producing results. And the compute performance is not optimal: it takes five cycles to compute one result. Similarly to the assembly line in the automotive industry, this is what we want to achieve: after the "latency" – five cycles, just the length of the pipeline – we get a result every cycle. Obviously it is easy to feed the pipeline with sufficient independent input data in the case of different loop iterations of a data-parallel loop. If this is not possible one could also use different operations within the execution of a complicated loop body within one
  • 32. 32 given loop iteration. But typically operations in such a loop body need results from previous operations of the same iteration, so one would rather arrive at a utilization of the pipeline like this: some operations can be calculated almost in parallel, others have to wait for previous calculations to finish because they need their input. In total the utilization of the pipeline is better, but not optimal. This was the situation before the trend to SIMD computing was revitalized. Before that, the advances of LSI technology always provided increased performance with every new generation of product. But then the race for increasing frequency stopped, and now architectural enhancements are important again.
  Single Instruction Multiple Data (SIMD)
  Hardware realizations of SIMD have been present since the first Cray-1, but that was a vector architecture. Nowadays the instruction sets SSE and AVX in their different generations and flavors use it, and similarly, for example, CUDA coding is also a way to utilize data parallelism. The "width" of the SIMD instruction, the number of loop iterations executed in parallel by one instruction, is kept relatively small in the case of SSE (two for double precision) and AVX (four for double precision). The implementations of SIMD in standard "scalar" processors typically involve add-and-multiply units which can produce several results per clock cycle. They also feature registers of the same "width" to keep several input parameters or several results.
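  To make the two parallelization routes mentioned above concrete, the following is a minimal sketch in Fortran with OpenMP directives; it is our own illustration rather than NEC- or SX-specific code, and the array names simply follow the loop examples above. The first directive spreads chunks of iterations over the cores of a shared-memory machine, which only pays off for sufficiently long loops; the second asks the compiler to map iterations onto the SIMD lanes, which works at much finer granularity:
  subroutine add_vectors(n, a, b, c)
    implicit none
    integer, intent(in)  :: n
    real(8), intent(in)  :: b(n), c(n)
    real(8), intent(out) :: a(n)
    integer :: i
    ! Coarse-grained thread parallelism: chunks of iterations
    ! are executed on different cores (OpenMP threads).
    !$omp parallel do
    do i = 1, n
       a(i) = b(i) + c(i)
    end do
    !$omp end parallel do
    ! Fine-grained data parallelism: the same loop expressed for
    ! the SIMD units (SSE/AVX lanes or vector pipes); repeated
    ! here only to show the second form of the directive.
    !$omp simd
    do i = 1, n
       a(i) = b(i) + c(i)
    end do
  end subroutine add_vectors
  On a vector machine such as the SX, the compiler typically performs the equivalent mapping automatically when it vectorizes the loop, without any directive being required.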
  • 33. 33 In a way the pipelines are "doubled" for SSE, but both are activated by the same single instruction:
  Vector Computing
  The difference between those SIMD architectures and SIMD with a real vector architecture like the NEC SX-ACE, the only remaining vector system, lies in the existence of vector registers. In order to provide sufficient input to the functional units, the vector registers have proven to be very efficient. In a way they are a very natural match to the idea of a pipeline – or assembly line. Again thinking about the automotive industry, there one would have a stock of components to constantly feed the assembly line. The following viewgraph resembles this situation: when an instruction is issued it takes some cycles for the first result to appear, but after that one gets a new result every cycle, until the vector register, which has a certain length, is completely processed. Naturally this can only work if the loop iterations are independent, that is, if the loop is data parallel. It would not work if one computation needed a result of another computation as input while that is still being processed. In fact the SX vector machines have always combined both ideas, the vector register and parallel functional units, so-called "pipe sets" – four in this picture:
  There is another advantage of the vector architecture in the SX, and this proves to be very important. The scalar portion of the CPU acts independently, and while the pipelines are kept busy with input from the vector registers, the inevitable scalar instructions which come with every loop can be executed. This involves checking of loop counters, updating of address pointers and the like. That way these instructions essentially do not consume time. The number of cycles available to "hide" them is the length of the vector registers divided by the number of pipe sets. In fact the white space in the picture above can partially be hidden as well, because new vector instructions can be issued in advance to take over immediately once the vector register with the input data for the previous instruction is completely processed. The instruction issue unit has the complete status information of all available resources, specifically pipes and vector registers.
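  A rough worked example of these effects (illustrative numbers of our own; only the five pipeline stages and the four pipe sets appear in the text above): without pipelining, n results through a five-stage unit cost about 5 × n cycles, whereas with pipelining they cost about 5 + (n − 1) cycles, so 64 results take roughly 68 cycles instead of 320. If a vector register holds, say, 256 elements and is drained by four pipe sets, it keeps the pipes busy for about 256 / 4 = 64 cycles, and that is the window in which the accompanying scalar instructions can be hidden.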
  • 34.
  • 35. High Performance Computing (HPC) clusters serve a crucial role at research-intensive organizations in industries such as aerospace, meteorology, pharmaceuticals, and oil and gas exploration. The thing they have in common is a requirement for large-scale computations. Bright's solution for HPC puts the power of supercomputing within the reach of just about any organization. Advanced Cluster Management Made Easy
  • 36. 36 The Bright Advantage Bright Cluster Manager for HPC makes it easy to build and oper- ate HPC clusters using the server and networking equipment of your choice, tying them together into a comprehensive, easy to manage solution. Bright Cluster Manager for HPC lets customers deploy complete clusters over bare metal and manage them effectively. It pro- vides single-pane-of-glass management for the hardware, the operating system, HPC software, and users. With Bright Cluster Manager for HPC, system administrators can get clusters up and running quickly and keep them running reliably throughout their life cycle – all with the ease and elegance of a fully featured, en- terprise-grade cluster manager. Bright Cluster Manager was designed by a group of highly expe- rienced cluster specialists who perceived the need for a funda- mental approach to the technical challenges posed by cluster management. The result is a scalable and flexible architecture, forming the basis for a cluster management solution that is ex- tremely easy to install and use, yet is suitable for the largest and most complex clusters. Scale clusters to thousands of nodes Bright Cluster Manager was designed to scale to thousands of nodes. It is not dependent on third-party (open source) software that was not designed for this level of scalability. Many advanced features for handling scalability and complexity are included: Management Daemon with Low CPU Overhead The cluster management daemon (CMDaemon) runs on every node in the cluster, yet has a very low CPU load – it will not cause any noticeable slow-down of the applications running on your cluster. Because the CMDaemon provides all cluster management func- tionality, the only additional daemon required is the workload manager (or queuing system) daemon. For example, no daemons Advanced Cluster Management Made Easy Easy-to-use, complete and scalable The cluster installer takes you through the installation process and offers advanced options such as “Express” and “Remote”.
  • 37. 37 for monitoring tools such as Ganglia or Nagios are required. This minimizes the overall load of cluster management software on your cluster. Multiple Load-Balancing Provisioning Nodes Node provisioning – the distribution of software from the head node to the regular nodes – can be a significant bottleneck in large clusters. With Bright Cluster Manager, provisioning capabil- ity can be off-loaded to regular nodes, ensuring scalability to a virtually unlimited number of nodes. When a node requests its software image from the head node, the head node checks which of the nodes with provisioning ca- pability has the lowest load and instructs the node to download or update its image from that provisioning node. Synchronized Cluster Management Daemons The CMDaemons are synchronized in time to execute tasks in exact unison. This minimizes the effect of operating system jit- ter (OS jitter) on parallel applications. OS jitter is the “noise” or “jitter” caused by daemon processes and asynchronous events such as interrupts. OS jitter cannot be totally prevented, but for parallel applications it is important that any OS jitter occurs at the same moment in time on every compute node. Built-In Redundancy Bright Cluster Manager supports redundant head nodes and re- dundant provisioning nodes that can take over from each other in case of failure. Both automatic and manual failover configura- tions are supported. Other redundancy features include support for hardware and software RAID on all types of nodes in the cluster. Diskless Nodes Bright Cluster Manager supports diskless nodes, which can be useful to increase overall Mean Time Between Failure (MTBF), particularly on large clusters. Nodes with only InfiniBand and no Ethernet connection are possible too. Bright Cluster Manager 7 for HPC For organizations that need to easily deploy and manage HPC clusters Bright Cluster Manager for HPC lets customers deploy complete clusters over bare metal and manage them effectively. It provides single-pane-of-glass management for the hardware, the operating system, the HPC software, and users. With Bright Cluster Manager for HPC, system administrators can quickly get clusters up and running and keep them running relia- bly throughout their life cycle – all with the ease and elegance of a fully featured, enterprise-grade cluster manager. Features Simple Deployment Process – Just answer a few questions about your cluster and Bright takes care of the rest. It installs all of the software you need and creates the necessary config- uration files for you, so you can relax and get your new cluster up and running right – first time, every time.
  • 38. S EW SE SW tech BOX 38 Advanced Cluster Management Made Easy Easy-to-use, complete and scalable Head Nodes Regular Nodes A cluster can have different types of nodes, but it will always have at least one head node and one or more “regular” nodes. A regular node is a node which is controlled by the head node and which receives its software image from the head node or from a dedicated provisioning node. Regular nodes come in different flavors. Some examples include: Failover Node – A node that can take over all functionalities of the head node when the head node becomes unavailable. Compute Node – A node which is primarily used for compu- tations. Login Node – A node which provides login access to users of the cluster. It is often also used for compilation and job sub- mission. I/O Node – A node which provides access to disk storage. Provisioning Node – A node from which other regular nodes can download their software image. Workload Management Node – A node that runs the central workload manager, also known as queuing system. Subnet Management Node – A node that runs an InfiniBand subnet manager. More types of nodes can easily be defined and configured with Bright Cluster Manager. In simple clusters there is only one type of regular node – the compute node – and the head node fulfills all cluster management roles that are described in the regular node types mentioned above.
  • 39. 39 All types of nodes (head nodes and regular nodes) run the same CMDaemon. However, depending on the role that has been as- signed to the node, the CMDaemon fulfills different tasks. This node has a ‘Login Role’, ‘Storage Role’ and ‘PBS Client Role’. The architecture of Bright Cluster Manager is implemented by the following key elements: The Cluster Management Daemon (CMDaemon) The Cluster Management Shell (CMSH) The Cluster Management GUI (CMGUI) Connection Diagram A Bright cluster is managed by the CMDaemon on the head node, which communicates with the CMDaemons on the other nodes over encrypted connections. The CMDaemon on the head node exposes an API which can also be accessed from outside the cluster. This is used by the cluster management Shell and the cluster management GUI, but it can also be used by other applications if they are programmed to ac- cess the API. The API documentation is available on request. The diagram below shows a cluster with one head node, three regular nodes and the CMDaemon running on each node. Elements of the Architecture
  • 40. 40 Can Install on Bare Metal – With Bright Cluster Manager there is nothing to pre-install. You can start with bare metal servers, and we will install and configure everything you need. Go from pallet to production in less time than ever. Comprehensive Metrics and Monitoring – Bright Cluster Manager really shines when it comes to showing you what's going on in your cluster. Its beautiful graphical user interface lets you monitor resources and services across the entire cluster in real time. Powerful User Interfaces – With Bright Cluster Manager you don't just get one great user interface, we give you two. That way, you can choose to provision, monitor, and manage your HPC clusters with a traditional command line interface or our beautiful, intuitive graphical user interface. Optimize the Use of IT Resources by HPC Applications – Bright ensures HPC applications get the resources they require according to the policies of your organization. Your cluster will prioritize workloads according to your business priorities. HPC Tools and Libraries Included – Every copy of Bright Cluster Manager comes with a complete set of HPC tools and libraries, so you will be ready to develop, debug, and deploy your HPC code right away. "The building blocks for transtec HPC solutions must be chosen according to our goals of ease-of-management and ease-of-use. With Bright Cluster Manager, we are happy to have the technology leader at hand, meeting these requirements, and our customers value that." Jörg Walter, HPC Solution Engineer. Monitor your entire cluster at a glance with Bright Cluster Manager.
  • 41. 41 Quickly build and deploy an HPC cluster. When you need to get an HPC cluster up and running, don't waste time cobbling together a solution from various open source tools. Bright's solution comes with everything you need to set up a complete cluster from bare metal and manage it with a beautiful, powerful user interface. Whether your cluster is on-premise or in the cloud, Bright is a virtual one-stop-shop for your HPC project. Build a cluster in the cloud. Bright's dynamic cloud provisioning capability lets you build an entire cluster in the cloud or expand your physical cluster into the cloud for extra capacity. Bright allocates the compute resources in Amazon Web Services automatically, and on demand. Build a hybrid HPC/Hadoop cluster. Does your organization need HPC clusters for technical computing and Hadoop clusters for Big Data? The Bright Cluster Manager for Apache Hadoop add-on enables you to easily build and manage both types of clusters from a single pane of glass. Your system administrators can monitor all cluster operations, manage users, and repurpose servers with ease. Driven by research. Bright has been building enterprise-grade cluster management software for more than a decade. Our solutions are deployed in thousands of locations around the globe. Why reinvent the wheel when you can rely on our experts to help you implement the ideal cluster management solution for your organization? Clusters in the cloud. Bright Cluster Manager can provision and manage clusters that are running on virtual servers inside the AWS cloud as if they were local machines. Use this feature to build an entire cluster in AWS from scratch or extend a physical cluster into the cloud when you need extra capacity. Use native services for storage in AWS – Bright Cluster Manager lets you take advantage of inexpensive, secure, durable, flexible and simple storage services for data use, archiving, and backup in the AWS cloud. Make smart use of AWS for HPC – Bright Cluster Manager can save you money by instantiating compute resources in AWS only when they are needed. It uses built-in intelligence to create instances only after the data is ready for processing and the backlog in on-site workloads requires it. You decide which metrics to monitor. Just drag a component into the display area and Bright Cluster Manager instantly creates a graph of the data you need.
  • 42. 42 BOX High performance meets efficiency. Initially, massively parallel systems constitute a challenge to both administrators and users. They are complex beasts. Anyone building HPC clusters will need to tame the beast, master the complexity and present users and administrators with an easy-to-use, easy-to-manage system landscape. Leading HPC solution providers such as transtec achieve this goal. They hide the complexity of HPC under the hood and match high performance with efficiency and ease-of-use for both users and administrators. The "P" in "HPC" gains a double meaning: "Performance" plus "Productivity". Cluster and workload management software like Moab HPC Suite, Bright Cluster Manager or Intel Fabric Suite provide the means to master and hide the inherent complexity of HPC systems. For administrators and users, HPC clusters are presented as single, large machines with many different tuning parameters. The software also provides a unified view of existing clusters whenever unified management is added as a requirement by the customer at any point in time after the first installation. Thus, daily routine tasks such as job management, user management, and queue partitioning and management can be performed easily with either graphical or web-based tools, without any advanced scripting skills or technical expertise required from the administrator or user.
  • 43. 43 tech BOX Bright Cluster Manager unleashes and manages the unlimited power of the cloud. Create a complete cluster in Amazon EC2, or easily extend your onsite cluster into the cloud. Bright Cluster Manager provides a "single pane of glass" to both your on-premise and cloud resources, enabling you to dynamically add capacity and manage cloud nodes as part of your onsite cluster. Cloud utilization can be achieved in just a few mouse clicks, without the need for expert knowledge of Linux or cloud computing. With Bright Cluster Manager, every cluster is cloud-ready, at no extra cost. The same powerful cluster provisioning, monitoring, scheduling and management capabilities that Bright Cluster Manager provides to onsite clusters extend into the cloud. The Bright advantage for cloud utilization: Ease of use: Intuitive GUI virtually eliminates the user learning curve; no need to understand Linux or EC2 to manage the system. Alternatively, the cluster management shell provides powerful scripting capabilities to automate tasks. Complete management solution: Installation/initialization, provisioning, monitoring, scheduling and management in one integrated environment. Integrated workload management: Wide selection of workload managers included and automatically configured with local, cloud and mixed queues. Single pane of glass; complete visibility and control: Cloud compute nodes managed as elements of the on-site cluster; visible from a single console with drill-downs and historic data. Efficient data management via data-aware scheduling: Automatically ensures data is in position at the start of computation; delivers results back when complete. Secure, automatic gateway: Mirrored LDAP and DNS services inside Amazon VPC, connecting local and cloud-based nodes for secure communication over VPN. Cost savings: More efficient use of cloud resources; support for spot instances; minimal user intervention. Two cloud scenarios: Bright Cluster Manager supports two cloud utilization scenarios: "Cluster-on-Demand" and "Cluster Extension". Scenario 1: Cluster-on-Demand. The Cluster-on-Demand scenario is ideal if you do not have a cluster onsite, or need to set up a totally separate cluster. With just a few mouse clicks you can instantly create a complete cluster in the public cloud, for any duration of time.
  • 44. 44 Advanced Cluster Management Made Easy Cloud Bursting With Bright Scenario 2: Cluster Extension The Cluster Extension scenario is ideal if you have a cluster onsite but you need more compute power, including GPUs. With Bright Cluster Manager, you can instantly add EC2-based resources to your onsite cluster, for any duration of time. Extending into the cloud is as easy as adding nodes to an onsite cluster – there are only a few additional, one-time steps after providing the public cloud account information to Bright Cluster Manager. The Bright approach to managing and monitoring a cluster in the cloud provides complete uniformity, as cloud nodes are managed and monitored the same way as local nodes: Load-balanced provisioning Software image management Integrated workload management Interfaces – GUI, shell, user portal Monitoring and health checking Compilers, debuggers, MPI libraries, mathematical libraries and environment modules
  • 45. Bright Computing 45 Bright Cluster Manager also provides additional features that are unique to the cloud. Amazon spot instance support – Bright Cluster Manager enables users to take advantage of the cost savings offered by Amazon’s spot instances. Users can specify the use of spot instances, and Bright will automatically schedule as available, reducing the cost to compute without the need to monitor spot prices and sched- ule manually. Hardware Virtual Machine (HVM) – Bright Cluster Manager auto- matically initializes all Amazon instance types, including Cluster Compute and Cluster GPU instances that rely on HVM virtualization. Amazon VPC – Bright supports Amazon VPC setups which allows compute nodes in EC2 to be placed in an isolated network, there- by separating them from the outside world. It is even possible to route part of a local corporate IP network to a VPC subnet in EC2, so that local nodes and nodes in EC2 can communicate easily. Data-aware scheduling – Data-aware scheduling ensures that input data is automatically transferred to the cloud and made accessible just prior to the job starting, and that the output data is automatically transferred back. There is no need to wait for the data to load prior to submitting jobs (delaying entry into the job queue), nor any risk of starting the job before the data transfer is complete (crashing the job). Users submit their jobs, and Bright’s data-aware scheduling does the rest. Bright Cluster Manager provides the choice: “To Cloud, or Not to Cloud” Not all workloads are suitable for the cloud. The ideal situation for most organizations is to have the ability to choose between onsite clusters for jobs that require low latency communication, complex I/O or sensitive data; and cloud clusters for many other types of jobs. Bright Cluster Manager delivers the best of both worlds: a pow- erful management solution for local and cloud clusters, with the ability to easily extend local clusters into the cloud without compromising provisioning, monitoring or managing the cloud resources. One or more cloud nodes can be configured under the Cloud Settings tab.
  • 46.
  • 47. While all HPC systems face challenges in workload demand, workflow constraints, resource complexity, and system scale, enterprise HPC systems face more stringent challenges and expectations. Enterprise HPC systems must meet mission-critical and priority HPC workload demands for commercial businesses, business-oriented research, and academic organizations. Intelligent HPC Workload Management
  • 48. 48 Solutions to Enterprise High-Performance Computing Challenges The Enterprise has requirements for dynamic scheduling, provi- sioning and management of multi-step/multi-application servic- es across HPC, cloud, and big data environments. Their workloads directly impact revenue, product delivery, and organizational ob- jectives. Enterprise HPC systems must process intensive simulation and data analysis more rapidly, accurately and cost-effectively to accelerate insights. By improving workflow and eliminating job delays and failures, the business can more efficiently leverage data to make data-driven decisions and achieve a competitive advantage. The Enterprise is also seeking to improve resource utilization and manage efficiency across multiple heterogene- ous systems. User productivity must increase to speed the time to discovery, making it easier to access and use HPC resources and expand to other resources as workloads demand. Enterprise-Ready HPC Workload Management Moab HPC Suite – Enterprise Edition accelerates insights by uni- fying data center resources, optimizing the analysis process, and guaranteeing services to the business. Moab 8.1 for HPC systems and HPC cloud continually meets enterprise priorities through increased productivity, automated workload uptime, and consistent SLAs. It uses the battle-tested and patented Moab intelligence engine to automatically balance the complex, mis- sion-critical workload priorities of enterprise HPC systems. En- terprise customers benefit from a single integrated product that brings together key Enterprise HPC capabilities, implementation services, and 24/7 support. It is imperative to automate workload workflows to speed time to discovery. The following new use cases play a key role in im- proving overall system performance: Intelligent HPC Workload Management Moab HPC Suite – Enterprise Edition Moab is the “brain” of an HPC system, intelligently optimizing workload throughput while balancing service levels and priorities.
  • 49. 49 tech BOX Intelligent HPC Workload Management Across Infrastructure and Organizational Complexity. Moab HPC Suite – Enterprise Edition acts as the "brain" of a high performance computing (HPC) system to accelerate and automate complex decision-making processes. The patented Moab intelligence engine provides multi-dimensional policies that mimic real-world decision making. These multi-dimensional, real-world policies are essential in scheduling workload to optimize job speed, job success and resource utilization. Moab HPC Suite – Enterprise Edition integrates decision-making data from, and automates actions through, your system's existing mix of resource managers. This enables all the dimensions of real-time granular resource attributes and resource state, as well as current and future resource commitments, to be modeled against workload/application requirements. All of these dimensions are then factored into the most efficient scheduling decisions. Moab also dramatically simplifies the management tasks and processes across these complex, heterogeneous environments. Moab works with many of the major resource management and industry-standard resource monitoring tools and covers mixed hardware, network, storage and licenses. Moab HPC Suite – Enterprise Edition multi-dimensional policies are also able to factor in organizational priorities and complexities when scheduling workload. Moab ensures workload is processed according to organizational priorities and commitments, and that resources are shared fairly across users, groups and even multiple organizations. This enables organizations to automatically enforce service guarantees and effectively manage organizational complexities with simple policy-based settings. Moab HPC Suite – Enterprise Edition is architected to integrate on top of your existing resource managers and other types of management tools in your environment to provide policy-based scheduling and management of workloads. It makes the optimal complex decisions based on all of the data it integrates from the various resource managers and then orchestrates the job and management actions through those resource managers. Moab's flexible management integration makes it the ideal choice to integrate with existing or new systems, as well as to manage your high performance computing system as it grows and expands in the future.
  • 50. 50 Elastic Computing – unifies data center resources by assisting the business management resource expansion through burst- ing to private/public clouds and other data center resources utilizing OpenStack as a common platform Performance and Scale – optimizes the analysis process by dramatically increasing TORQUE and Moab throughput and scalability across the board Tighter cooperation between Moab and Torque – harmoniz- ing these key structures to reduce overhead and improve communication between the HPC scheduler and the re- source manager(s) More Parallelization – increasing the decoupling of Moab and TORQUE’s network communication so Moab is less dependent on TORQUE’S responsiveness, resulting in a 2x speed improvement and shortening the duration of the av- erage scheduling iteration in half Accounting Improvements – allowing toggling between multiple modes of accounting with varying levels of en- forcement, providing greater flexibility beyond strict allo- cation options. Additionally, new up-to-date accounting balances provide real-time insights into usage tracking. Viewpoint Admin Portal – guarantees services to the busi- ness by allowing admins to simplify administrative reporting, workload status tracking, and job resource viewing Accelerate Insights Moab HPC Suite – Enterprise Edition accelerates insights by in- creasing overall system, user and administrator productivity, achieving more accurate results that are delivered faster and at a lower cost from HPC resources. Moab provides the scalability, 90-99 percent utilization, and simple job submission required to maximize productivity. This ultimately speeds the time to discov- ery, aiding the enterprise in achieving a competitive advantage due to its HPC system. Enterprise use cases and capabilities in- clude the following: Intelligent HPC Workload Management Moab HPC Suite – Enterprise Edition “With Moab HPC Suite, we can meet very demanding customers’ requirements as regards unified management of heterogeneous cluster environments, grid management, and provide them with flex- ible and powerful configuration and reporting options. Our custom- ers value that highly.” Thomas Gebert HPC Solution Architect
  • 51. 51 OpenStack integration to offer virtual and physical resource provisioning for IaaS and PaaS Performance boost to achieve 3x improvement in overall opti- mization performance Advanced workflow data staging to enable improved cluster utilization, multiple transfer methods, and new transfer types Advanced power management with clock frequency control and additional power state options, reducing energy costs by 15-30 percent Workload-optimized allocation policies and provisioning to get more results out of existing heterogeneous resources and reduce costs, including topology-based allocation Unify workload management across heterogeneous clusters by managing them as one cluster, maximizing resource availa- bility and administration efficiency Optimized, intelligent scheduling packs workloads and back- fills around priority jobs and reservations while balancing SLAs to efficiently use all available resources Optimized scheduling and management of accelerators, both Intel Xeon Phi and GPGPUs, for jobs to maximize their utiliza- tion and effectiveness Simplified job submission and management with advanced job arrays and templates Showback or chargeback for pay-for-use so actual resource usage is tracked with flexible chargeback rates and reporting by user, department, cost center, or cluster Multi-cluster grid capabilities manage and share workload across multiple remote clusters to meet growing workload demand or surges Guarantee Services to the Business Job and resource failures in enterprise HPC systems lead to de- layed results, missed organizational opportunities, and missed objectives. Moab HPC Suite – Enterprise Edition intelligently automates workload and resource uptime in the HPC system to ensure that workloads complete successfully and reliably, avoid failures, and guarantee services are delivered to the business.
  • 52. 52 The Enterprise benefits from these features: Intelligent resource placement prevents job failures with granular resource modeling, meeting workload requirements and avoiding at-risk resources Auto-response to failures and events with configurable ac- tions to pre-failure conditions, amber alerts, or other metrics and monitors Workload-aware future maintenance scheduling that helps maintain a stable HPC system without disrupting workload productivity Real-world expertise for fast time-to-value and system uptime with included implementation, training and 24/7 support re- mote services Auto SLA Enforcement Moab HPC Suite – Enterprise Edition uses the powerful Moab in- telligence engine to optimally schedule and dynamically adjust workload to consistently meet service level agreements (SLAs), guarantees, or business priorities. This automatically ensures that the right workloads are completed at the optimal times, taking into account the complex number of using departments, priorities, and SLAs to be balanced. Moab provides the following benefits: Usage accounting and budget enforcement that schedules resources and reports on usage in line with resource sharing agreements and precise budgets (includes usage limits, usage reports, auto budget management, and dynamic fair share policies) SLA and priority policies that make sure the highest priority workloads are processed first (includes Quality of Service and hierarchical priority weighting) Continuous plus future scheduling that ensures priorities and guarantees are proactively met as conditions and workload changes (i.e. future reservations, pre-emption) Intelligent HPC Workload Management Moab HPC Suite – Grid Option
  • 53. 53 Grid- and Cloud-Ready Workload Management The benefits of a traditional HPC environment can be extended to more efficiently manage and meet workload demand with the multi-cluster grid and HPC cloud management capabilities in Moab HPC Suite – Enterprise Edition. It allows you to: Self-service job submission portal enables users to easily sub- mit and track HPC cloud jobs at any time from any location with little training required Pay-for-use show back or chargeback so actual resource usage is tracked with flexible chargeback rates reporting by user, de- partment, cost center, or cluster On-demand workload-optimized node provisioning dynami- cally provisions nodes based on policies, to better meet work- load needs. Manage and share workload across multiple remote clusters to meet growing workload demand or surges by adding on the Moab HPC Suite – Grid Option.
  • 54. 54 tech BOX Choose the Moab HPC Suite That's Right for You. The following chart provides a high-level comparison of the key use cases and capabilities enabled by the different editions of Moab HPC Suite to intelligently manage and optimize HPC workload efficiency. Customers can use this chart to help determine which edition will best address their HPC workload management challenges, maximize their workload and resource productivity, and help them meet their service level performance goals. To explore any use case or capability compared in the chart further, see the HPC Workload Management Use Cases. Basic Edition is best for static, non-dynamic basic clusters of all sizes that want to optimize workload throughput and utilization while balancing priorities. Enterprise Edition is best for large clusters with dynamic, mission-critical workloads and complex service levels and priorities to balance. It is also for organizations that want to integrate, or that require, HPC Cloud capabilities for their HPC cluster systems. Enterprise Edition provides the additional use cases these organizations need, such as: accounting for usage of cluster resources for usage budgeting and pay-for-use showback or chargeback; workload-optimized node provisioning; heterogeneous, multi-cluster workload management; and workload-aware power management.
  • 55. 55 Application Portal Edition is suited for customers who need an application-centric job submission experience to simplify and extend high performance computing to more users without requiring specialized HPC training. It provides all the benefits of the Enterprise Edition and meets these requirements with a technical computing application portal that supports most common applications in manufacturing, oil and gas, life sciences, academia, government and other industries. Remote Visualization Edition is suited for customers with many users, internal or external to their organization, of 3D/CAE/CAD and other visualization applications who want to reduce the hardware, networking and management costs of running their visualization workloads. This edition also provides the added benefits of improved workforce productivity and collaboration, while optimizing utilization and guaranteeing shared resource access for users moving from technical workstations to more efficient shared resources in a technical compute cloud. It provides all the benefits of the Enterprise Edition plus these additional capabilities and benefits.
  • 56. 56 BOX transtec HPC solutions are designed for maximum flexibility and ease of management. We not only offer our customers the most powerful and flexible cluster management solution out there, but also provide them with customized setup and site-specific configuration. Whether a customer needs a dynamic Linux-Windows dual-boot solution, unified management of different clusters at different sites, or fine-tuning of the Moab scheduler to implement fine-grained policy configuration – transtec not only gives you the framework at hand, but also helps you adapt the system according to your special needs. Needless to say, when customers are in need of special training, transtec will be there to provide customers, administrators, or users with specially adjusted Educational Services. Our many years of experience in High Performance Computing have enabled us to develop efficient concepts for installing and deploying HPC clusters. For this, we leverage well-proven 3rd-party tools, which we assemble into a total solution and adapt and configure according to the customer's requirements. We manage everything from the complete installation and configuration of the operating system, through necessary middleware components like job and cluster management systems, up to the customer's applications.
  • 57. 57 Moab HPC Suite use case comparison by edition (x = included):
  Use Case | Basic | Enterprise | Application Portal | Remote Visualization
  Productivity Acceleration
  Massive multifaceted scalability | x | x | x | x
  Workload optimized resource allocation | x | x | x | x
  Optimized, intelligent scheduling of workload and resources | x | x | x | x
  Optimized accelerator scheduling management (Intel MIC, GPGPU) | x | x | x | x
  Simplified user job submission and mgmt. (Application Portal) | x | x | x | x
  Administrator dashboard and management reporting | x | x | x | x
  Unified workload management across heterogeneous cluster(s) | x (single cluster only) | x | x | x
  Workload optimized node provisioning | | x | x | x
  Visualization workload optimization | | | | x
  Workload-aware auto-power management | | x | x | x
  Uptime Automation
  Intelligent resource placement for failure prevention | x | x | x | x
  Workload-aware maintenance scheduling | x | x | x | x
  Basic deployment, training and premium (24x7) support service | standard support only | x | x | x
  Auto response to failures and events | x (limited, no re-boot/provision) | x | x | x
  Auto SLA Enforcement
  Resource sharing and usage policies | x | x | x | x
  SLA and priority policies | x | x | x | x
  Continuous and future reservations scheduling | x | x | x | x
  Usage accounting and budget enforcement | | x | x | x
  Grid- and Cloud-Ready
  Manage and share workload across multiple clusters in a wide-area grid | x (w/ Grid Option) | x (w/ Grid Option) | x (w/ Grid Option) | x (w/ Grid Option)
  User self-service: job request management portal | x | x | x | x
  Pay-for-use showback and chargeback | | x | x | x
  Workload optimized node provisioning and re-purposing | | x | x | x
  Multiple clusters in single domain
  • 58. 58 Intelligent HPC Workload Management: Moab HPC Suite – Application Portal Edition. Moab HPC Suite – Application Portal Edition provides easy single-point access to common technical applications, data, resources, and job submissions to accelerate project completion and collaboration for end users while reducing computing costs. It enables more of your internal staff and partners to collaborate from anywhere using HPC applications and resources without any specialized HPC training, while your intellectual property stays secure. The optimization policies Moab provides significantly reduce the costs and complexity of keeping up with compute demand by optimizing resource sharing and utilization for higher throughput, while still balancing different service levels and priorities. Remove the Barriers to Harnessing HPC. Organizations of all sizes and types are looking to harness the potential of high performance computing (HPC) to enable and accelerate more competitive product design and research. To do this, you must find a way to extend the power of HPC resources to designers, researchers and engineers who are not familiar with, or trained on, the specialized HPC technologies. Simplifying user access to, and interaction with, applications running on more efficient HPC clusters removes the barriers to discovery and innovation for all types of departments and organizations. Moab HPC Suite – Application Portal Edition provides easy single-point access to common technical applications, data, HPC resources, and job submissions to accelerate the research, analysis and collaboration process for end users while reducing computing costs. HPC Ease of Use Drives Productivity. Moab HPC Suite – Application Portal Edition improves ease of use and productivity for end users of HPC resources with single-point access to their common technical applications, data, resources, and job submissions across the design and research process. It simplifies the specification and running of jobs on HPC resources to a few simple clicks on an application-specific template, with key capabilities including: