COMPUTER CLUSTER
Introduction
A computer cluster is a group of loosely coupled computers that work together closely
so that in many respects they can be viewed as though they are a single computer. The
components of a cluster are commonly, but not always, connected to each other through
fast local area networks. Clusters are usually deployed to improve performance and/or
reliability over that provided by a single computer, while typically being much more cost-
effective than single computers of comparable speed or reliability.
Cluster categorizations
High-availability (HA) clusters
High-availability clusters are implemented primarily for the purpose of improving the
availability of services which the cluster provides. They operate by having redundant
nodes, which are then used to provide service when system components fail. The most
common size for an HA cluster is two nodes, which is the minimum requirement to
provide redundancy. HA cluster implementations attempt to manage the redundancy
inherent in a cluster to eliminate single points of failure. There are many commercial
implementations of High-Availability clusters for many operating systems. The Linux-
HA project is one commonly used free software HA package for the Linux OS.
Load-balancing clusters
Load-balancing clusters operate by having all workload come through one or more load-
balancing front ends, which then distribute it to a collection of back end servers.
Although they are primarily implemented for improved performance, they commonly
include high-availability features as well. Such a cluster of computers is sometimes
referred to as a server farm. There are many commercial load balancers available
including Platform LSF HPC, Sun Grid Engine, Moab Cluster Suite and Maui Cluster
Scheduler. The Linux Virtual Server project provides one commonly used free software
package for the Linux OS.
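
To make the front-end/back-end split concrete, the following sketch (a simplified illustration using assumed host names and ports, not the implementation of any particular load balancer) picks a healthy back-end server in round-robin order for each incoming request:

    import socket
    from itertools import cycle

    # Hypothetical back-end pool; a real front end would discover or configure these.
    BACKENDS = [("backend1.example.com", 8080),
                ("backend2.example.com", 8080),
                ("backend3.example.com", 8080)]

    def is_alive(host, port, timeout=0.5):
        """Crude health check: can a TCP connection be opened to the back end?"""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    _rotation = cycle(BACKENDS)

    def pick_backend():
        """Return the next healthy back end in round-robin order."""
        for _ in range(len(BACKENDS)):
            host, port = next(_rotation)
            if is_alive(host, port):
                return host, port
        raise RuntimeError("no healthy back ends available")

    if __name__ == "__main__":
        # A real front end would proxy the request to the chosen back end;
        # only the selection step is shown here.
        print("dispatching request to %s:%d" % pick_backend())

A production load balancer must also forward the request and, for high availability, run redundant front ends itself, which is why load-balancing and HA features are often combined.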
High-performance (HPC) clusters
High-performance clusters are implemented primarily to provide increased performance
by splitting a computational task across many different nodes in the cluster, and are most
commonly used in scientific computing. One of the most popular HPC implementations
is a cluster with nodes running Linux as the OS and free software to implement the
parallelism. This configuration is often referred to as a Beowulf cluster. Such clusters
commonly run custom programs which have been designed to exploit the parallelism
available on HPC clusters. Many such programs use libraries such as MPI which are
specially designed for writing scientific applications for HPC computers.
HPC clusters are optimized for workloads which require jobs or processes running on
separate cluster nodes to communicate actively during the computation.
These include computations where intermediate results from one node's calculations will
affect future calculations on other nodes.
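
The sketch below uses the mpi4py binding for MPI (one possible choice; the library and the toy update rule are assumptions) to show this kind of tight coupling: after every iteration each node's intermediate result is combined with an allreduce, so the next step on every node depends on values computed on the others.

    # Run with, for example: mpirun -n 4 python iterate.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Each node starts from a different local value (a stand-in for real data).
    local = float(rank + 1)

    for step in range(5):
        # Local work: each node refines its own partial result.
        local = local * 0.5 + 1.0
        # Exchange intermediate results: every node needs the global sum
        # before it can take the next step.
        global_sum = comm.allreduce(local, op=MPI.SUM)
        local += global_sum / size   # the next iteration depends on other nodes' work

    if rank == 0:
        print("final value on rank 0:", local)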
Grid computing
Grid computing, or grid clusters, is a technology closely related to cluster computing.
The key differences between grids and traditional clusters are that grids connect
collections of computers which do not fully trust each other, and hence operate more like
a computing utility than like a single computer. In addition, grids typically support more
heterogeneous collections than are commonly supported in clusters.
Grid computing is optimized for workloads which consist of many independent jobs or
packets of work, which do not have to share data between the jobs during the
computation process. Grids serve to manage the allocation of jobs to computers which
will perform the work independently of the rest of the grid cluster. Resources such as
storage may be shared by all the nodes, but intermediate results of one job do not affect
other jobs in progress on other nodes of the grid.
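
By contrast, a grid-style workload can be sketched as a set of fully independent work packets handed to whatever workers are free. The example below uses Python's standard process pool purely as a stand-in for a grid scheduler; the job function is hypothetical.

    from concurrent.futures import ProcessPoolExecutor, as_completed

    def run_job(packet):
        """A self-contained work packet: it shares no data with other jobs."""
        return packet, sum(i * i for i in range(packet))   # toy per-job computation

    if __name__ == "__main__":
        packets = range(1, 21)                 # twenty independent jobs
        with ProcessPoolExecutor() as pool:
            futures = [pool.submit(run_job, p) for p in packets]
            for fut in as_completed(futures):
                packet, result = fut.result()
                # Each result stands on its own; no job waits on another's
                # intermediate output.
                print("job", packet, "->", result)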
High-performance cluster implementations
The TOP500 organization's semiannual list of the 500 fastest computers usually includes
many clusters. TOP500 is a collaboration between the University of Mannheim, the
University of Tennessee, and the National Energy Research Scientific Computing Center
at Lawrence Berkeley National Laboratory. As of August 2006, the top supercomputer is
the Department of Energy's BlueGene/L system, with a performance of 280.6 TFlops.
Second place is held by another BlueGene/L system, at 91.29 TFlops.
Clustering can provide significant performance benefits versus price. The System X
supercomputer at Virginia Tech, the 28th most powerful supercomputer on Earth as of
June 2006, is a 12.25 TFlops computer cluster of 1100 Apple XServe G5 2.3 GHz dual-
processor machines (4 GB RAM, 80 GB SATA HD) running Mac OS X. The cluster
initially consisted of Power Mac G5s; the rack-mountable XServes are denser than
desktop Macs, reducing the aggregate size of the cluster. The total cost of the previous
Power Mac system was $5.2 million, a tenth of the cost of slower mainframe
supercomputers. (The Power Mac G5s were sold off.)
The central concept of a Beowulf cluster is the use of commercial off-the-shelf computers
to produce a cost-effective alternative to a traditional supercomputer. One project that
took this to an extreme was the Stone Soupercomputer.
John Koza has the largest computer cluster owned by an individual.
The SETI@home project may be the largest distributed cluster in existence. It uses
approximately three million home computers all over the world to analyze data from the
Arecibo Observatory radio telescope, searching for evidence of extraterrestrial
intelligence.
Cluster history
The history of cluster computing is best captured by a footnote in Greg Pfister's In
Search of Clusters: "Virtually every press release from DEC mentioning clusters says
'DEC, who invented clusters...'. IBM didn't invent them either. Customers invented
clusters, as soon as they couldn't fit all their work on one computer, or needed a backup.
The date of the first is unknown, but I'd be surprised if it wasn't in the 1960's, or even late
1950's."
The formal engineering basis of cluster computing as a means of doing parallel work of
any sort was arguably invented by Gene Amdahl of IBM, who in 1967 published what
has come to be regarded as the seminal paper on parallel processing: Amdahl's Law.
Amdahl's Law describes mathematically the speedup one can expect from parallelizing
any given otherwise serially performed task on a parallel architecture. This article defined
the engineering basis for both multiprocessor computing and cluster computing, where
the primary differentiator is whether or not the interprocessor communications are
supported "inside" the computer (on for example a customized internal communications
bus or network) or "outside" the computer on a commodity network.
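
In its usual form the law can be stated as follows (a standard formulation added here for reference), where P is the fraction of the task that can be parallelized and N is the number of processors:

    S(N) = \frac{1}{(1 - P) + P/N}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - P}

For example, if 95% of a task can be parallelized (P = 0.95), the speedup can never exceed 1/(1 - 0.95) = 20, no matter how many cluster nodes are added.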
Consequently the history of early computer clusters is more or less directly tied into the
history of early networks, as one of the primary motivations for the development of a
network was to link computing resources, creating a de facto computer cluster. Packet
switching networks were conceptually invented by the RAND Corporation in 1962. Using
the concept of a packet switched network, the ARPANET project succeeded in creating
in 1969 what was arguably the world's first commodity-network based computer cluster
by linking four different computer centers (each of which was something of a "cluster" in
its own right, but probably not a commodity cluster). The ARPANET project grew into
the Internet -- which can be thought of as "the mother of all computer clusters" (as the
union of nearly all of the compute resources, including clusters, that happen to be
connected). It also established the paradigm in use by all computer clusters in the world
today -- the use of packet-switched networks to perform interprocessor communications
between processor (sets) located in otherwise disconnected frames.
The development of customer-built and research clusters proceeded hand in hand with that
of both networks and the Unix operating system from the early 1970s, as both TCP/IP
and the Xerox PARC project created and formalized protocols for network-based
communications. The Hydra operating system was built for a cluster of DEC PDP-11
minicomputers called C.mmp at Carnegie Mellon University in 1971. However, it wasn't until circa 1983 that
the protocols and tools for easily doing remote job distribution and file sharing were
defined (largely within the context of BSD Unix, as implemented by Sun Microsystems)
and hence became generally available commercially, along with a shared filesystem.
The first commercial clustering product was ARCnet, developed by Datapoint in 1977.
ARCnet wasn't a commercial success and clustering per se didn't really take off until
DEC released their VAXcluster product in 1984 for the VAX/VMS operating system.
The ARCnet and VAXcluster products not only supported parallel computing, but also
shared file systems and peripheral devices. The idea was to provide the advantages of
parallel processing while maintaining data reliability and uniqueness.
VAXcluster, now VMScluster, is still available on OpenVMS systems from HP running
on Alpha and Itanium systems.
Two other noteworthy early commercial clusters were the Tandem Himalaya (a circa
1994 high-availability product) and the IBM S/390 Parallel Sysplex (also circa 1994,
primarily for business use).
No history of commodity compute clusters would be complete without noting the pivotal
role played by the development of Parallel Virtual Machine (PVM) software in 1989.
This open source software based on TCP/IP communications enabled the instant creation
of a virtual supercomputer -- a high performance compute cluster -- made out of any
TCP/IP connected systems. Free form heterogeneous clusters built on top of this model
rapidly achieved total throughput in FLOPS that greatly exceeded that available even
with the most expensive "big iron" supercomputers. PVM and the advent of inexpensive
networked PCs led, in 1993, to a NASA project to build supercomputers out of
commodity clusters. In 1995 came the invention of the "Beowulf"-style cluster, a compute
cluster built on top of a commodity network for the specific purpose of "being a
supercomputer" capable of performing tightly coupled parallel HPC computations. This
in turn spurred the independent development of Grid computing as a named entity,
although Grid-style clustering had been around at least as long as the Unix operating
system and the ARPANET, whether or not it, or the clusters that used it, were named.
Cluster technologies
MPI is a widely available communications library that enables parallel programs to be
written in C, Fortran, Python, OCaml, and many other programming languages.
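
For instance, a minimal MPI program in Python (using the mpi4py binding, which is an assumption here; the text above only names the languages) can distribute work with point-to-point messages:

    # Run with, for example: mpirun -n 4 python tasks.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        # Rank 0 hands a small task to every other rank and collects the replies.
        for dest in range(1, comm.Get_size()):
            comm.send({"task": dest * 10}, dest=dest, tag=0)
        for src in range(1, comm.Get_size()):
            print("reply from rank", src, ":", comm.recv(source=src, tag=1))
    else:
        task = comm.recv(source=0, tag=0)
        comm.send(task["task"] ** 2, dest=0, tag=1)   # do the work, return the result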
The GNU/Linux world sports various cluster software, such as:
1. Beowulf, distcc, MPICH and others - mostly specialized application clustering.
distcc provides parallel compilation when using GCC.
2. Linux Virtual Server, Linux-HA - director-based clusters that allow incoming
requests for services to be distributed across multiple cluster nodes.
3. MOSIX, openMosix, Kerrighed, OpenSSI - full-blown clusters integrated into the
kernel that provide for automatic process migration among homogeneous nodes.
OpenSSI, openMosix and Kerrighed are single-system image implementations.
Most of the clusters listed in the TOP500 are Linux clusters.
Microsoft Windows Compute Cluster Server 2003, based on the Windows Server platform,
provides components for high-performance computing such as the Job Scheduler, the
MSMPI library and management tools.
NCSA's recently installed Lincoln is a cluster of 450 Dell PowerEdge™ 1855 blade
servers running Windows Compute Cluster Server 2003. This cluster debuted at #130 on
the Top500 list in June 2006.
DragonFly BSD, a recent fork of FreeBSD 4.8, is being redesigned at its core to enable
native clustering capabilities. It also aims to achieve single-system image capabilities.
Clustering software (open source)
1. BOINC - Berkeley Open Infrastructure for Network Computing
2. Gluster - The GNU Clustering Platform
3. Kerrighed
4. Linux-Cluster Project Global File System & HA
5. Linux Virtual Server
6. Linux-HA
7. Maui Cluster Scheduler
8. OpenSSI - high-availability, load-balancing, and high-performance clustering with
or without a SAN
9. OpenMosix
10. OpenSCE
11. Open Source Cluster Application Resources (OSCAR)
12. Rocks Cluster Distribution
13. Sun Grid Engine
14. TORQUE Resource Manager
15. WareWulf
Clustering products
1. Alchemi
2. Condor
3. HP Serviceguard
4. HP's OpenVMS
5. IBM's HACMP
6. IBM Parallel Sysplex
7. KeyCluster
8. United Devices Grid MP
9. MC Service Guard for HP-UX systems
10. Microsoft Cluster Server (MSCS)
11. Platform LSF
12. NEC ExpressCluster
13. Oracle Real Application Cluster (RAC)
14. OpenPBS
15. PBSPro
16. PolyServe
17. Red Hat Cluster Suite
18. SteelEye LifeKeeper
19. Sun Cluster
20. Sun N1 GridEngine
21. Veritas Cluster Server (VCS), from VERITAS Software
22. Scyld Beowulf Cluster
23. Platform Rocks
24. Xgrid from Apple
Two-node cluster
A two-node cluster is the minimal high-availability cluster that can be built. Should one
node fail (because of a hardware or software problem), the other must acquire the
resources previously managed by the failed node in order to re-enable access to them;
this process is known as failover.
Introduction and some definitions
There are various kinds of resources, notably:
1. Storage space (containing data, binaries, or anything else that needs to be
accessed)
2. IP address(es) (so that users can reach the resources via TCP/IP connections)
3. Application software (that acts as an interface between the users and the data)
Typical services provided by a computer cluster are built from a combination of the
previously defined resources.
For example, you can have an Oracle database service, composed of:
1. Some storage space, to hold the database files (and, ultimately, the data)
2. An Oracle installation, configured to be accessed remotely (or locally)
3. An IP address to listen on; users must connect to this address in order to use
Oracle to access the data
Hardware components
Required
1. Two hosts, each with its own local storage device(s)
2. Shared storage that can be accessed by each host (or node), such as a file server
3. Some method of interconnection (that enables one node to detect whether the other is
dead, and helps coordinate resource access)
Interconnection topologies
1. A serial crossover cable is the simplest (and most reliable) way to ensure proper
intracluster communication
2. An Ethernet crossover cable requires the hosts' TCP/IP stacks to be functional to
ensure proper intracluster communication
3. A shared disk (in advanced setups), usually used for heartbeat only (see the sketch
below)
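
To make the heartbeat idea concrete, the following sketch (an illustrative assumption, not how any particular HA product implements its heartbeat) has each node send periodic UDP heartbeats over the crossover link while watching for the peer to go silent; the addresses and timing values are hypothetical.

    import socket
    import time

    PEER = ("10.0.0.2", 9999)   # hypothetical address of the other node on the crossover link
    INTERVAL = 1.0              # seconds between heartbeats
    MISS_LIMIT = 3              # missed beats before the peer is declared dead

    def send_heartbeats():
        """Run on each node: periodically announce 'I am alive' to the peer."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        while True:
            sock.sendto(b"alive", PEER)
            time.sleep(INTERVAL)

    def watch_peer(listen_port=9999):
        """Run on each node: declare the peer dead after MISS_LIMIT silent intervals."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", listen_port))
        sock.settimeout(INTERVAL)
        missed = 0
        while missed < MISS_LIMIT:
            try:
                sock.recvfrom(16)
                missed = 0          # heard from the peer, reset the counter
            except socket.timeout:
                missed += 1
        print("peer declared dead -> initiate failover")

In practice both functions run concurrently on both nodes, and "initiate failover" means acquiring the failed node's resources, as described in the following sections.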
Classification: by Service Aggregation Level
Two kinds of Computer cluster
Service based
Every service provided by the cluster is independent of the others: say, you can run a
web server and a mail server on the cluster, and each one can be independently managed
and switched from one node to the other without affecting the functionality of the other
services.
One Open Source example of this kind of cluster is Kymberlite.
Logical-host based
In more complex setups, you can end up with dependencies between services.
Say you run a mail server that receives e-mail for the local users and stores it on its
storage resource; how, then, can the users read their e-mail remotely?
You must implement some kind of mail retrieval server, such as an IMAP server.
Both of these services need access to the same storage resource: the first to write the
e-mail messages that arrive from the Internet, the second to read, move, or delete them.
So you cannot simply fail over the mail server from one node to the other, because the
mail retrieval server needs the data provided by the first service.
These two services have to be grouped together, forming a so-called logical host. To be
more precise, this logical host is built from three resources:
1. the storage resource, needed by both server applications
2. the mail transfer service that receives e-mail from the Internet
3. the mail retrieval service that acts as an interface, allowing the user to read his
e-mail
So, should you fail over this logical host, you must:
1. stop the mail retrieval service on the failed node (if possible)
2. stop the mail transfer service on the failed node (if possible)
3. release the storage resource on the failed node (if possible)
4. acquire the storage resource on the failover node
5. start the mail transfer service on the failover node
6. start the mail retrieval service on the failover node
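
The ordering of these steps is the essential point: dependent resources are stopped top-down on the failed node and started bottom-up on the surviving one. The sketch below captures that ordering for the mail logical host described above; the resource names and the start/stop hooks are illustrative assumptions, not the API of any cluster product.

    class Resource:
        """One clustered resource with start/stop hooks (placeholders here)."""
        def __init__(self, name):
            self.name = name

        def start(self, node):
            print("starting", self.name, "on", node)

        def stop(self, node):
            print("stopping", self.name, "on", node, "(if possible)")

    # A logical host is an ordered list: each resource depends on the ones before it.
    mail_logical_host = [
        Resource("storage"),         # needed by both mail services
        Resource("mail-transfer"),   # service writing incoming e-mail to storage
        Resource("mail-retrieval"),  # IMAP-style service reading the same storage
    ]

    def failover(logical_host, failed_node, surviving_node):
        for res in reversed(logical_host):   # stop in reverse dependency order
            res.stop(failed_node)
        for res in logical_host:             # start in dependency order
            res.start(surviving_node)

    failover(mail_logical_host, failed_node="node1", surviving_node="node2")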
One Open Source example of this kind of cluster is Linux-HA; one commercial example
(limited to Sun Microsystems Solaris machines) is called SunCluster.
The Linux-HA (High-Availability Linux) project provides a high-availability (clustering)
solution for Linux, FreeBSD, OpenBSD, Solaris and OS X which promotes reliability,
availability, and serviceability (RAS).
The project's main software product is Heartbeat, a GPL-licensed portable cluster
management program for high-availability clustering. Its most important features are:
1. no fixed maximum number of nodes - Heartbeat can be used to build large
clusters as well as very simple ones
2. resource monitoring: resources can be automatically restarted or moved to another
node on failure
3. fencing mechanism to remove failed nodes from the cluster
4. sophisticated policy-based resource management, resource inter-dependencies and
constraints
5. time-based rules allow for different policies depending on time
6. several resource scripts (for Apache, DB2, Oracle, PostgreSQL etc.) included
7. GUI for configuring, controlling and monitoring resources and nodes
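
As a rough illustration of feature 2 above (a conceptual sketch only, not Heartbeat's actual code), a resource monitor might retry a failing resource locally a few times before asking the cluster to move it to another node; the check_health, restart_local and move_to_peer callables are hypothetical hooks.

    import time

    def monitor(resource, check_health, restart_local, move_to_peer,
                max_restarts=3, interval=5.0):
        """Restart a failing resource in place, then migrate it after repeated failures."""
        restarts = 0
        while True:
            if check_health(resource):
                restarts = 0                 # healthy again, reset the counter
            elif restarts < max_restarts:
                restarts += 1
                restart_local(resource)      # policy: first try to recover locally
            else:
                move_to_peer(resource)       # give up locally, move to another node
                return
            time.sleep(interval)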
Conclusion
A computer cluster is a group of loosely coupled computers that work together closely so
that in many respects they can be viewed as though they are a single computer. The
components of a cluster are commonly, but not always, connected to each other through
fast local area networks. Clusters are usually deployed to improve performance and/or
reliability over that provided by a single computer, while typically being much more cost-
effective than single computers of comparable speed or reliability.