CHAPTER 1
INTRODUCTION TO SOFTWARE REJUVENATION IN
COMPLEX SYSTEM
1.1 Introduction
Industry relies on highly complex system environments, which are prone to software aging.
Availability is a critical concern because aging leads to system degradation and, eventually, to
failure. Software rejuvenation is used to avoid this. We use an optimal rejuvenation technique
that addresses the aging problem dynamically, based on variable workload and a timer policy.
Performance degradation, crash/hang failures, or both may occur due to data corruption,
numerical errors and unnecessary consumption of system resources; this degradation is known as
software aging [1].
If the load increases, the system may tend to crash; that is, software aging occurs. This is
addressed by software rejuvenation [2], a proactive fault management technique that clears
system errors and prevents future failures.
This project implements different software rejuvenation techniques that depend on variable
workload, and it optimizes the rejuvenation time. The rejuvenation time is calculated from the
variable workload reported by the system. The system periodically checks the workload and
reports it to the rejuvenation manager. Figure 1.1 shows the architecture of the rejuvenation
manager. It consists of an aging detector, which detects the software aging point, and an
optimizer, which optimizes the timer value for the point of rejuvenation. The aging detector and
the optimizer rely on two components, namely variable workload and timer policy, each
performing its defined function. The aging detector periodically obtains the value provided by
the rejuvenation manager, checks whether the rejuvenation time needs to change for the current
workload, and reports the result to the rejuvenation manager. If a change is needed, the
rejuvenation manager allows the optimizer to change the time. The optimizer optimizes the time
based on the workload values provided by the aging detector.
Figure 1.1 System architecture used for software rejuvenation in complex system
The performance and dependability of complex systems are analyzed with SPNP (Stochastic
Petri Net Package) [2]. Arc weights (cardinalities) and guard functions of a complex system
model are constructed through SPNP.
A Markov chain is irreducible if every state can be reached from every other state [2]. A CTMC
(Continuous Time Markov Chain) [3] is ergodic if it is irreducible and every state is revisited
within a finite expected time. Steady-state analysis of the underlying CTMC is done by SPNP
[3]; a few measures related to the steady state are not considered separately, because their
values can be obtained from the steady-state probabilities.
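As a minimal illustration of steady-state analysis (a sketch, not SPNP output), consider a hypothetical two-state CTMC with an "up" state and a "down" state. The function name and the rate values below are assumptions made for this example. The balance equation pi_up * failure_rate = pi_down * repair_rate, together with pi_up + pi_down = 1, gives the steady-state availability directly.

```python
def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Steady-state probability of the 'up' state in a two-state CTMC.

    Balance equation: pi_up * failure_rate = pi_down * repair_rate,
    with pi_up + pi_down = 1, so
    pi_up = repair_rate / (failure_rate + repair_rate).
    """
    if failure_rate <= 0 or repair_rate <= 0:
        raise ValueError("rates must be positive")
    return repair_rate / (failure_rate + repair_rate)


# Hypothetical rates per hour: one failure per 1000 h, mean repair time 4 h.
availability = steady_state_availability(1 / 1000, 1 / 4)
print(f"steady-state availability = {availability:.6f}")
```

Larger models have more states and are solved numerically, but the measure of interest is obtained in the same way: as a sum of steady-state probabilities over the "up" states.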
Software rejuvenation can be performed in different ways, namely cold and warm. In the cold
OS reboot process, the system is rebooted immediately at the rejuvenation point. The
rejuvenation point is the point at which the memory consumption of the system reaches a
threshold value or a predetermined time elapses. When the system consumes a large amount of
RAM, the OS must be rebooted, clearing all internal states. Memory may be consumed by
applications, by error-prone code that runs for a long time, or by the OS itself.
In the OS warm reboot process, the kernel state, including the states of all applications
running on the kernel, is saved before rebooting. The kernel state is saved by creating a
complete image of the kernel.
Approaches to Software Rejuvenation
Software rejuvenation can be divided broadly into two approaches as follows.
Time based approach [4][5][6]:
In this approach, rejuvenation is performed without any feedback from the system. Rejuvenation
in this case can be based just on elapsed time (periodic rejuvenation) and/or the instantaneous
or cumulative number of jobs on the system.
Time and workload approach [4][5][6]:
In this approach, rejuvenation is performed based on information about the system “health”. The
system is monitored continuously (in practice, at small deterministic intervals) and data is
collected on operating system resource usage and system activity. This data is then analyzed to
estimate the time to exhaustion of a resource, which may lead to degradation or a crash of a
component or of the entire system. This estimation can be based purely on time or on both time
and system workload. The time is optimized based on the applied workload, and the system
rejuvenation time is updated accordingly.
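The estimation of time to exhaustion can be sketched as follows. The source does not state which estimator is used, so this example assumes a simple least-squares linear trend over periodic usage samples; the function name, the sample values and the capacity are hypothetical.

```python
def estimate_time_to_exhaustion(samples, capacity):
    """Estimate the time left until resource usage reaches `capacity`.

    `samples` is a list of (time, usage) pairs taken at monitoring intervals.
    A straight line is fitted by least squares (an assumed estimator).
    Returns the time remaining after the last sample, or None when usage
    is not increasing.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    slope = sxy / sxx
    if slope <= 0:
        return None  # no upward trend, so no exhaustion is predicted
    intercept = mean_u - slope * mean_t
    exhaustion_time = (capacity - intercept) / slope
    return max(0.0, exhaustion_time - samples[-1][0])


# Hypothetical memory usage in MB, sampled every 10 minutes.
usage = [(0, 100), (10, 200), (20, 300)]
print(estimate_time_to_exhaustion(usage, capacity=1000))
```

A purely time-based policy would ignore this estimate; a time and workload policy could use it to shorten or lengthen the rejuvenation timer.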
1.2 Motivation:
• Current systems are reactive: if a failure occurs, the necessary steps are taken to handle it,
but the failure cannot be detected beforehand. Our project aims to detect such failures
proactively using time and workload techniques and to take action before a given node
crashes.
• Our project determines whether a node is going to fail based on RAM utilization and then
rejuvenates the failing node. It analyzes and identifies failing nodes using both time and
load balancing rejuvenation techniques.
1.3 State of Art Development
A complex system is a form of ubiquitous computing that provides everything as a service.
Complex systems are mainly used in business and the IT industry, where they offer an
outsourcing model for computational resources in which service availability, security and
quality are essential features. High service availability is the most important requirement and
is increasingly demanded in commercial computer and communication systems.
In recent years, many research efforts have sought the optimal infrastructure size and
configuration that guarantee the desired availability level. Software fault tolerance is often
found to be the bottleneck. Software failures are mainly due to elusive error conditions that
lead to resource exhaustion. Software systems appear to age as error conditions arise and
accumulate with operational time due to elusive faults in system software and application
software. Software rejuvenation is a proactive fault management technique aimed at cleaning up
the internal states in order to prevent severe crash failures in the future. The simplest way to
emulate software rejuvenation is to reboot the system or restart the aging application. It is a
cost-effective technique for dealing with software faults that protects not only against hard
failures but also against the degradation of application performance over time.
1.3.1 Classification of software faults:
Faults, in both hardware and software, can be classified according to their phase of creation or
occurrence, system boundaries (internal or external) and domain (hardware or software). In this
section, we limit ourselves to the classification of software faults based on their phase of
creation. Some studies have suggested that since software is not a physical entity, it is not
subject to transient physical phenomena (as opposed to hardware), and hence software faults are
permanent in nature [1]. Other studies classify software faults as both permanent and transient.
Gray [2] categorizes software faults into Bohrbugs and Heisenbugs. Bohrbugs are essentially
permanent design faults and hence almost deterministic in nature. They can be identified easily
and weeded out during the testing and debugging phase (or early deployment phase) of the
software life cycle. A software system with Bohrbugs is analogous to a faulty deterministic
finite state machine. Heisenbugs, on the other hand, belong to the class of temporary internal
faults and are intermittent.
They are essentially permanent faults whose conditions of activation occur rarely or are not
easily reproduced. Hence these faults result in transient failures, i.e., failures which may not
recur if the software is restarted. Heisenbugs are extremely difficult to identify through
testing. Hence a piece of software in the operational phase, released after its development and
testing phase, is more likely to experience failures caused by Heisenbugs than by Bohrbugs.
Most recent studies on failure data have reported that a large percentage of software failures
are transient in nature, caused by phenomena such as overloads or timing and exception errors.
A study of failure data from Tandem’s fault tolerant computer system indicated that 70% of
the failures were transient failures, caused by faults such as race conditions and timing
problems. We designate faults attributed to software aging as aging-related faults.
Aging-related faults fall under Bohrbugs or Heisenbugs depending on whether the failure is
deterministic (repeatable) or transient [3]. For aging-related bugs, environment diversity can
be particularly effective if utilized proactively in the form of software rejuvenation. A
rejuvenation operation can be triggered either by time (at deterministic intervals) or by
measurement and analysis of data on the condition of a system that undergoes software aging in
various workstation environments.
1.3.2. Basic concepts of software aging and software rejuvenation:
Software aging is defined as the degradation of the state of the software with time. The primary
causes of this degradation are the exhaustion of operating system resources, data corruption
and the accumulation of numerical errors, which may eventually lead to performance degradation
of the software, crash/hang failure, or both.
A typical example of software aging is a memory leak, which causes a progressive increase in
memory consumption. Since software aging can be observed only during software execution, it is
difficult to find aging-related problems until the software is deployed and executed in a
specific environment. Figure 1.2 describes the chain that leads to an aging-related failure in
the system.
Figure 1.2 Aging Related failure
The accumulation of AR (aging-related) errors may lead to an AR failure. Aging effects can also
be classified into volatile and non-volatile effects. They are considered volatile if they are
removed by re-initialization of the affected system or process, for example via a system
reboot. In contrast, non-volatile aging effects still exist after re-initialization of the
system or process. Physical memory fragmentation and OS resource leakage are examples of
volatile aging effects. File system and database metadata fragmentation are examples of
non-volatile aging effects [4]. The fault tolerance technique used to mitigate the aging
effects of a system is known as software rejuvenation.
Software rejuvenation is defined as occasionally stopping the running software, cleaning its
internal state or its environment, and restarting it. The technique was proposed by Huang and
counteracts the aging phenomenon in a proactive manner by removing the accumulated error
conditions and freeing up operating system resources. Garbage collection, flushing operating
system kernel tables and reinitializing internal data structures are some examples of how the
internal state or the environment of the software can be cleaned. There are basically two
approaches to software rejuvenation and to finding the optimum rejuvenation schedule: the
analytic modelling approach and the measurement-based approach. The analytic modelling approach
assumes failure and repair time distributions of a system and obtains the optimal rejuvenation
schedule to maximize availability, or to minimize the loss probability or the downtime cost.
The measurement-based approach monitors resource consumption in a computer system and analyzes
that data to determine the point in time at which a resource will be completely exhausted,
causing the system to hang or crash. Measurement-based software rejuvenation can follow any of
the following policies: Purely Time based Software Rejuvenation Policy (PTSRP) or a purely
prediction based software rejuvenation policy.
[Figure 1.2 labels: AR bugs; aging factors (system internal environment); AR error; AR failure; arrows "activates" and "propagates".]
Figure 1.3 Rejuvenation Scheduling
1.3.3 Rejuvenation technique
In this section, we review the three VMM rejuvenation techniques. When VMM rejuvenation
needs to be performed on a host, the hosted VMs also need to be controlled because the
execution environments of VMs are cleared by the VMM rejuvenation. Prior to VMM
rejuvenation, we can perform VM shutdown (i.e., Cold-VM rejuvenation), VM suspend (i.e.,
Warm-VM rejuvenation), or VM migration (i.e., Migrate-VM rejuvenation). These approaches
are presented in the next three subsections.
1.3.3.1 Cold-VM rejuvenation
The easiest way to deal with the hosted VMs before triggering VMM rejuvenation is to shut
down all the hosted VMs regardless of their execution states. The VMs are then restarted in
clean states after the VMM rejuvenation. This approach is called Cold-VM rejuvenation. All the
transactions running on the VMs are lost with Cold-VM rejuvenation [6]. An advantage of
Cold-VM rejuvenation, however, is that the rejuvenation action cleans all the aging states of
the VMs in addition to the aging states of the VMM.
[Figure 1.3 labels: Software rejuvenation scheduling; Time-based; Inspection-based; Threshold-based; Prediction-based; Mixed approach; Statistical; Structural models and statistical; Machine learning; each marked Online | Offline.]
1.3.3.2 Warm-VM rejuvenation
Instead of being shut down, the hosted VMs are suspended before VMM rejuvenation is triggered,
and their execution is resumed on completion of the VMM rejuvenation. We call this technique
Warm-VM rejuvenation [5]. Since the execution states of the hosted VMs are saved prior to VMM
rejuvenation, the transactions running on the VMs are not lost. However, Warm-VM rejuvenation
retains the aging states of the VMs because they are only suspended. The aging states in the
hosted VMs are not cleared by VMM rejuvenation, and hence we need to rely on VM rejuvenation
to clear them.
1.3.3.3 Migrate-VM rejuvenation
Live VM migration is a technique to move a running VM to another host with only a short service
interruption; it is supported in most modern VMM implementations such as Xen and VMware.
Although a shared storage system is required to store the VM image, the downtime overhead
caused by a VM migration is small. Using live VM migration, hosted VMs are moved to another
host prior to VMM rejuvenation and returned to the original hosting server by a reverse live
VM migration after the rejuvenation of the VMM completes. We call this combined method
Migrate-VM rejuvenation [6]. The VM continues execution even while the VMM on the original
host is being rejuvenated. However, as in the case of Warm-VM rejuvenation, the aging states in
the hosted VMs are not cleared by the VMM rejuvenation. Live VM migration works only when the
migration target server is running and has the capacity to accept the migrated VM. A comparison
of different software rejuvenation policies is given in Table 1.1.
Table 1.1 Comparison of different software rejuvenation policy

Policy: Constrained Element Based Software Rejuvenation Policy in Embedded Environment (CESRP)
Aging condition: To detect aging, CESRP uses CPU frequency
Analysis tool: -
Threshold value: W (Shapiro-Wilk detection) = 0.9781; P-value = 0.8453
Availability: Probability density µ = 8450.16; stack result σ = 1830.731
Methodology or model: Constrained path and constrained element with mathematical model

Policy: Gray’s classification of software faults
Aging condition: Here aging is detected on a time basis, depending on the results for operating system resources
Analysis tool: SNMP (Simple Network Management Protocol) based distributed resource monitoring tool
Threshold value: Mean time to recover from a failure = 4 hours; mean time to rejuvenate the system = 1 hour; mean time to failure = 41.38 days; cost of failure = $5000/hour; cost of rejuvenation = $500/hour
Availability: σ∗ (optimal availability) = 36.12 days; σ (down time) = 5.60 days
Methodology or model: Semi-Markov reward model based on workload and resources

Policy: POMDP (Partially Observable Markov Decision Process)
Aging condition: Aging detected based on the degradation level of the system
Analysis tool: -
Threshold value / Availability: POMDP K = 1: 0.9951; POMDP K = 4: 0.9932; POMDP K = 9: 0.9901; POMDP K = 99: 0.9901
Methodology or model: CTMC (Continuous Time Markov Chain) model

Policy: Software rejuvenation based on automated self-healing techniques
Aging condition: Aging can be detected based on 1. online transaction processing (OLTP) servers, 2. middleware applications and Web/application servers
Analysis tool: SAN (Stochastic Activity Network)
Threshold value: -
Availability: Basic steady-state availability = 0.824673; tolerance availability = 0.983678
Methodology or model: This policy consists of six elements: system under test; fault model; fault-remediation relationship; micro-measurements; macro-measurements; workload and metric collectors

Policy: Component-Dependency based Micro-Rejuvenation Scheduling Policy
Aging condition: Aging can be detected based on utilization of system resources, such as memory
Analysis tool: SAN model
Threshold value: -
Availability: -
Methodology or model: Micro-rejuvenation scheduling
1.4 Problem Statement:
• Performance degrades in a complex system that runs for a long time.
• Such systems are susceptible to crashes because of data corruption, numerical error
accumulation and exhaustion of OS resources.
• This leads to downtime and non-optimal performance.
• The rejuvenation time is optimized based on variation in workload to reduce the downtime
and increase the availability of the system.
1.5 Objectives of the project:
• The main objective of this project is to reduce software failure rates, avoid downtime and
improve system availability using a software rejuvenation policy based on time and a
load balancing scheme, using the ITL algorithm.
• The availability of the system under various rejuvenation techniques is analyzed.
• The different rejuvenation techniques are analyzed based on values obtained from
SPNP.
1.6 Scope of the work:
1.6.1 Limitations of the project:
• Hardware compatibility is required.
• Same hardware configurations are required on end systems.
• Worked on open source tools and packages.
1.6.2 Constraints of the project:
• Rejuvenation time and memory peak value are set based on the machine learning studies.
• Hardware virtualization must be supported.
• Systems must support NFS (Network File System).
1.7 Methodology :
• A proactive approach to software rejuvenation is implemented using time and load
balancing scheme based techniques.
• SRN-modelled graphs were used to analyze the algorithm on all modules.
• Physical memory utilization is considered for implementing the time and workload based
approaches.
• The designed ITL algorithm uses the timer and variable workload policies to perform
time-based rejuvenation with dynamic adaptation of the rejuvenation timer to the
workload conditions.
• The ITL algorithm is used to optimize the user-defined rejuvenation time when the workload
is variable.
• The availability of the system for the different modules is derived from parameters
obtained from SPNP.
• Live migration of virtual machines is done using KVM/QEMU.
• NFS is configured on two servers to migrate the VM.
Figure 1.4 Optimization of rejuvenation time for variable workload
Figure 1.4 describes the ITL algorithm, which optimizes the rejuvenation time (VT) with respect
to system workload (WL). Based on the variation of workload to WL+1, the rejuvenation time has
to be optimized to VT+1. If the system workload is back to the normal condition WL+2, the
optimizer has to optimize the rejuvenation time to VT+2.
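The adaptation of VT to WL can be sketched as below. The report does not give the ITL update rule in this section, so the rule used here (scaling the user-defined interval inversely with the workload once it exceeds the normal level) is purely an assumption for illustration, as are the function name, the parameters and the units.

```python
def adapt_rejuvenation_time(base_interval, normal_workload, current_workload,
                            min_interval=1.0):
    """Return an adapted rejuvenation interval for the current workload.

    Assumed rule (not the ITL algorithm itself): at or below the normal
    workload the user-defined interval is kept; above it, the interval
    shrinks in proportion to the workload increase, bounded below by
    `min_interval`.
    """
    if base_interval <= 0 or normal_workload <= 0 or current_workload < 0:
        raise ValueError("intervals and normal workload must be positive")
    if current_workload <= normal_workload:
        return float(base_interval)
    scaled = base_interval * normal_workload / current_workload
    return max(float(min_interval), scaled)


# Hypothetical values: 120-minute timer, normal workload 50 requests/s.
print(adapt_rejuvenation_time(120, 50, 100))  # workload doubled (WL+1)
print(adapt_rejuvenation_time(120, 50, 50))   # back to normal (WL+2)
```

Under this assumed rule, a doubled workload halves the timer, and a return to the normal workload restores the user-defined value.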
1.8 Organization of the Report:
This report contains 8 chapters.
• The first chapter deals with the Introduction to the project. It covers the purpose,
motivation and scope for the project. It also talks about the methodology adopted and the
literature survey undertaken.
• The second chapter primarily deals with the theory and concepts of software rejuvenation in
complex system.
• The third chapter primarily deals with the software and hardware requirements
specifications required for the project. It is the software requirements document.
• The fourth chapter explains the design of the system proposed in the project. It gives a
detailed overview of the components used in the project.
• The fifth chapter describes the implementation of the project. It discusses the difficulties
encountered while coding the project and the coding standards used.
• The sixth chapter focuses on testing the application so that it is a robust product. Various
tests are conducted on all the modules.
• The seventh chapter presents the results of the classification and comparison. Analysis of
all modules is done here.
• The eighth chapter gives the conclusion of the project. It deals with the limitations of the
product and the future enhancements possible in the project.
CHAPTER 2
THEORY AND CONCEPTS OF SOFTWARE
REJUVENATION IN COMPLEX SYSTEM
2.1 Introduction
Software rejuvenation has become a new approach to increasing system reliability and
availability in the long run. With time, system outages tend to increase due to software aging,
which may be caused by numerous factors such as memory leaks, unreleased locks and file
descriptor leaks. Time-based rejuvenation periodically restarts a continuously running
application to prevent future failures. The time factor is set to a particular value after which
the software is restarted. A better way to avoid software failure and to increase the
availability and reliability of the system is therefore to find the failure-probable state and
rejuvenate the software before the failed state is reached. This project investigates time-based
rejuvenation policies for maintaining high reliability of software systems.
Software rejuvenation is the process of gracefully terminating a running application and
restarting it. The main motive behind rejuvenation is to prevent unexpected errors caused by
aging-related issues of the software. The idea is to suspend the application and restart it
before it suffers any error. The rejuvenation strategy is primarily intended for servers, where
applications are expected to run continuously for days without any failure. Software aging is
the gradual degradation of application performance over time, which may lead to untimely
termination of the program. The main objective of rejuvenation is to maintain higher system
reliability and availability by cleaning internal system states before the application reaches a
failure state.
2.2 Software Rejuvenation Techniques Review
Software rejuvenation takes different approaches into account. Broadly, these are classified as
standard rejuvenation, delayed rejuvenation and mixed rejuvenation.
2.2.1 Standard Rejuvenation
In standard rejuvenation, rejuvenation occurs as soon as the triggering interval is reached.
This policy does not take workload into consideration: it ignores whether the system is under
peak load or off-peak load, and rejuvenation happens at the triggered time.
2.2.2 Delayed Rejuvenation
In delayed rejuvenation, if the rejuvenation time is reached during a peak period, the node is
only scheduled for rejuvenation; the actual rejuvenation starts as soon as the next off-peak
period begins.
2.2.3 Mixed Rejuvenation
The mixed rejuvenation policy is a combination of the standard and delayed rejuvenation
strategies. If the rejuvenation time falls early in the peak period, the application is
rejuvenated immediately; otherwise the rejuvenation is delayed until the next off-peak period
starts.
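The three policies can be compared with a small decision function. This is a sketch under stated assumptions: a single peak period given as a half-open interval, and an `early_window` parameter (hypothetical name) that defines how much of the start of the peak period still counts as "early" for the mixed policy.

```python
from enum import Enum


class Policy(Enum):
    STANDARD = "standard"
    DELAYED = "delayed"
    MIXED = "mixed"


def rejuvenation_start(policy, trigger_time, peak_start, peak_end, early_window):
    """Return the time at which rejuvenation actually starts.

    The peak period is [peak_start, peak_end). `early_window` is the assumed
    length of the initial part of the peak period in which the mixed policy
    still rejuvenates immediately.
    """
    in_peak = peak_start <= trigger_time < peak_end
    if policy is Policy.STANDARD or not in_peak:
        return trigger_time  # workload is ignored, or it is off-peak anyway
    if policy is Policy.DELAYED:
        return peak_end  # wait for the next off-peak period
    if policy is Policy.MIXED:
        if trigger_time - peak_start < early_window:
            return trigger_time  # early in the peak period: act now
        return peak_end
    raise ValueError(f"unknown policy: {policy!r}")


# Hypothetical day: peak from hour 9 to hour 17, "early" means the first 2 hours.
for policy in Policy:
    print(policy.value, rejuvenation_start(policy, 12, 9, 17, 2))
```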
2.2.4 Erlang Approximation
Based on workload, i.e., peak load and off-peak load, different timer policies are established
to find the interval needed for scheduling. In standard rejuvenation, rejuvenation occurs at
the triggered time regardless of peak or off-peak load. In delayed rejuvenation, as defined
above, the delayed time is approximated by an Erlang distribution. With this approximation the
DSPN becomes a Markovian stochastic Petri net, and the solution techniques for Markov chains
can be applied. The deterministic switching time between peak and off-peak periods is kept as
it is; hence this model is a DSPN (deterministic and stochastic Petri net).
The rejuvenation is triggered at regular intervals, modeled by a deterministic transition with
constant firing time. When the deterministic transition fires, and if the immediate transition
is enabled at that time, the token is moved to another place, indicating the beginning of
rejuvenation activity. For standard rejuvenation, the timer is always enabled; for delayed
rejuvenation, the timer is disabled during the peak period; and for mixed rejuvenation, the
timer is enabled for an initial duration of a certain length and disabled thereafter in the
peak period. After the rejuvenation finishes, reset fires to return a token to its place,
beginning the next rejuvenation cycle. To make the model solvable by SPNP, the deterministic
transition is approximated by an r-stage Erlang distribution. This is achieved by storing r
tokens in another place and replacing the deterministic timer by an exponentially distributed
timed transition with firing rate r/timer. At the same time, the multiplicities of the output
arc of reset timer and the input arc of timer policy are changed to r.
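The quality of the r-stage Erlang approximation can be checked numerically. An Erlang distribution with r stages, each with rate r/timer, has mean timer and variance timer²/r, so it concentrates around the deterministic value as r grows. The sketch below only evaluates these closed-form moments; the function name and the timer value are assumptions for illustration.

```python
import math


def erlang_approximation(timer, stages):
    """Moments of an r-stage Erlang approximation of a deterministic delay.

    Each stage is exponential with rate r / timer, so the total delay has
    mean r / rate = timer and variance r / rate**2 = timer**2 / r.
    Returns (stage_rate, mean, variance, coefficient_of_variation).
    """
    if timer <= 0 or stages < 1:
        raise ValueError("timer must be positive and stages at least 1")
    rate = stages / timer
    mean = stages / rate
    variance = stages / rate ** 2
    cv = math.sqrt(variance) / mean  # equals 1 / sqrt(stages)
    return rate, mean, variance, cv


# Hypothetical timer of 10 time units with an increasing number of stages.
for r in (1, 5, 20, 100):
    rate, mean, variance, cv = erlang_approximation(10.0, r)
    print(f"r={r:3d} rate={rate:.2f} mean={mean:.2f} var={variance:.2f} cv={cv:.3f}")
```

A larger r gives a closer approximation of the deterministic timer at the cost of a larger underlying Markov chain.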
2.3 Stochastic Reward Nets Model for Time based Software
Rejuvenation in Virtualized Environment
Here we focus mainly on unplanned software outages due to the software aging problem. We
present a comprehensive availability model for both the VM clustering software rejuvenation
model and the VM migration based software rejuvenation model. The model captures the software
aging states of the VM and the VMM as well as their failures caused by aging, using analytical
modeling with stochastic reward nets (SRN).
We describe our proposal to offer a high availability mechanism using a time-based software
rejuvenation methodology. First we present the ways of using virtualization to improve
software rejuvenation for addressing the software aging issue. In the proposed system,
virtualization technology and software rejuvenation are used to provide availability of the
services.
Clustering supports two or more servers running duplicate VMs. Failover technologies also
allow a failed VM to load from a storage snapshot and start up on another server. To counteract
software and hardware failures, the rejuvenation schedules for the VM and the VMM need to be
determined properly for VM availability, since VMM rejuvenation affects the VMs running on the
VMM. The following two scenarios are studied in this paper.
2.3.1 VM Clustering Software Rejuvenation (2vms1pm)
A physical machine hosts the virtual machines. One monitoring VM and the other operational VMs
are created on top of the virtualization layer (VMM). The main application server runs on one
VM and the remaining VM is used as a standby server. Software modules responsible for detecting
software aging are installed in the monitoring VM, which triggers the rejuvenation operation.
If the active VM is about to be rejuvenated, the standby VM is started and all new requests and
sessions are switched from the active VM to the standby VM. Because all VMs run on one host,
the physical machine itself is a SPOF (single point of failure).
2.3.2 VM Migration Based Software Rejuvenation (2vms2pms)
In this scenario an active-standby virtualized clustering architecture is employed. A highly
available cluster is built between two or more virtual machines, each running on a different
physical machine (2 PMs). The two PMs consist of an active physical server and a standby
physical server. Both physical servers can access shared storage. A heartbeat keep-alive system
is used to monitor the interaction of the VMs and the physical servers. On the active physical
server, VMs are created as monitoring VM, active VM and standby VM, and likewise on the standby
physical server. Both VM and VMM time-based rejuvenation mechanisms are considered in this
scenario. The time-based rejuvenation policy for the VM is the same as for the active-standby
VMs hosted on 1 PM.
Live VM migration enables a running VM on one host server to move to another host server with
very little interruption of execution. When the VMM needs to be rejuvenated, the hosted VMs can
be moved to the other physical server. They can return to the original host by live VM
migration after the VMM rejuvenation completes. In the event of an outage of the active
physical server, the virtualized recovery server on the standby physical server can be
activated to take over the workload immediately using live migration. The downtime of a VM
caused by live VM migration is very small, and the VM continues execution even while the
original host is down.
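A live migration step of this kind could be scripted as follows. This is a sketch, assuming the hosts are managed through libvirt's `virsh` command line (the report states only that KVM/QEMU is used) and that the VM image sits on storage shared by both hosts. The domain name and host name are hypothetical, and the function defaults to a dry run that only returns the command.

```python
import subprocess


def build_live_migration_command(domain, destination_host):
    """Build a `virsh migrate --live` command for a libvirt-managed VM.

    Assumes SSH access to the destination host and shared storage for
    the VM image; both names are supplied by the caller.
    """
    if not domain or not destination_host:
        raise ValueError("domain and destination host are required")
    destination_uri = f"qemu+ssh://{destination_host}/system"
    return ["virsh", "migrate", "--live", domain, destination_uri]


def migrate(domain, destination_host, dry_run=True):
    """Return the migration command; execute it only when dry_run is False."""
    command = build_live_migration_command(domain, destination_host)
    if not dry_run:
        subprocess.run(command, check=True)
    return command


# Hypothetical names: move "active-vm" to "standby-host" before VMM rejuvenation.
print(" ".join(migrate("active-vm", "standby-host")))
```

The reverse migration after VMM rejuvenation would be the same call issued on the standby host with the original host as the destination.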
CHAPTER 3
SOFTWARE REQUIREMENT SPECIFICATION OF
SOFTWARE REJUVENATION IN COMPLEX SYSTEM
3.1 Project Description
Software Rejuvenation in Complex System has six modules, namely OS cold reboot, OS warm
reboot, VM cold reboot, VM warm reboot, VMM reboot and VM migration. Each module has its own
working method, which is explained below.
3.2 Module Description
There are mainly six modules, they are: OS cold reboot, OS warm reboot, VM cold reboot, VM
warm reboot, VM migration, VMM reboot.
3.2.1 Module for OS Cold rejuvenation:
In the cold OS reboot process, the system is rebooted immediately at the rejuvenation point.
The rejuvenation point is the point at which the memory consumption of the system reaches a
threshold value or a predetermined time elapses. When the system consumes a large amount of
RAM, the OS must be rebooted, clearing all internal states. Memory may be consumed by
applications, by error-prone code that runs for a long time, or by the OS itself.
In this process the free memory is compared to our predetermined threshold value. If the free
memory is greater than the threshold value, the system is allowed to run in the normal state,
i.e., the system has not reached the threshold point of consumption. If it is less, i.e., the
system has consumed more memory than the threshold point, the OS is restarted immediately.
The amount of free memory is extracted and compared with the predetermined free memory
threshold; based on the result of the comparison, further processing is handled by the ITL
algorithm.
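The free-memory comparison can be sketched as below, assuming a Linux system where free memory is read from the `MemFree` line of `/proc/meminfo`. The function names, the threshold and the sample text are hypothetical, and the reboot itself is left out; the sketch only makes the decision.

```python
def parse_mem_free_kib(meminfo_text):
    """Extract the MemFree value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemFree:"):
            return int(line.split()[1])
    raise ValueError("MemFree entry not found")


def needs_cold_reboot(free_kib, threshold_kib):
    """True when free memory has dropped below the predetermined threshold."""
    return free_kib < threshold_kib


def read_meminfo(path="/proc/meminfo"):
    """Read the kernel's memory report (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Sample text in the /proc/meminfo format, used here instead of a live read.
sample = "MemTotal:        8000000 kB\nMemFree:          150000 kB\n"
free_kib = parse_mem_free_kib(sample)
print(free_kib, needs_cold_reboot(free_kib, threshold_kib=200000))
```

On a live system, `parse_mem_free_kib(read_meminfo())` would supply the current value at each periodic check.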
3.2.2 Module for OS warm rejuvenation:
In the OS warm reboot process, the kernel state, including the states of all applications
running on the kernel, is saved before rebooting. The kernel state is saved by creating a
complete image of the kernel.
The OS warm reboot process is divided into two stages: 1) Suspend and 2) Resume. In the Suspend
stage the kernel is called to create a snapshot of the current system state, the snapshot data
is written to disk, and finally the system is rebooted. In the Resume stage, when the system is
turned on, the GRUB loader runs and, from the initrd before any partitions are mounted, the
snapshot data is read from disk and loaded into the kernel. The kernel restores the image, and
the system resumes from the state in which it was suspended.
3.2.3 Module for VM cold reboot:
In the VM cold reboot process [9], the VM is rebooted immediately at the rejuvenation point;
the hypervisor is untouched. The rejuvenation point is the point at which the memory
consumption of the system reaches a threshold value or a predetermined time elapses. When the
VM consumes a large amount of RAM, the VM must be rebooted, clearing all internal states.
Memory may be consumed by applications, by error-prone code that runs for a long time, or by
the OS itself.
The ITL algorithm compares the free memory to our predetermined threshold value. If the free
memory is greater than the threshold value, the system is allowed to run in the normal state,
i.e., the system has not reached the threshold point of consumption. If it is less, i.e., the
system has consumed more memory than the threshold point, the rejuvenation time is optimized
and the predetermined time is updated. When the rejuvenation time equals the system time, the
VM is restarted immediately without saving any state of the running VM.
3.2.4 Module for VM warm reboot:
In the VM warm reboot process, the kernel state of the particular failing VM is saved before rebooting, including the states of all applications running on the kernel. The kernel state is saved by creating a complete image of the kernel.
The VM warm reboot process is divided into two stages: 1) Suspend and 2) Resume. In the Suspend stage, the kernel is called to create a snapshot of the current system state, the snapshot data is written to disk, and finally the system is rebooted. In the Resume stage, when the system is turned on, the GRUB loader runs from initrd before any partitions are mounted; all the snapshot data is then read from disk and loaded into the kernel, which restores the image, so the system continues from the state in which it was suspended. This module reduces request failures and provides high availability for the VM.
The ITL algorithm compares the memory left with the predetermined threshold value. If the memory left is greater than the threshold value, the system is allowed to run in its normal state, i.e. the system has not reached the threshold point of consumption. If it is less, i.e. the system has consumed memory beyond the threshold point, the rejuvenation time is optimized and the predetermined time is updated. When the rejuvenation time equals the system time, the VM is restarted immediately after saving the state of the running VM.
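As an illustration of the Suspend and Resume stages for a VM, the sketch below builds the two commands that libvirt's virsh tool provides for saving a guest's state to a file (`virsh save`) and restoring it (`virsh restore`). Using virsh for this step is an assumption; the text does not state which mechanism the module uses. The names `vm_warm_cmds`, `is_plain_token` and `build_vm_warm_cmds`, as well as the example state file path, are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Commands for the two stages of a VM warm reboot. */
struct vm_warm_cmds {
    char suspend[256]; /* saves the guest state to a file and stops the guest */
    char resume[256];  /* restores the guest from the saved state file */
};

/* Accept only characters that need no shell quoting. */
static int is_plain_token(const char *s)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_.-/";
    size_t n = strlen(s);
    return n > 0 && strspn(s, allowed) == n;
}

/* Fill out with the suspend and resume commands. Returns 0 or -1. */
int build_vm_warm_cmds(const char *domain, const char *state_file,
                       struct vm_warm_cmds *out)
{
    assert(domain != NULL && state_file != NULL && out != NULL);
    if (!is_plain_token(domain) || !is_plain_token(state_file))
        return -1;
    int a = snprintf(out->suspend, sizeof out->suspend,
                     "virsh save %s %s", domain, state_file);
    int b = snprintf(out->resume, sizeof out->resume,
                     "virsh restore %s", state_file);
    if (a < 0 || (size_t)a >= sizeof out->suspend)
        return -1;
    if (b < 0 || (size_t)b >= sizeof out->resume)
        return -1;
    return 0;
}
```

The resulting strings could be passed to system( ), which the project already lists among its methods; rejecting names with shell metacharacters keeps the command line predictable.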
3.2.5 Module for VMM reboot
In the VMM cold reboot process, the VMM is rebooted immediately at the rejuvenation point; all the VMs running on the VMM are shut down before the VMM is rebooted. The rejuvenation point is the point at which the memory consumption of the system reaches a threshold value or the predetermined time is reached. When the VMM consumes a large amount of RAM, the VMM must be rebooted, clearing all internal state. Memory may be consumed by applications or by error-prone code that runs for a long time and consumes a large amount of RAM.
In this process, the ITL algorithm compares the memory left with the predetermined threshold value. If the memory left is greater than the threshold value, the system is allowed to run in its normal state, i.e. the system has not reached the threshold point of consumption. If it is less, i.e. the system has consumed memory beyond the threshold point, the rejuvenation time is optimized and the predetermined time is updated. When the rejuvenation time equals the system time, the VMM is restarted immediately without saving any state of the running VMs. If VMM memory consumption reaches its peak point, i.e. the VMM is about to crash, the VMM is restarted even if all VMs are running normally, and no state or data is saved. The user is, however, given a period of one minute in which to cancel the reboot or shut down the VMM completely.
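The decision logic described above can be sketched as a small function. This is a minimal illustration under stated assumptions: `critical_kb` stands for the "peak point" of consumption expressed as a free-memory floor, the equality case (free memory exactly at the threshold) is treated as normal operation, and the names `vmm_action`, `vmm_decide`, `vmm_reboot_command` and `vmm_cancel_command` are hypothetical. The one-minute grace period is expressed with the standard `shutdown -r +1` command, which `shutdown -c` cancels; the text does not say which command the module actually issues.

```c
#include <assert.h>

enum vmm_action {
    VMM_RUN_NORMAL,        /* free memory is at or above the threshold */
    VMM_OPTIMIZE_TIMER,    /* below threshold: let ITL adjust the timer */
    VMM_REBOOT_WITH_GRACE  /* at the peak point: reboot after one minute */
};

/* Classify the VMM state from free memory and the two limits (all in kB). */
enum vmm_action vmm_decide(unsigned long free_kb,
                           unsigned long threshold_kb,
                           unsigned long critical_kb)
{
    assert(critical_kb <= threshold_kb);
    if (free_kb <= critical_kb)
        return VMM_REBOOT_WITH_GRACE;
    if (free_kb < threshold_kb)
        return VMM_OPTIMIZE_TIMER;
    return VMM_RUN_NORMAL;
}

/* Reboot in one minute, leaving the user time to cancel. */
const char *vmm_reboot_command(void)
{
    return "shutdown -r +1";
}

/* Cancel a pending scheduled reboot. */
const char *vmm_cancel_command(void)
{
    return "shutdown -c";
}
```

Checking the critical limit first means a VMM that is about to crash is always rebooted, regardless of how far the timer has been optimized.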
3.2.6 Module for VM Migration [10] [11] [12]
In this module, the VM on the failing server is transferred to a preconfigured secondary server before the VM fails. The complete data and the applications running on the main server are transferred to the secondary server without interrupting the running applications. Once the complete VM has been transferred to the other server and loaded, all applications that were running on the main server are in the same state as before the transfer, with no loss of application data.
This is achieved by initially configuring NFS on both servers and configuring the Virtual Machine Manager (virt-manager) and virsh packages. Applying this concept to our project: the server may receive a heavy load of requests, or memory consumption may become high, which may lead to a hang/crash or failure of the system. Once the user has set the rejuvenation time and the threshold memory value, the rejuvenation manager checks for an aging problem in the system. If one is detected, the rejuvenation time predetermined by the user is optimized by the ITL algorithm and the system is rejuvenated at the rejuvenation time. For rejuvenation we use the migration technique to migrate the running VM and reboot the server, which provides high availability and reduces request failures.
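A live migration of this kind is commonly started with `virsh migrate --live <domain> qemu+ssh://<host>/system`. The sketch below only builds that command line; it assumes SSH access between the servers and shared storage over NFS as configured later in the report, and the function names `is_plain_name` and `build_live_migration_cmd` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Accept only characters that need no shell quoting. */
static int is_plain_name(const char *s)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_.-";
    size_t n = strlen(s);
    return n > 0 && strspn(s, allowed) == n;
}

/*
 * Build the live-migration command for one guest.
 * domain is the libvirt guest name, dest_host the secondary server.
 * Returns 0 on success, -1 on invalid input or a too-small buffer.
 */
int build_live_migration_cmd(const char *domain, const char *dest_host,
                             char *out, size_t outlen)
{
    assert(domain != NULL && dest_host != NULL && out != NULL);
    if (!is_plain_name(domain) || !is_plain_name(dest_host))
        return -1;
    int n = snprintf(out, outlen,
                     "virsh migrate --live %s qemu+ssh://%s/system",
                     domain, dest_host);
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Once the guest is running on the secondary server, the failing server can be rebooted without interrupting the applications inside the VM.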
3.3 Software requirements:
Table 3.1 Software requirements
              Minimum Requirements
OS            CentOS
              Ubuntu
Other         KVM/QEMU must be installed on both servers.
              NFS must be configured on both systems to migrate the VM.
              Note: KVM is a hypervisor or Virtual Machine Monitor; NFS (Network
              File System) is a distributed file system protocol.
Language      C
3.4 Hardware Requirements:
Table 3.2 Hardware Requirements
              Minimum Requirements
Processor     Intel Pentium or better
Memory        4 GB RAM
Hard Disk     100 GB of hard disk space
Display       1024 x 768 or higher-resolution display with 16-bit color
3.5 Performance Requirements:
• Availability
The system shall achieve 100 percent availability at all times.
• Portability
The system should be implemented in Java so that it can move easily from one system to
another, because Java is platform independent.
• Scalability
The system shall be usable in multiple approaches.
• Maintainability
The system should be optimized for supportability, or ease of maintenance, as far as
possible. This may be achieved through documented coding standards,
naming conventions, class libraries and abstraction.
3.6 Functional requirements:
As per the functional requirement specifications, the project shall provide the following facilities:
• The system collects the current status of the workload based on the RAM utilized by the
running applications.
• The system checks for aging factors that degrade the availability of applications. If any aging
factor is detected, it issues a notification.
• The system collects the status of the system periodically.
• The system keeps track of the system time and compares it with the fixed rejuvenation
schedule. If the tracked time equals the fixed rejuvenation schedule, the system is
rejuvenated.
• The system stores the current status of each process, which is used to resume the
process after system rejuvenation takes place.
3.7 Project Effort Estimation:
Assumptions:
Average Labor Cost : $680/month
Average Lines of Code (LOC) : 450 LOC/month
Average cost for a line of code : $1.5/LOC (680 / 450 ≈ 1.5)
Module Details:
• The project contains 6 modules, each of around 490 LOC, of which
implementation accounts for 320 LOC/module and analysis for 170 LOC/module.
• Total Project Size = 490 * 6
= 2940 LOC
Cost Estimation:
• For one module, cost = 490 * 1.5
= $735
• Total cost of Project = 2940 * 1.5
= $4410
Effort Estimation:
• Effort = Total Project Cost / Average Labor Cost per month
= 4410 / 680
= 6.4853 ≈ 7 person-months
• 7 persons are required to complete this project in one month.
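The arithmetic above can be checked with a few lines of C. The sketch reproduces the figures from the stated assumptions (6 modules, 490 LOC per module, $1.5 per LOC, $680 per person-month); the struct and function names are hypothetical, and rounding the effort up to whole persons follows the report's own rounding of 6.4853 to 7.

```c
#include <assert.h>

struct effort_estimate {
    long total_loc;             /* modules * LOC per module */
    double cost_per_module;     /* LOC per module * cost per LOC */
    double total_cost;          /* total LOC * cost per LOC */
    double person_months;       /* total cost / monthly labor cost */
    long persons_for_one_month; /* person_months rounded up */
};

struct effort_estimate estimate_effort(long modules, long loc_per_module,
                                       double cost_per_loc,
                                       double labor_cost_per_month)
{
    assert(modules > 0 && loc_per_module > 0);
    assert(cost_per_loc > 0.0 && labor_cost_per_month > 0.0);

    struct effort_estimate e;
    e.total_loc = modules * loc_per_module;
    e.cost_per_module = loc_per_module * cost_per_loc;
    e.total_cost = e.total_loc * cost_per_loc;
    e.person_months = e.total_cost / labor_cost_per_month;

    /* Round up without pulling in the math library. */
    long whole = (long)e.person_months;
    e.persons_for_one_month =
        (e.person_months > (double)whole) ? whole + 1 : whole;
    return e;
}
```

Calling `estimate_effort(6, 490, 1.5, 680.0)` yields 2940 LOC, $735 per module, $4410 in total and 7 persons for a one-month schedule.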
3.8 Project Schedule:
Table 3.3 Required Schedules for each Task
Figure 3.1 Gantt chart of Project Schedule
CHAPTER 4
HIGH LEVEL DESIGN OF
SOFTWARE REJUVENATION IN COMPLEX SYSTEM
A software product is a complex entity. Its development usually follows what is known as
Software Development Life Cycle (SDLC). The second stage in the SDLC is the Design stage.
The objective of the design stage is to produce the overall design of the software. The design
stage involves two sub-stages namely:
• High-Level Design
• Detailed-Level Design
In the High-Level Design, the proposed functional and non-functional requirements of the
software are studied, and an overall solution architecture is developed that can
handle those needs.
4.1 Development Methods:
The development method used in this software design is the modular/functional development
method. In this, the system is broken down into different modules, with a certain amount of
dependency among them. The input-output data that flows from one module to another
shows the dependency.
Data flow diagrams have been used in the modular design of the system.
4.2 Data Flow Diagrams:
Data-flow models are an intuitive way of showing how data is processed by a system. At the
analysis level, they should be used to model the way in which data is processed in the existing
system. The notation used in these models represents functional processing, data stores and
data movements between functions. Data-flow models are used to show how data flows
through a sequence of processing steps. The data is transformed at each step before moving on
to the next stage. These processing steps or transformations are program functions when
data-flow diagrams are used to document a software design.
4.3 Data Flow Diagram:
4.3.1 Data Flow Diagram For rejuvenation Manager: Level 0
Figure 4.1 DFD: Level 0: module for Rejuvenation system
[Figure 4.1 elements: System; 1.0 Rejuvenation Manager; Rejuvenation process]
In Figure 4.1, the Level 0 module for rejuvenation describes the main rejuvenation process
with the variable time and workload policy implemented. The module to run is selected first,
and the threshold time and threshold memory are then set.
4.3.2 Data Flow Diagram for FTR and FTM: Level 1
Figure 4.2 DFD: Level 1 module for rejuvenation manager
In Figure 4.2, the Level 1 data flow diagram describes the working of the rejuvenation
manager. The rejuvenation manager has two modules, namely the aging detector and the optimizer. The aging
detector detects the aging factor and invokes the optimizer to optimize the rejuvenation time. If
aging is not detected, the system is rejuvenated at the rejuvenation time.
[Figure 4.2 elements: System; User-set threshold values; 1.1 Aging detector; 1.2 Optimizer; Rejuvenation process]
4.3.3 Data Flow Diagram: Level 2
Figure 4.3 Data Flow Diagram: Level 2 module for aging detector
[Figure 4.3 elements: System; 1.1.1 Call meminfo(); 1.1.2 Checking the aging factor; Rejuvenation process]
[Figure 4.4 elements: System; processes 1.2.1 to 1.2.4 (Analyze current FTR, Calculate memory factor, Compare FTR with STM, Optimizing FTR value); Rejuvenation process]
Figure 4.4 Data Flow Diagram: Level 2 module for time optimizer
The aging detector determines the free memory left by calling meminfo() and passes the aging result to
the rejuvenation manager. The rejuvenation manager compares the free memory left with the threshold
value given by the user and, if the comparison result is positive, calls the optimizer to optimize the
rejuvenation time. The optimizer fetches the FTR and the STM (System Time) and checks
the free memory left. Based on the threshold value, the time is optimized, i.e. either increased or
decreased.
4.4 Sequence Diagram
A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that
shows how processes operate with one another and in what order. It is a construct of a Message
Sequence Chart. Sequence diagrams are sometimes called event-trace diagrams, event scenarios,
or timing diagrams.
A sequence diagram shows, as parallel vertical lines ("lifelines"), different processes or objects
that live simultaneously, and, as horizontal arrows, the messages exchanged between them, in
the order in which they occur. This allows the specification of simple runtime scenarios in a
graphical manner.
Figure 4.5 depicts the policy used in this project, i.e. the variable time and workload policy.
The rejuvenation manager requests the status of the workload applied to the system. The system replies to the
rejuvenation manager with the workload applied to it, and the rejuvenation manager then calls the aging
detector [13] to compare it with the predetermined threshold value. If any variation is observed,
this result is given back to the rejuvenation manager; the optimizer is then invoked to optimize the
rejuvenation time, and the system is rejuvenated at the optimized time. If no variation is
observed, the system is rejuvenated at the predetermined rejuvenation time.
Figure 4.5 Sequence Diagram for rejuvenation manager
4.5 Detailed Design
4.5.1 Detailed System Design
The main aim of the project is to build a simulator for the time-based and prediction-based
rejuvenation approaches.
In this section, the individual modules that comprise the building blocks of the system are
identified and a complete design is presented for each of them. The design details for
each module consist of the following elements:
• The purpose of the module
• A description of its functionality
• A description of the types and number of inputs it accepts
• A description of the types and number of outputs it generates
4.5.2 Module 1: OS cold reboot
This module covers OS cold reboot. In the cold reboot process, the rejuvenation time is entered by
the user and compared with the system time. If the workload varies relative to the
threshold value given by the user, the time is optimized and the system is rejuvenated at the optimized
time without saving any state.
Input
The input for the module is rejuvenation time and threshold value of memory.
Output
The output for the module is to rejuvenate at rejuvenation point
Figure 4.6 Functioning of OS cold reboot
[Figure 4.6 flowchart elements: START; Declare and retrieve time; Compare (Yes/No); Mem_C; Reboot; STOP]
The functioning of the cold reboot is described in the flow diagram above.
Figure 4.6 shows how the OS cold reboot process works. Initially, the user needs to set the
rejuvenation time and the threshold memory value. The system time is then compared with the
rejuvenation time given by the user; if the times are equal, the system is rejuvenated immediately. If they
are not equal, memory usage is compared with the threshold memory value in block Mem_C: if the
result is negative, the system is rejuvenated; if the result is positive, the time is optimized and the
rejuvenation time is updated.
4.5.3 Module 2: Module for OS warm reboot process
This module is about OS warm reboot
Input
The input for the module is to set predetermined rejuvenation time and threshold memory
value.
Output
The output of the module is the kernel state saved as an image on the hard disk,
followed by rejuvenation at the rejuvenation time; afterwards the system must start from the state saved before the reboot.
The functioning of OS warm reboot is described in the following flow diagram.
[Figure 4.7 flowchart elements: Start; Set FTR & FTM; FTR check (Yes/No); Check FFM (Yes/No); Optimize; Save; Rejuvenation; Stop]
Figure 4.7 Module of OS warm reboot
Figure 4.7 shows how OS warm reboot works. First, the user has to set the predetermined
rejuvenation time and the threshold value of memory. The system time is then compared with the
rejuvenation time given by the user; if the times are equal, the kernel state is saved to the hard
disk and the system is rejuvenated. If they are not equal, memory usage is compared with the
threshold memory value in block Check FFM (Fixed Free Memory). If the result is negative,
the system time is checked against the rejuvenation time again. If the result is positive, the time is optimized and
the rejuvenation time is updated. The system is rejuvenated at the rejuvenation time.
4.5.4 Module 3: VM cold reboot.
This module describes VM cold reboot.
Input
The input for the module is to set predetermined rejuvenation time and threshold memory
value.
Output
Output for the module is to rejuvenate the VM at the rejuvenation time
The functioning of the VM cold reboot module is described in the following flow diagram.
[Figure 4.8 flowchart elements: START; Declare and retrieve time; Compare (Yes/No); Mem_C (Yes/No); Reboot; STOP]
Figure 4.8 Module for VM cold reboot
Figure 4.8 shows the module for VM cold reboot. Initially, the user needs to set the
rejuvenation time and the threshold memory value. Next, the system time is compared with the rejuvenation
time given by the user; if the times are equal, the VM is rejuvenated immediately. If they are not equal,
memory usage is compared with the threshold memory value in block Mem_C: if the result is
positive, the system is rejuvenated; if the result is negative, the time is optimized and the
rejuvenation time is updated.
4.5.5 Module 4: module for VM warm reboot
Input
The input for the module is to set predetermined rejuvenation time and threshold memory
value.
Output
The output of the module is the kernel state of the faulty VM saved as an image on the
hard disk, followed by rejuvenation at the rejuvenation time; afterwards the VM must start from the state saved before the reboot.
The functioning of the VM warm reboot module is described in the following flow diagram.
Figure 4.9 VM warm reboot Module
[Figure 4.9 flowchart elements: START; Set FTR and FTM; Check FTR (Yes/No); Check FTM (Yes/No); Optimize; Save; Rejuvenation; Resume; STOP]
Figure 4.9 shows the VM warm reboot process. First, the user has to set the predetermined
rejuvenation time and the threshold value of memory. The system time is then compared with the
rejuvenation time given by the user; if the times are equal, the kernel state is saved to the hard
disk and the system is rejuvenated. If they are not equal, memory usage is compared with the
threshold memory value in the block named Check FTM (Fixed Threshold Memory). If the result is
negative, the system time is checked against the rejuvenation time again. If the result is positive, memory
usage is compared with the peak memory value in the second Check FTM block: if that result is positive,
the kernel state is saved to the hard disk and the system is rejuvenated; otherwise the
time is optimized and the rejuvenation time is updated. The system is rejuvenated at the rejuvenation
time.
4.5.6 Module 5: module for VM migration
This module describes VM migration.
Input
The input for the module is to set predetermined rejuvenation time and threshold memory
value.
Output
The output of the module is the migration of the VM from the server that is about to fail to
another server at the rejuvenation time.
The functioning of the VM migrate module is described in the following flow diagram.
Figure 4.10 VM migration module
[Figure 4.10 flowchart elements: START; Set FTR and FTM; Check FTR (Yes/No); Check FTM (Yes/No); Optimize; Migrate; STOP]
The flow chart in Figure 4.10 depicts how VM migration works. Initially, the admin needs
to set the rejuvenation time and the threshold memory value at which the VM must be migrated.
All running applications and dynamic data entered in the VM are migrated
to the configured secondary server, so this module provides the highest availability to
the server. Once the rejuvenation time is set, if a heavy workload is applied to the server in
the meantime, the rejuvenation time is optimized so that the server is protected from
hang/crash failure. When the rejuvenation time is reached, the complete VM is migrated to
the other configured server. As the data and running applications are not corrupted, this module
provides high availability with no request failures, which is greatly needed in the current corporate
world.
4.5.7 Module 6: Module for VMM reboot
This module describes VMM reboot.
Input
The input for the module is to set predetermined rejuvenation time and threshold memory
value.
Output
Output for the module is to rejuvenate the VMM at the rejuvenation time
The functioning of the VMM reboot module is described in the following flow diagram.
[Figure 4.11 flowchart elements: START; Set FTR and FTM; Compare (Yes/No); Mem_C (Yes/No); Reboot; STOP]
Figure 4.11 VMM reboot model
Figure 4.11 shows the process of VMM reboot. Initially, the user needs to set the rejuvenation time
and the threshold memory value. The system time is then compared with the rejuvenation time given
by the user; if the times are equal, the system is rejuvenated immediately. If not, the system
optimizes the rejuvenation time depending on the workload, and the VMM reboot takes
place at the optimized time. In this module, all running VMs are shut down before the VMM reboot.
CHAPTER 5
IMPLEMENTATION OF
SOFTWARE REJUVENATION IN COMPLEX SYSTEM
The implementation phase of any project development is the most important phase as it yields
the final solution, which solves the problem at hand. The implementation phase involves the
actual materialization of the ideas, which are expressed in the analysis document and developed
in the design phase.
The project has six modules: OS cold reboot, OS warm reboot, VM cold reboot, VM warm reboot,
VM migration and VMM reboot, all based on time and workload rejuvenation policies. All the
modules are also analyzed using SPNP (Stochastic Petri Net Package).
5.1 Platform Selection:
5.1.1 KVM/QEMU:
KVM (Kernel-based Virtual Machine) is a full virtualization solution for Linux on x86 hardware
containing virtualization extensions (Intel VT or AMD-V). It consists of a loadable kernel
module, kvm.ko that provides the core virtualization infrastructure and a processor specific
module, kvm-intel.ko or kvm-amd.ko. KVM also requires a modified QEMU although work is
underway to get the required changes upstream.
Using KVM, one can run multiple virtual machines running unmodified Linux or Windows
images. Each virtual machine has private virtualized hardware: a network card, disk, graphics
adapter, etc.
The kernel component of KVM is included in mainline Linux, as of 2.6.20. KVM is open source
software. A wide variety of guest operating systems work with KVM, including many flavours
of Linux, BSD, Solaris, Windows, Haiku, ReactOS, Plan 9, and AROS Research Operating
System. In addition Android 2.2, GNU/Hurd (Debian K16), Minix 3.1.2a, Solaris 10 U3, Darwin
8.0.1 and more OSs and some newer versions of these with limitations are known to work. A
modified version of QEMU can use KVM to run Mac OS X.
5.1.2 SPNP (Stochastic Petri Net Package) [12][13] [14]:
This package was developed by Ciardo et al. The model type used for input is an SRN (Stochastic
Reward Net). SRNs incorporate several structural extensions to GSPNs such as marking
dependencies (marking dependent arc cardinalities, guards, etc.) and allow reward rates to be
associated with each marking. The reward function can be marking dependent as well.
They are specified using CSPL (C based SRN Language), which is an extension of the C
programming language with additional constructs for describing the SRN models. SRN
specifications are automatically converted into a Markov reward model which is then solved to
compute a variety of transient, steady-state, cumulative, and sensitivity measures. For SRNs with
absorbing markings, mean time to absorption and expected accumulated reward until absorption
can be computed.
The interface increases the power of SPNP (Stochastic Petri Net Package) [15] by providing a
means of rapidly developing stochastic reward nets (SRNs); the model type used for input. Input
to SPNP is specified using CSPL (C based SPN Language), but the interface removes this burden
from the user by providing an interface for graphical representation of the model.
The first interface was implemented with Tcl/Tk. Java was then used to develop the new version,
which provides the current look and feel of the interface.
5.1.3 CentOS (OS selection)
The CentOS Linux distribution is a stable, predictable, manageable and reproducible platform
derived from the sources of Red Hat Enterprise Linux (RHEL). The process delivered has a clear
governance model, increased transparency and access.
Since March 2004, CentOS Linux has been a community-supported distribution derived from
sources freely provided to the public by Red Hat. As such, CentOS Linux aims to be functionally
compatible with RHEL. CentOS changes packages to remove upstream vendor branding and
artwork. CentOS Linux is no-cost and free to redistribute.
CentOS Linux is developed by a small but growing team of core developers. In turn the core
developers are supported by an active user community including system administrators, network
administrators, managers, core Linux contributors, and Linux enthusiasts from around the world.
We adopted this OS because it is highly compatible and stable, and KVM is very easy to install and
configure on it. Moreover, configuring NFS is easy for beginners and port issues can be resolved properly.
The forums of this OS had solutions to all the problems we faced with other OSs such as Ubuntu and
Fedora. Moreover, it is open source and its code is available online.
5.2 Programming Language Used (Language Selection):
C is a general-purpose programming language initially developed by Dennis Ritchie. C is
an imperative (procedural) language. It was designed to be compiled using a relatively
straightforward compiler, to provide low-level access to memory, to provide language constructs
that map efficiently to machine instructions, and to require minimal run-time support. C was
therefore useful for many applications that had formerly been coded in assembly language, such
as in system programming.
Despite its low-level capabilities, the language was designed to encourage cross-
platform programming. A standards-compliant and portably written C program can be compiled
for a very wide variety of computer platforms and operating systems with few changes to its
source code. The language has become available on a very wide range of platforms, from
embedded microcontrollers to supercomputers.
Table 5.1 Methods used in code
Methods used in code                    Description
void meminfo(void)                      Used to check the system free memory status.
FILE_TO_BUF(meminfo.file, memif_id)     Used to store intermediate results of the memory status.
strtoul( )                              Used to convert a string to an unsigned long integer.
time( )                                 Used to get the current system time. This function returns a time_t value.
memcpy( )                               Used to copy the time_t struct variable to the tm_d struct variable.
localtime( )                            Used to fetch the system local time.
sizeof                                  Used to get the size of an object.
system( )                               Used to invoke a system command.
fprintf( )                              Used to write to a file.
fopen( )                                Used to open or create a file.
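As an illustration of how fopen( ) and strtoul( ) from Table 5.1 fit together, the sketch below reads the free-memory figure from a file in /proc/meminfo format. It is a minimal example and not the project's meminfo( ) routine: the function name `read_mem_free_kb` is hypothetical, and the path is a parameter so the function can be tried on a sample file as well as on /proc/meminfo.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the MemFree value in kB from a meminfo-style file,
 * or -1 if the file cannot be read or has no MemFree line.
 */
long read_mem_free_kb(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char line[256];
    long result = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines look like "MemFree:          123456 kB". */
        if (strncmp(line, "MemFree:", 8) == 0) {
            char *end;
            unsigned long v = strtoul(line + 8, &end, 10);
            if (end != line + 8)   /* at least one digit was parsed */
                result = (long)v;
            break;
        }
    }
    fclose(f);
    return result;
}
```

On a Linux host, `read_mem_free_kb("/proc/meminfo")` gives the value that would be compared with the user's threshold free memory.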
5.3 Installing and configuring KVM on CentOS
5.3.1 Check Hardware Virtualization support
KVM requires hardware virtualization support such as Intel VT or AMD's AMD-V, which
are instruction set extensions for hardware-assisted virtualization. Check if hardware
virtualization support is available on CentOS host machine:
$ egrep -i 'vmx|svm' --color=always /proc/cpuinfo
If CPU flags contain "vmx" or "svm", it means hardware virtualization support is
available.
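The same check can be made from C, which may be convenient if the rejuvenation tool wants to verify the host before managing VMs. The sketch below scans cpuinfo-style text for the `vmx` or `svm` flag; unlike the egrep command it matches whole, lower-case flag names only. The function name `has_hw_virt` is hypothetical, and the caller is assumed to have read /proc/cpuinfo into a string.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if the text contains the CPU flag "vmx" (Intel VT)
 * or "svm" (AMD-V) as a whole word, 0 otherwise.
 */
int has_hw_virt(const char *cpuinfo_text)
{
    assert(cpuinfo_text != NULL);
    static const char delims[] = " \t\n:";
    const char *p = cpuinfo_text;

    while (*p != '\0') {
        p += strspn(p, delims);            /* skip separators */
        size_t len = strcspn(p, delims);   /* length of the next word */
        if (len == 3 &&
            (strncmp(p, "vmx", 3) == 0 || strncmp(p, "svm", 3) == 0))
            return 1;
        p += len;
    }
    return 0;
}
```

Matching whole words avoids false positives from longer flag names that merely contain one of the two strings.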
5.3.2 Configure FQDN for local host
Configure FQDN (Fully Qualified Domain Name) for local host. Otherwise, you may get
warnings while launching libvirtd daemon such as "getaddrinfo failed for 'myhost': Name
or service not known".
To configure FQDN, edit the following configuration file:
$ sudo -e /etc/sysconfig/network
HOSTNAME=xxx.yyy
5.3.2.1 Disable SELinux
Before installing KVM, be aware that there are several SELinux Booleans that can
affect the behavior of KVM and libvirt. Here we set SELinux to permissive mode (0) for
demonstration purposes. Skip this step if you do not wish to change the SELinux mode.
To set SELinux to permissive mode on CentOS:
$ sudo -e /etc/selinux/config
SELINUX=permissive
5.3.2.2 Reboot the machine for the change to take effect.
5.4 Install KVM, QEMU and user-space tools
To install KVM, QEMU and user-space tools use the following steps:
Step1: Install KVM and virtinst (a tool to create VMs) as follows:
$sudo yum install kvm libvirt python-virtinst qemu-kvm
Step2: Start libvirtd daemon, and set it to auto-start:
$sudo service libvirtd start
$sudo chkconfig libvirtd on
Step3: Check if KVM has successfully been installed. You should see no error as
follows.
$ sudo virsh -c qemu:///system list
Id Name State
----------------------------------------------------
Step4: Configure Linux Bridge for VM Networking
Installing KVM alone does not allow VMs to communicate with each other or access
external networks. You need to configure VM networking separately. Here, we set up
"bridged networking" via Linux Bridge.
• Install a package needed to create and manage bridge devices:
$sudo yum install bridge-utils
• Disable the NetworkManager service if it is enabled, and switch to the default network service as
follows.
$sudo service NetworkManager stop
$sudo chkconfig NetworkManager off
$sudo chkconfig network on
$sudo service network start
To configure a new bridge, you have to pick an active network interface (e.g.,
eth0), and enslave it to the bridge. Depending on whether the network interface is
assigned an IP address via DHCP or statically, there are two different ways to
configure a new bridge.
• To configure bridge br0 via DHCP:
$sudo -e /etc/sysconfig/network-scripts/ifcfg-eth0
• Modify the file ifcfg-eth0 as shown below:
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=yes
BRIDGE=br0
$sudo -e /etc/sysconfig/network-scripts/ifcfg-br0
• Modify the file ifcfg-br0 as shown below:
DEVICE=br0
NM_CONTROLLED=yes
ONBOOT=yes
TYPE=Bridge
BOOTPROTO=dhcp
• You should now see the br0 bridge interface with a proper IP address as follows.
$ifconfig
Step5: Install VirtManager
The final step is to install a desktop UI called VirtManager for managing virtual
machines (VMs) through libvirt.
To install VirtManager:
$ sudo yum install virt-manager libvirt qemu-system-x86 openssh-askpass
libcanberra-devel
5.5 Setup a minimal CentOS 6 NFS configuration
To setup an NFS (Network File System) configuration for two systems, basically we have
to consider one system as a server and another one as a client. The following steps show
the NFS Server configuration:
5.5.1 SERVER CONFIGURATION:
Step1: Checking for yum updates and installing NFS utils
• Server to set up: 172.16.30.48/255.255.255.254
• Before setting up the server, update the system packages:
$ sudo yum update
• Once the update is completed, reboot the system:
$ sudo shutdown -r now
• Install the nfs-utils, rpcbind and firewall configuration packages:
$ sudo yum install nfs-utils rpcbind system-config-firewall-tui
• Modify the selinux file to disable SELinux:
$ sudo vi /etc/sysconfig/selinux
SELINUX=disabled
$ sudo setenforce 0
Step2: Make a folder to be shared
For NFS sharing we have to create a folder that is shared between the server and the
client. This folder holds all the data transferred between server and client.
Here we create and share a folder called images; the command below gives the
path at which the folder is created.
$ mkdir /var/lib/libvirt/images
Step3: Checking the configuration of nfs, nfslock, and rpcbind:
$ chkconfig nfs on
$ chkconfig nfslock on
$ chkconfig rpcbind on
Step4: Configure the firewall settings:
$ sudo system-config-firewall-tui
Step5: Modify the exports file (/etc/exports) to add the shared storage used for live migration from
the source to the destination system:
/var/lib/libvirt/images 172.16.30.48/255.255.255.254(rw,sync,no_root_squash)
Step6: Add the following line to the hosts.allow file:
$ sudo -e /etc/hosts.allow
mountd: 172.16.30.46/255.255.255.254
Step7: Modify the hosts.deny file by following lines
portmap:ALL
lockd:ALL
mountd:ALL
rquotad:ALL
statd:ALL
Step8: Restart the following services on Server machine once you completed all the
above steps:
$ sudo service rpcbind restart
$ sudo service nfs restart
$ sudo service nfslock restart
Once you finish the server configuration, proceed immediately to the client configuration.
To configure the NFS client, follow these steps:
5.5.2 CLIENT CONFIGURATION:
Step1: To Setup the Client: 172.16.30.46/255.255.255.254
• Before setting up the client, update the system packages:
$ sudo yum update
• Once the update is completed, reboot the system:
$ sudo shutdown -r now
• Install the nfs-utils, rpcbind and firewall configuration packages:
$ sudo yum install nfs-utils rpcbind system-config-firewall-tui
• Modify the selinux file to disable SELinux:
$ sudo gedit /etc/sysconfig/selinux
SELINUX=disabled
$ sudo setenforce 0
Step2: Make a folder to be the mount point.
For NFS sharing we have to create a shareable folder that is shared between the
server and the client. This folder holds all the data transferred between server and client.
Here we create and share a folder called images; the command below gives the path at
which the folder is created.
$ sudo mkdir /var/lib/libvirt/images
Step3: Start the following services
$ sudo chkconfig nfs on
$ sudo chkconfig nfslock on
$ sudo chkconfig rpcbind on
Step4: Restart the following services on the client machine once you have completed all the
above steps:
$ sudo service rpcbind restart
$ sudo service nfs restart
$ sudo service nfslock restart
Once you finish the both server and client NFS configuration, we have to mount the folder which
is created during the NFS server and client configuration.
To mount a folder we have to use the following steps:
Step1: Append the following line to fstab file:
$ sudo gedit /etc/fstab
<shared directory> <mount point> <type> <options> 0 0
172.16.30.48:/var/lib/libvirt/images /var/lib/libvirt/images nfs auto 0 0
172.16.30.48: Server address
172.16.30.48:/var/lib/libvirt/images: Shared directory exported by the server
/var/lib/libvirt/images: Mount point on the client machine (172.16.30.46)
nfs: File system type
auto: Mount option
Step2: Mount the shared NFS directory on the client machine:
$ sudo mount -t nfs 172.16.30.48:/var/lib/libvirt/images /var/lib/libvirt/images
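The fstab entry used above has six whitespace-separated fields. As a rough illustration only, the
following Python sketch splits such an entry into named fields and separates the server address
from the exported path. The names FstabEntry and parse_fstab_line are hypothetical helpers
introduced here, and the sketch assumes that all six fields are present on the line.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    """One fstab line split into its six fields (hypothetical helper type)."""
    device: str        # for NFS: <server>:<exported path>
    mount_point: str   # directory on the client
    fs_type: str       # e.g. "nfs"
    options: str       # e.g. "auto"
    dump: int          # fifth field, 0 in this report
    fsck_pass: int     # sixth field, 0 in this report


def parse_fstab_line(line: str) -> FstabEntry:
    """Split one fstab entry; this sketch requires all six fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    device, mount_point, fs_type, options, dump, fsck_pass = fields
    return FstabEntry(device, mount_point, fs_type, options, int(dump), int(fsck_pass))


entry = parse_fstab_line(
    "172.16.30.48:/var/lib/libvirt/images /var/lib/libvirt/images nfs auto 0 0"
)
# For an NFS device, the part before the first ':' is the server address.
server, _, export_path = entry.device.partition(":")
```

Here server holds the NFS server address and export_path the directory exported by that server.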
5.6 ITL Algorithm
The ITL algorithm is designed to optimize the user-defined rejuvenation time when the workload
is variable. The rejuvenation time is decreased when the workload increases and increased when
the workload decreases.
The working of the algorithm is described in the steps below:
Step 1: Begin
Step 2: Set variable FTR (Fixed Time Rejuvenation)
Step 3: Fetch the system Free Memory and assign to variable FM (Free Memory)
Step 4: Set the Threshold Free Memory value to variable FFM (Fixed Free Memory)
Step 5: If (FTR == SystemCurrentTime)
Then Reboot
Else
If (FM < FFM) Then
Reset FTR = FTR - (1*(FFM-FM))
Step 6: Go to Step 3
Step 7: End
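The decision taken in Step 5 can be sketched in Python as follows. This is a minimal illustration
under stated assumptions, not the project's implementation: the names ItlState and itl_step, the
units (seconds and megabytes) and the scale factor are hypothetical. The adjustment uses the
shortfall FFM - FM so that the rejuvenation time moves earlier as free memory drops, in line with
the statement that rejuvenation time decreases when workload increases. The sketch also compares
with >= instead of == so that a missed tick cannot skip the reboot, and never moves the
rejuvenation time into the past.

```python
from dataclasses import dataclass


@dataclass
class ItlState:
    ftr: float  # scheduled rejuvenation time, in seconds (assumed unit)
    ffm: float  # free-memory threshold, in MB (assumed unit)


def itl_step(state: ItlState, now: float, free_memory: float, scale: float = 1.0) -> bool:
    """Run one pass of Step 5; return True when a reboot is due."""
    if now >= state.ftr:
        return True  # rejuvenation time reached: the caller would reboot here
    if free_memory < state.ffm:
        # Workload is high: pull the rejuvenation time earlier by the shortfall.
        shortfall = state.ffm - free_memory
        state.ftr = max(now, state.ftr - scale * shortfall)
    return False


state = ItlState(ftr=1000.0, ffm=500.0)
first = itl_step(state, now=0.0, free_memory=600.0)    # enough free memory: no change
second = itl_step(state, now=10.0, free_memory=400.0)  # 100 MB short: FTR moves earlier
third = itl_step(state, now=900.0, free_memory=400.0)  # new FTR reached: reboot is due
```

A caller would fetch the free memory before every call, which corresponds to repeating Step 3 on
each pass.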
CHAPTER 6
TESTING OF SOFTWARE REJUVENATION IN
COMPLEX SYSTEM
6.1 Testing
The project has essentially three main domains and six modules. In this section, all six modules
are tested on different operating systems, VMs and VMMs. The purpose of this section is to ensure
that the resulting system meets the system requirements and that data flows seamlessly through
each of the systems as well as between them.
These tests also serve as a sort of "living document". Clients and other developers looking to
learn how to use a module can look at these tests to determine how to use the module to fit their
needs and gain a basic understanding of it.
6.1.1 Testing Strategy
The following points are indicative of the testing strategy for unit testing followed in the project.
• Review the design specifications and source code for modules to be tested.
• Perform a peer review on the module Test Plan.
• Create any test "stubs" required to provide input to or receive output from the code
module.
• When it's time to test particular modules, compile the code in the test environment to
check for any missing files required for test plan execution.
• Execute the tests. Compare information/values received out of the tested software to
those expected, as documented in the Test Plan.
• Retest code when an updated version is available. Record results on the module Test
Report Form.
• When the module is considered to have passed all tests, archive the final Report form(s).
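As an illustration of the test "stubs" mentioned in the strategy above, the following Python
sketch replaces the real reboot action with a stub that only counts calls, so the rejuvenation
decision can be checked without restarting a machine. The names should_rejuvenate, RebootStub and
run_once are hypothetical and do not come from the project's source code; the decision logic is an
assumption modelled on the time-based and workload-based test cases in this chapter.

```python
from typing import Callable, Optional


def should_rejuvenate(now: float, ttr: float,
                      free_memory: Optional[float] = None,
                      threshold: Optional[float] = None) -> bool:
    """Assumed decision rule: TTR reached, or free memory below the threshold."""
    if now >= ttr:
        return True
    if free_memory is not None and threshold is not None:
        return free_memory < threshold
    return False


class RebootStub:
    """Stands in for the real reboot action and records how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> None:
        self.calls += 1


def run_once(now: float, ttr: float, reboot: Callable[[], None],
             free_memory: Optional[float] = None,
             threshold: Optional[float] = None) -> bool:
    """Invoke the reboot action when rejuvenation is due; report whether it was."""
    if should_rejuvenate(now, ttr, free_memory, threshold):
        reboot()
        return True
    return False


stub = RebootStub()
time_based = run_once(now=100.0, ttr=100.0, reboot=stub)
load_based = run_once(now=50.0, ttr=100.0, reboot=stub, free_memory=200.0, threshold=300.0)
not_due = run_once(now=50.0, ttr=100.0, reboot=stub, free_memory=400.0, threshold=300.0)
```

The first call mirrors a time-based test case and the second a workload-based one; the third
checks that nothing happens while neither condition holds.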
Table 6.1: Cold reboot based on Time
Test Case ID T-1
Purpose The system should rejuvenate at the given rejuvenation time (TTR)
Pre-Conditions System time
Inputs Time to Rejuvenate (TTR)
Expected Output Reboot
Post-Conditions After rebooting the system, the current state should not be saved
Execution History
Date Result Version Remark
17-02-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
15-03-2014 Pass 1.0 Testing passed in CentOS operating system
27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
04-04-2014 Pass 1.0 Testing passed in CentOS operating system
09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
09-04-2014 Pass 1.0 Testing passed in CentOS operating system
Table 6.2: Cold reboot based on Workload
Test Case ID T-2
Purpose The system should rejuvenate at the given memory threshold value
Pre-Conditions System free memory
Inputs Memory threshold value
Expected Output Reboot
Post-Conditions After rebooting the system, the current state should not be saved
Execution History
Date Result Version Remark
17-02-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
15-03-2014 Pass 1.0 Testing passed in CentOS operating system
27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
04-04-2014 Pass 1.0 Testing passed in CentOS operating system
09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
09-04-2014 Pass 1.0 Testing passed in CentOS operating system
Table 6.3: Cold reboot based on both Time and Workload
Test Case ID T-3
Purpose The system optimizes the rejuvenation time based on the workload and then rejuvenates
at the optimized time
Pre-Conditions System time and Free memory
Inputs Time to Rejuvenate (TTR) and Memory threshold value
Expected Output Reboot
Post-Conditions After rebooting the system, the current state should not be saved
Execution History
Date Result Version Remark
17-02-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
15-03-2014 Pass 1.0 Testing passed in CentOS operating system
27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
04-04-2014 Pass 1.0 Testing passed in CentOS operating system
09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
09-04-2014 Pass 1.0 Testing passed in CentOS operating system
Table 6.4: Warm reboot based on Time
Test Case ID T-4
Purpose The system should rejuvenate at the given rejuvenation time (TTR)
Pre-Conditions System time
Inputs Time to Rejuvenate (TTR)
Expected Output Reboot
Post-Conditions After rebooting the system, the current state should be saved
Execution History
Date Result Version Remark
27-02-2014 Failed 1.0 Testing failed in CentOS operating system because the OS is not compatible
28-03-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
27-03-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
04-04-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
09-04-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
09-04-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
Table 6.5: Warm reboot based on Workload
Test Case ID T-5
Purpose The system should rejuvenate at the given memory threshold value
Pre-Conditions System free memory
Inputs Memory threshold value
Expected Output Reboot
Post-Conditions After rebooting the system, the current state should be saved
Execution History
Date Result Version Remark
27-02-2014 Failed 1.0 Testing failed in CentOS operating system because the OS is not compatible
28-03-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
27-03-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
04-04-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
09-04-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
09-04-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
Table 6.6: Warm reboot based on both time and Workload
Test Case ID T-6
Purpose The system optimizes the rejuvenation time based on the workload and then rejuvenates
at the optimized time
Pre-Conditions System time and Free memory
Inputs Time to Rejuvenate (TTR) and Memory threshold value
Expected Output Reboot
Post-Conditions After rebooting the system, the current state should be saved
Execution History
Date Result Version Remark
27-02-2014 Failed 1.0 Testing failed in CentOS operating system because the OS is not compatible
28-03-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
27-03-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
04-04-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
09-04-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
09-04-2014 Pass 1.0.1 Testing passed in Ubuntu 12.04
Table 6.7: VM cold reboot based on Time
Test Case ID T-7
Purpose The Virtual Machine (VM) should rejuvenate at the given rejuvenation time (TTR)
Pre-Conditions System time
Inputs Time to Rejuvenate (TTR)
Expected Output Virtual Machine (VM) Reboot
Post-Conditions After rebooting the Virtual Machine (VM), the current state should not be saved
Execution History
Date Result Version Remark
27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
04-04-2014 Pass 1.0 Testing passed in CentOS operating system
09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
09-04-2014 Pass 1.0 Testing passed in CentOS operating system
Table 6.8: VM cold reboot based on Workload
Test Case ID T-8
Purpose The Virtual Machine (VM) should rejuvenate at the given memory threshold value
Pre-Conditions System free memory
Inputs Memory threshold value
Expected Output Virtual Machine (VM) Reboot
Post-Conditions After rebooting the Virtual Machine (VM), the current state should not be saved
Execution History
Date Result Version Remark
27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
04-04-2014 Pass 1.0 Testing passed in CentOS operating system
09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
09-04-2014 Pass 1.0 Testing passed in CentOS operating system
Table 6.9: VM cold reboot based on both Time and workload
Test Case ID T-9
Purpose The system optimizes the rejuvenation time based on the workload and then the Virtual
Machine (VM) rejuvenates at the optimized time
Pre-Conditions System time and Free memory
Inputs Time to Rejuvenate (TTR) and Memory threshold value
Expected Output Virtual Machine (VM) Reboot
Post-Conditions After rebooting the Virtual Machine (VM), the current state should not be saved
Execution History
Date Result Version Remark
27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
04-04-2014 Pass 1.0 Testing passed in CentOS operating system
09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
09-04-2014 Pass 1.0 Testing passed in CentOS operating system
Table 6.10: VM warm reboot based on Time
Test Case ID T-10
Purpose The Virtual Machine (VM) should rejuvenate at the given rejuvenation time (TTR)
Pre-Conditions System time
Inputs Time to Rejuvenate (TTR)
Expected Output Virtual Machine (VM) Reboot
Post-Conditions After rebooting the Virtual Machine (VM), the current state should be saved
Execution History
Date Result Version Remark
27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
04-04-2014 Pass 1.0 Testing passed in CentOS operating system
09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
09-04-2014 Pass 1.0 Testing passed in CentOS operating system
Table 6.11: VM warm reboot based on Workload
Test Case ID T-11
Purpose The Virtual Machine (VM) should rejuvenate at the given memory threshold value
Pre-Conditions System free memory
Inputs Memory threshold value
Expected Output Virtual Machine (VM) Reboot
Post-Conditions After rebooting the Virtual Machine (VM), the current state should be saved
Execution History
Date Result Version Remark
27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
04-04-2014 Pass 1.0 Testing passed in CentOS operating system
09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
09-04-2014 Pass 1.0 Testing passed in CentOS operating system
Table 6.12: VM warm reboot based on both Time and workload
Test Case ID T-12
Purpose The system optimizes the rejuvenation time based on the workload and then the Virtual
Machine (VM) rejuvenates at the optimized time
Pre-Conditions System time and Free memory
Inputs Time to Rejuvenate (TTR) and Memory threshold value
Expected Output Virtual Machine (VM) Reboot
Post-Conditions After rebooting the Virtual Machine (VM), the current state should be saved
Execution History
Date Result Version Remark
27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
04-04-2014 Pass 1.0 Testing passed in CentOS operating system
09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
09-04-2014 Pass 1.0 Testing passed in CentOS operating system
Table 6.13: VMM reboot based on Time
Test Case ID T-13
Purpose The Virtual Machine Monitor (VMM) should rejuvenate at the given rejuvenation time (TTR)
Pre-Conditions System time
Inputs Time to Rejuvenate (TTR)
Expected Output Virtual Machine Monitor (VMM) Reboot
Post-Conditions After rebooting the Virtual Machine Monitor (VMM), the connection between the
VMM and the VMs should be lost
Execution History
Date Result Version Remark
27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
04-04-2014 Pass 1.0 Testing passed in CentOS operating system
09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
09-04-2014 Pass 1.0 Testing passed in CentOS operating system
Table 6.14: VMM reboot based on Workload
Test Case ID T-14
Purpose The Virtual Machine Monitor (VMM) should rejuvenate at the given memory threshold value
Pre-Conditions System free memory
Inputs Memory threshold value
Expected Output Virtual Machine Monitor (VMM) Reboot
Post-Conditions After rebooting the Virtual Machine Monitor (VMM), the connection between the
VMM and the VMs should be lost
Execution History
Date Result Version Remark
27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
04-04-2014 Pass 1.0 Testing passed in CentOS operating system
09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
09-04-2014 Pass 1.0 Testing passed in CentOS operating system
Table 6.15: VMM reboot based on both Time and Workload
Test Case ID T-15
Purpose The system optimizes the rejuvenation time based on the workload and then the Virtual
Machine Monitor (VMM) rejuvenates at the optimized time
Pre-Conditions System time and Free memory
Inputs Time to Rejuvenate (TTR) and Memory threshold value
Expected Output Virtual Machine Monitor (VMM) Reboot
Post-Conditions After rebooting the Virtual Machine Monitor (VMM), the connection between the
VMM and the VMs should be lost
Execution History
Date Result Version Remark
27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
04-04-2014 Pass 1.0 Testing passed in CentOS operating system
09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system
09-04-2014 Pass 1.0 Testing passed in CentOS operating system
Table 6.16: VM migration based on Time
Test Case ID T-16
Purpose The Virtual Machine (VM) should migrate from one Physical Machine (PM1) to another
Physical Machine (PM2) at the given rejuvenation time (TTR)
Pre-Conditions System time
Inputs Time to Rejuvenate (TTR)
Expected Output The Virtual Machine (VM) should migrate from one Physical Machine (PM1) to
another Physical Machine (PM2)
Post-Conditions After the migration, the Virtual Machine (VM) from Physical Machine (PM1)
should reboot
Execution History
Date Result Version Remark
27-02-2014 Failed 1.0 Testing failed in Ubuntu 12.04 operating system because the OS is not compatible
05-03-2014 Failed 1.0 Testing failed in Ubuntu 12.04 operating system because the OS is not compatible
10-03-2014 Failed 1.0 Testing failed in Ubuntu 12.04 operating system because the OS is not compatible
30-03-2014 Pass 1.0.1 Testing passed in CentOS operating system
04-04-2014 Pass 1.0.1 Testing passed in CentOS operating system
09-04-2014 Pass 1.0.1 Testing passed in CentOS operating system
Table 6.17: VM migration based on workload
Test Case ID T-17
Purpose The Virtual Machine (VM) should migrate from one Physical Machine (PM1) to another
Physical Machine (PM2) at the given memory threshold value
Pre-Conditions System free memory
Inputs Memory threshold value
Expected Output The Virtual Machine (VM) should migrate from one Physical Machine (PM1) to
another Physical Machine (PM2)
Post-Conditions After the migration, the Virtual Machine (VM) from Physical Machine (PM1)
should reboot
Execution History
Date Result Version Remark
27-02-2014 Failed 1.0 Testing failed in Ubuntu 12.04 operating system because the OS is not compatible
05-03-2014 Failed 1.0 Testing failed in Ubuntu 12.04 operating system because the OS is not compatible
10-03-2014 Failed 1.0 Testing failed in Ubuntu 12.04 operating system because the OS is not compatible
30-03-2014 Pass 1.0.1 Testing passed in CentOS operating system
04-04-2014 Pass 1.0.1 Testing passed in CentOS operating system
09-04-2014 Pass 1.0.1 Testing passed in CentOS operating system
Table 6.18: VM migration based on both Time and workload
Test Case ID T-18
Purpose The system optimizes the rejuvenation time based on the workload and then the Virtual
Machine (VM) should migrate from one Physical Machine (PM1) to another Physical Machine (PM2)
at the optimized time
Pre-Conditions System time and System free memory
Inputs Time to Rejuvenate (TTR) and Memory threshold value
Expected Output The Virtual Machine (VM) should migrate from one Physical Machine (PM1) to
another Physical Machine (PM2)
Post-Conditions After the migration, the Virtual Machine (VM) from Physical Machine (PM1)
should reboot
Execution History
Date Result Version Remark
27-02-2014 Failed 1.0 Testing failed in Ubuntu 12.04 operating system because the OS is not compatible
05-03-2014 Failed 1.0 Testing failed in Ubuntu 12.04 operating system because the OS is not compatible
10-03-2014 Failed 1.0 Testing failed in Ubuntu 12.04 operating system because the OS is not compatible
30-03-2014 Pass 1.0.1 Testing passed in CentOS operating system
04-04-2014 Pass 1.0.1 Testing passed in CentOS operating system
09-04-2014 Pass 1.0.1 Testing passed in CentOS operating system
CHAPTER 7
RESULTS OF SOFTWARE REJUVENATION IN
COMPLEX SYSTEM
7.1 Results
The ITL algorithm implemented in all modules is analyzed using SPNP, which yields the values of
MTTR and MTTF. From these values we calculate the availability and downtime for the algorithm
applied to the system. The availability value is found for all modules, and based on these values
we can analyze how much of the time the system will be available for use without failure.
In SPNP we develop a Petri net diagram for the algorithm, and for this diagram we write code in
CSPL (C-based Stochastic Petri net Language) to define the movement of tokens from one place to
another through timed or immediate transitions. Tokens are deposited in places and are moved
from one place to another by timed or immediate transitions.
To check that the Petri net diagram has a proper flow, SPNP provides an animation option, for
which we write code that guides the token movements, i.e. when and where a token moves from one
place to another. When the code is executed, the animated Petri net diagram shows how the
transitions take place. If any error occurs during this animation, the algorithm or its Petri net
diagram is faulty.
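The token movement described above can be illustrated with a small Python sketch. It is a toy
model only and is neither SPNP nor CSPL: it ignores firing rates, guards and the difference
between timed and immediate transitions. The place and transition names are borrowed from the OS
cold reboot model described later in this chapter, the wiring between them is an assumption, and
Transition, is_enabled and fire are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Transition(NamedTuple):
    name: str
    inputs: List[str]   # places that must each hold a token
    outputs: List[str]  # places that receive a token when the transition fires


def is_enabled(marking: Dict[str, int], t: Transition) -> bool:
    """A transition is enabled when every input place holds enough tokens."""
    needed = Counter(t.inputs)
    return all(marking.get(place, 0) >= count for place, count in needed.items())


def fire(marking: Dict[str, int], t: Transition) -> Dict[str, int]:
    """Return the new marking after moving tokens through the transition."""
    if not is_enabled(marking, t):
        raise ValueError(f"transition {t.name} is not enabled")
    new_marking = dict(marking)
    for place in t.inputs:
        new_marking[place] -= 1
    for place in t.outputs:
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking


# Assumed wiring: Working -> Aging -> Rej -> Working.
T_AGING = Transition("Taging", ["Working"], ["Aging"])
TA_REJ = Transition("tarej", ["Aging"], ["Rej"])
T_WORK = Transition("Twork", ["Rej"], ["Working"])

marking = {"Working": 1, "Aging": 0, "Rej": 0}
for transition in (T_AGING, TA_REJ, T_WORK):
    marking = fire(marking, transition)
```

After the three firings the single token is back in Working, which is the kind of flow the
animation option is used to confirm.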
Table 7.1: Symbol conventions
Symbol Conventions
Place
Timed transition
Immediate transition
Arc
Figure 7.1: Memory model
Figure 7.2: Clock model
Table 7.2: Clock and Memory SRN model description
Places & Transitions Description
Pclock Place where the clock is initialized or reset.
Ptpolicy Place which indicates, when a token is present in it, that the rejuvenation time is reached.
Ptrigger Place which is the point of rejuvenation.
Pmem Place where RAM utilization is compared with the predefined threshold value.
Pmemv Place which indicates that RAM utilization has reached its threshold point.
Tclock Timed transition, it is enabled when the given time is reached.
Tpolicy Timed transition, it is enabled when the token is present in Ptpolicy.
Ttrigger Immediate transition, it is enabled when the token is present in Ptrigger and if the given time is reached.
Tmem Timed transition, it is enabled when the given time is reached.
Tmemv Immediate transition, it is enabled when the token is present in Ptpolicy and if the given time is reached.
Figure 7.3: OS Cold SRN model
Table 7.3: OS Cold SRN model description
Places & Transitions Description
Working Place which indicates system is in normal working state.
Aging Place which indicates system is suffering from aging problem.
Rej Place which indicates system is under rejuvenation process.
Trej Immediate transition, it is enabled when the given time is reached.
Twork Timed transition, it is enabled when the token is present in Rej.
Taging Timed transition, it is enabled when the token is present in working and Pmemv.
tarej Immediate transition, it is enabled when the token is present in aging.
Figure 7.4: OS Warm SRN model
Table 7.4: OS Warm SRN model description
Places & Transitions Description
Working Place which indicates system is in normal working state.
Aging Place which indicates system is suffering from aging problem.
Rej Place which indicates system is under rejuvenation process.
Optimize Place where time is optimized based on workload.
Save Place where image of kernel is created and saved.
Resume Place where kernel image saved is retrieved.
Taging Timed transition, it is enabled when the token is present in working and Pmemv.
Topti Immediate transition, it is enabled when the token is present in aging.
Toptiwork Timed transition, it is enabled when the token is present in optimize.
Tsave Timed transition, it is enabled when the token is present in working and Ptpolicy.
Trej Immediate transition, it is enabled when the token is present in save.
Tresume Timed transition, it is enabled when the token is present in rej.
Treworking Immediate transition, it is enabled when the token is present in resume.
Figure 7.5: VM cold SRN model
Table 7.5: VM Cold SRN model description
Places & Transitions Description
Working Place which indicates system is in normal working state.
Aging Place which indicates system is suffering from aging problem.
Rejuvenation Place which indicates system is under rejuvenation process.
Optimize Place where time is optimized based on workload.
Taging Timed transition, it is enabled when the token is present in working and
Pmemv.
Trej Timed transition, it is enabled when the token is present in Ptpolicy and if the
given time is reached.
T_aging Timed transition, it is enabled when the token is present in aging.
T_opt Timed transition, it is enabled when the token is present in Optimize.
T_rej Timed transition, it is enabled when the token is present in Rejuvenation.
Figure 7.6: VM warm SRN model
Table 7.6: VM Warm SRN model description
Places & Transitions Description
Working Place which indicates system is in normal working state.
Aging Place which indicates system is suffering from aging problem.
Rejuvenation Place which indicates system is under rejuvenation process.
Optimize Place where time is optimized based on workload.
Save Place where image of VM is created and saved.
Resume Place where VM image saved is retrieved.
Memory Place which indicates the variation in memory.
Taging Timed transition, it is enabled when the token is present in aging.
Tmem Timed transition, it is enabled when the token is present in memory.
Tsave Timed transition, it is enabled when the token is present in Ptpolicy and if
the given time is reached.
T_save Timed transition, it is enabled when the token is present in memory and
pmemv==2.
Trej Timed transition, it is enabled when the token is present in save.
Tres Timed transition, it is enabled when the token is present in Rejuvenation.
Twmem Timed transition, it is enabled when the token is present in Pmemv and if the
given time is reached.
Trevert Timed transition, it is enabled when the token is present in Resume.
Topt Timed transition, it is enabled when the token is present in optimize.
Figure 7.7: VM migration SRN model
Table 7.7: VM Migration SRN model description
Places & Transitions Description
Working1 Place which indicates system is in normal working state.
Working2 Place which indicates system is in normal working state.
Aging Place which indicates system is suffering from aging problem.
Rej Place which indicates system is under rejuvenation process.
Optimize Place where time is optimized based on workload.
migrate Place which indicates VM is getting migrated
Tmaging Timed transition, it is enabled when the token is present in aging.
Tmigrate Timed transition, it is enabled when the token is present in working1.
Trej1 Timed transition, it is enabled when the token is present in working1==1 and
if the given time is reached.
Trej2 Timed transition, it is enabled when the token is present in working1 and
pclock and if the given time is reached.
Trejre1 Timed transition, it is enabled when the token is present in working1 and rej.
Trejre2 Timed transition, it is enabled when the token is present in rej and
working2==0.
Trevert Timed transition, it is enabled when the token is present in Working2==2.
Tnormal Timed transition, it is enabled when the token is present in optimize.
Taging Timed transition, it is enabled when the token is present in Pmemv and if the
given time is reached.
Tafail Timed transition, it is enabled when the token is present in aging.
Thyper Timed transition, it is enabled when the token is present in migrate.
7.2 Discussion
Developing the above Petri net diagrams in SPNP and coding the token transitions in CSPL helps us
analyze the availability of each module. Given the time specified for each transition and the time
the transition took in the real-time implementation to move from one state to another, the MTTR
and MTTF values can be calculated from the values in Table 7.9. From these values, the
availability of the module can be calculated using the formula below:
Availability = MTTF ÷ (MTTF + MTTR).
The availability of all the modules is analyzed in this project, and the respective availability
values are calculated on average for 30 days (Table 7.8). From the 30-day availability value we
can calculate the availability of the system for any number of years. For a token to move from one
place to another, it must pass through a transition by satisfying all guard function conditions.
Table 7.9 has three columns: the transition, its value (the time taken for that transition to take
place), and its mean value, which expresses the value in terms of 1/hour. The mean value is the
standard format of time used in the analysis with SPNP.
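As a worked illustration, the following Python sketch applies the standard steady-state
definition, availability = MTTF / (MTTF + MTTR), and converts an availability value into expected
downtime over a 30-day window. The function names and the sample MTTF and MTTR figures are
hypothetical; only the value 0.998824 is taken from this report (the cold OS reboot module).

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time to failure and mean time to repair."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes(avail: float, days: float = 30.0) -> float:
    """Expected downtime, in minutes, over the given number of days."""
    return (1.0 - avail) * days * 24.0 * 60.0


# Hypothetical figures: a failure every 999 hours on average, repaired in 1 hour.
example = availability(mttf_hours=999.0, mttr_hours=1.0)

# Reported cold OS reboot availability, converted to minutes of downtime in 30 days.
cold_os_downtime = downtime_minutes(0.998824)
```

For the reported value of 0.998824 this gives roughly 51 minutes of downtime in 30 days.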
Table 7.8: Availability values of rejuvenation methods
Rejuvenation Methods Days Steady State Availability Downtime
Cold OS Rejuvenation 30 0.998824 0.001176
Warm OS Rejuvenation 30 0.998983 0.001017
Cold VM Rejuvenation 30 0.998633 0.001367
Cold VMM Rejuvenation 30 0.998846 0.001154
Warm VM Rejuvenation 30 0.998799 0.001201
VM Migration 30 0.999219 0.000781
We have considered many key parameters such as aging rate, rejuvenation rate, failure rate,
suspend rate, resume rate and restart rate, and assumed safety thresholds for each of the modules
as given in Table 7.9. Based on these values we determine the availability of the system using the
time and variable workload policy. In all modules we set the rejuvenation time at a safe level for
a certain interval of time and set the threshold memory value; if aging is detected, the
rejuvenation time is optimized by the ITL algorithm and rejuvenation occurs at that optimized
time.
Table 7.9: Cold OS rejuvenation transition rates
Transition Value Mean value (1/hour)
OS aging rate 1 week 0.005952381
OS Rejuvenation rate 1 month 0.001388889
OS Failure rate 1 week 0.005952381
OS Suspend rate 1 month 0.001388889
OS Restart rate 30 sec 120
VM Resume rate 15 sec 240
VM rejuvenation rate 1 month 0.001388889
VM aging rate 1 week 0.005952381
VM failure rate 1 week 0.005952381
VM failure recovery rate 1 min 60
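The mean values in Table 7.9 are consistent with taking the reciprocal of each duration expressed
in hours, assuming a 7-day week (168 hours) and a 30-day month (720 hours). The Python sketch
below reproduces that conversion; the function name and the unit table are hypothetical, and the
week and month lengths are assumptions inferred from the numbers rather than stated in the report.

```python
SECONDS_PER_HOUR = 3600.0

# Assumed unit lengths: a 7-day week and a 30-day month.
UNIT_SECONDS = {
    "sec": 1.0,
    "min": 60.0,
    "hour": 3600.0,
    "week": 7 * 24 * 3600.0,
    "month": 30 * 24 * 3600.0,
}


def rate_per_hour(amount: float, unit: str) -> float:
    """Convert a mean duration into a rate expressed in events per hour."""
    duration_hours = amount * UNIT_SECONDS[unit] / SECONDS_PER_HOUR
    return 1.0 / duration_hours


aging_rate = rate_per_hour(1, "week")          # about 0.005952381 per hour
rejuvenation_rate = rate_per_hour(1, "month")  # about 0.001388889 per hour
restart_rate = rate_per_hour(30, "sec")        # about 120 per hour
```

The same function applied to 15 sec and 1 min gives approximately 240 and 60 per hour.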
Graphs are plotted based on transition rate and availability value. The graphs depict the
availability value at a particular time; all graphs are plotted for a thirty-day interval.
All the graphs below have the availability value on the X-axis and time (1/hour) on the Y-axis.
In general, in all modules, if the rate of rejuvenation is high then the system is rebooted
repeatedly at short intervals, which leads to high downtime; hence the availability value is low
initially in all the graphs plotted.
Graph 1: OS Cold availability
In the cold OS reboot process the availability is low because the system takes more time to
reboot, hence the downtime is high. In this module the system is restarted normally at the
rejuvenation time. For this process the downtime depends on the processor speed of the system;
normally it might take an average of one to three minutes to return to the normal working state.
The availability value of this module is 0.998824 for thirty days.
Graph 2: OS Warm reboot
In the OS warm reboot module the availability value is higher than in the cold reboot module,
because here the complete kernel is saved as an image and stored on the hard disk; after the
reboot the GRUB loader extracts this image and the kernel image is loaded. Hence there is no loss
of data and no interruption of running applications even after the reboot. The availability value
of this module is 0.998983 for thirty days.
Graph 3: OS Warm reboot and Cold reboot comparison
The comparison graph shows the variation of the availability value in cold and warm reboot of the
OS. Initially both modules have the same low availability value because of the high rejuvenation
rate; the graph clearly shows that warm reboot of the OS has higher availability than cold reboot
of the OS.
Graph 4: VM Cold reboot
In VM cold reboot the availability value decreases again, as rebooting the system takes
considerable time; it therefore provides low availability to the user of the system, and the
chances of losing data and of request failure are high. This module has an availability value of
0.998633 for thirty days.
Graph 5: VMM reboot
This module has the same drawback as the other cold reboot processes, and hence it has an
availability value of 0.998846 for thirty days.
Graph 6: Comparison graph for VMM and VM reboot
Comparing the cold reboot modules of VM and VMM, the VMM cold reboot module has the better
availability value, although both modules give almost the same availability to the system.
Graph 7: comparison graph for OS and VM cold reboot
According to the values in Table 7.8, the OS cold reboot module (0.998824) has a slightly better
availability value than the VM cold reboot module (0.998633).
Graph 8: Graph for VM warm reboot
Similar to OS warm reboot, VM warm reboot has higher availability than the VM cold reboot module;
the availability value of this module is 0.998799.
Graph 9: Graph for VM migration
The VM migration module has an availability value of 0.999219, which is the highest of all the
modules in this project. In this module there is no reboot and no images are saved; instead the
complete virtual machine is migrated to another configured server. Hence there is no data loss
and no request failure or error in running applications.
Graph 10: Graph for VM comparison
This graph compares all the VM modules. The VM migration module has the highest availability and
the VM cold reboot module has the lowest.
Graph 11: Comparison graph of all modules
This is the main graph of our analysis; it compares the availability values of all modules with
respect to rejuvenation time. Comparing the availability values of all modules clearly shows that
the VM migration module has high availability and the OS cold reboot module has very low
availability.
7.3 Snapshots:
7.3.1 Snap shots of OS Cold Reboot
Snap shot: 1 Snap shot: 2
Snap shot: 3 Snap shot: 4
7.3.2 Snap shots of OS Warm Reboot
Snap shot: 5 Snap shot: 6
Snap shot: 7 Snap shot: 8
7.3.3 Snap shots of VM Cold Reboot
Snap shot: 9 Snap shot: 10
Snap shot: 11 Snap shot: 12
Snap shot: 13
7.3.4 Snap shots of VM Warm Reboot
Snap shot: 14 Snap shot: 15
Snap shot: 16 Snap shot: 17
Snap shot: 18
7.3.5 Snap shots of VM Migration
Snap shot: 19 Snap shot: 20
Snap shot: 21 Snap shot: 22
Snap shot: 23 Snap shot: 24
7.3.6 Snap shots of VMM Reboot
Snap shot: 25 Snap shot: 26
Snap shot: 27 Snap shot: 28
Snap shot: 29
CHAPTER 8
CONCLUSION
8.1 Conclusion
The Intelligent Time and Load (ITL) balancing policy accepts a rejuvenation time from the user
and optimizes it whenever the workload varies; otherwise the system is rejuvenated at its
scheduled rejuvenation point. The ITL policy avoids software failures and helps to achieve high
availability of a complex system. The policy was evaluated on six modules, namely OS Cold
Reboot, OS Warm Reboot, VM Cold Reboot, VM Warm Reboot, VM Migration and VMM Reboot.
Over the course of the experiments, VM Migration achieved the best steady-state availability, as
long as VM live migration is fast enough and the other server has the capacity to receive the
migrated VM. Existing policies propose rejuvenation based on various parameters such as
hardware failures, memory leaks, CPU utilization, request failures and so on. The ITL policy
considers physical memory as the primary factor for rejuvenation, which makes it a better way
to avoid performance degradation and to increase the availability of the system.
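The decision at the heart of such a policy can be sketched as follows. This is a minimal sketch, not the project's implementation: it assumes a Linux host where physical memory use is read from `/proc/meminfo`, and the function names and the 0.9 threshold are hypothetical choices made for illustration.

```python
def memory_used_fraction(meminfo_text: str) -> float:
    """Return used physical memory as a fraction, parsed from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return (total - available) / total


def should_rejuvenate(elapsed_s: float, timer_s: float,
                      mem_fraction: float, mem_threshold: float = 0.9) -> bool:
    """Rejuvenate when the user-supplied timer expires or memory use crosses the threshold.

    The threshold value is an assumption for illustration.
    """
    return elapsed_s >= timer_s or mem_fraction >= mem_threshold


if __name__ == "__main__":
    # On a Linux host the text would come from the real file.
    with open("/proc/meminfo") as f:
        used = memory_used_fraction(f.read())
    print("rejuvenate now:", should_rejuvenate(elapsed_s=0.0, timer_s=3600.0, mem_fraction=used))
```

Keeping the decision in a pure function separates the policy from how the workload is sampled, so the same check could be driven by a periodic timer in a rejuvenation manager.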
8.2 Limitations
Some important limitations are as follows:
• The project is restricted to the Linux platform.
• OS warm reboot is compatible with Ubuntu but not with other operating systems.
• VM migration is compatible with CentOS but not with other operating systems.
• Complete (100%) availability is not achieved in all modules.
8.3 Future Enhancement
The basic idea is to continue request processing on the same node on which the rejuvenation
is taking place. Combining reboot with failover enables a system to continue
processing requests during the reboot. Before rebooting an OS or an application running on
one node of a clustered environment, requests to that node are redirected to the other nodes of
the system. This technique improves the availability of the system.
Figure 8.1 Sharing of request
Figure 8.1 shows that simultaneous execution of request processing and rejuvenation on the
same node requires an alternative request processing environment. The alternative
environment takes over the processing of all requests from the original environment, and then the
rejuvenation of the original environment is started. In this way, 100% availability can be
achieved.
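The redirect-then-reboot sequence described above can be sketched as follows. This is a minimal in-memory sketch under stated assumptions: the `Cluster` class, its round-robin routing and the `reboot` callback are hypothetical names introduced for illustration, not part of the project.

```python
class Cluster:
    """Toy cluster that keeps serving requests while one node is rejuvenated."""

    def __init__(self, nodes):
        self.active = list(nodes)  # nodes currently accepting requests
        self._next = 0             # round-robin counter

    def route(self, request):
        """Return the node that should handle the request (round robin)."""
        if not self.active:
            raise RuntimeError("no node available")
        node = self.active[self._next % len(self.active)]
        self._next += 1
        return node

    def rejuvenate(self, node, reboot):
        """Take the node out of rotation, reboot it, then put it back."""
        if node not in self.active:
            raise ValueError(f"unknown or inactive node: {node}")
        if len(self.active) == 1:
            raise RuntimeError("no other node can take over the requests")
        self.active.remove(node)  # requests are redirected before the reboot
        try:
            reboot(node)          # caller-supplied rejuvenation action
        finally:
            self.active.append(node)


if __name__ == "__main__":
    cluster = Cluster(["node-a", "node-b"])
    cluster.rejuvenate("node-a", reboot=lambda n: print("rebooting", n,
                       "while requests go to", cluster.route("req")))
```

Removing the node from rotation before calling the reboot action is what keeps requests from being lost; the check for a remaining active node reflects the requirement that an alternative environment must exist to take over.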
Optimal Software Rejuvenation for Complex Systems

Optimal Software Rejuvenation for Complex Systems

  • 1. CHAPTER 1 INTRODUCTION TO SOFTWARE REJUVENATION IN COMPLEX SYSTEM 1.1 Introduction Industry uses high complex system environment, which tends to software aging. Availability is the critical issue for system failure, which causes system degradation, to avoid this issue software rejuvenation technique is used, we use optimal rejuvenation technique for dynamically solving aging problem based on variable workload and timer policy, performance degradation, crash/hang, failure may occur due to data corruption, numerical errors and maximum use of system resource unnecessarily, this leads to software degradation which is known as software aging [1]. If the load increases system may tends to crash that is software aging occurs. That is solved by software rejuvenation technique; software rejuvenation [2] is proactive fault management technique to clear system errors and prevent system from failures in future. This project implements different software rejuvenation techniques depending on variable workload and optimize rejuvenation time, Rejuvenation time is calculated depending on variable workload which is given by system. System periodically checks the workload and update to rejuvenation manager. Figure 1.1 shows the architecture of Rejuvenation manager, it is consists of aging detector which detects the software aging point and optimizer to optimize the timer value for a point of rejuvenation. Aging detector and optimizer has two components namely variable workload and timer policy which perform their defined function respectively. Aging detector obtains the value provided by rejuvenation manager periodically and checks for the need for change in rejuvenation time depending on workload and this is updated for rejuvenation manager, if there is need for change in rejuvenation time then rejuvenation manager allows optimizer to change the time, the function of optimizer is to optimize the time depending the values provided by aging detector based on workloads.
  • 2. Figure 1.1 System architecture used for software rejuvenation in complex system Analysis of performance, dependability of complex systems is done through SPNP (Stochastic Petri Net Package) [2].Weight cardinality arc; guarded function of a complex system is constructed through SPNP. A state can be reached by all other states becomes irreducible in Marko chain [2]. A CTMC (Continuous Time Markov Chain) [3] is ergodic if it is irreducible and if a state is reached by all other state recursively in finite period. Steady state analysis underlying ctmc is done by SPNP [3], and few measures related to steady state are not considered as their values can be obtained by steady state probability. Software rejuvenation process can be done in different methods namely cold and warm .In cold OS reboot process, the system is rebooted immediately at rejuvenation point. Rejuvenation point is a point where memory consumption of system reaches a threshold value or predetermined time. When system consumes high amount of ram the OS must be rebooted, clearing all internal states. Memory consumption may be done by applications or error prone codes which run for long time consuming large amount of ram or OS itself. In OS warm reboot process, before rebooting the kernel state is saved, including all applications running on kernel, their sates are saved .saving the kernel state is done by creating a complete image of kernel. Approaches to Software Rejuvenation
  • 3. Software rejuvenation can be divided broadly into twoapproaches as follows. Time based approach [4][5][6]: In this approach, rejuvenationis performed without any feedback from the system. Rejuvenation in this case, can be based just onelapsed time (periodic rejuvenation) and/or instantaneous cumulative number of jobs on the system. Time and workload approach [4][5][6]: In this approach, rejuvenation is performed based on information on thesystem “health”. The system is monitored continuously (in practice, at small deterministic intervals) anddata is collected on the operating system resource usageand system activity. This data is then analyzedto estimate time to exhaustion of a resource whichmay lead to a component or an entire system degradationcrash. This estimation can be based purely on time or can be based on both time and systemworkload. Time is optimized based on workload applied and it is updated to system rejuvenation time. 1.2 Motivation: • Current systems are reactive if a failure occurs necessary steps will be taken to handle it but they can’t detect such a failure beforehand. Our project aims to detect such failure proactively using Time and workload techniques and take action before a given node crashes. • Our project determines if a node is going to fail based on RAM utilization and then it rejuvenates the failing node. Our project analyze and identifies these failing nodes using both Time and load balancing Rejuvenation techniques. 1.3 State of Art Development Complex system is a form of ubiquitous computing deals with providing everything as a service. Complex system mainly used in business and IT industry it offers heavy outsourcing model computational resource, where service availability, security and quality are essential features. In
  • 4. Complex system High service availability is the most important requirement increasingly being demanded in commercial computer, and communication systems. In recent years many research efforts have been going to find the optimal infrastructure size and configuration that guarantee the desired availability level. Software fault tolerance is often found to be the bottleneck. A failure in software’s is mainly due to certain elusive error conditions which it leads to resource exhaustion. Software systems appear to age as error conditions arise and accumulate with operational time due to certain elusive faults in system software and application software. Software rejuvenation is a proactive fault management technique aimed at cleaning up the internal states in order to prevent the occurrence of severe crash failures in the upcoming years the simplest way to emulate software rejuvenation is to reboot the system or restart the aging application. It is a cost effective technique dealing with software faults that includes protection not only against hard failures and also due to degradation over time of application performance. 1.3.1 Classification of software faults: Faults, in both hardware and software, can be classified according to their phase of creation or occurrence, system boundaries (internal or external), domain (hardware or software). In this section, we limit ourselves to the classification of software faults based on their phase of creation .some studies have suggested that since software is not a physical entity, it is not focusing to transient physical phenomena (as opposed to hardware), hence software faults are stable in nature [1].some other studies organizes software faults as both permanent and transient. Gray [2] categorizes software faults into Bohrbugs and Heisenbugs. Bohrbugs is essentially stable design faults and hence, approximately it is deterministic in nature. 
They can be recognized easily and weeded out during the testing and debugging phase (or early deployment phase) of the software life cycle. A software system with Bohrbugs is related to a faulty deterministic finite state machine. Heisenbugs, on the other hand, fit into the class of temporary internal faults and are intermittent. They are essentially stable faults whose conditions of creation occur rarely or are not easily recreated. Hence these types of faults result in transient failures i.e., failures which may not occur again if the software is restarted. Heisenbugs are extremely difficult to identify through testing. Hence a piece of software which is developed in the operational phase gets released after
  • 5. its development and testing phase, is more likely to be experienced with failures caused by Heisenbugs than due to Bohrbugs. Most modern studies on failure data have reported that a large percentage of software failures are transient in nature caused by phenomena such as overloads or timing and exception errors. The revise of failure data from Tandem’s fault tolerant computer system indicate that 70% of the failures were transient failures, caused by faults similar to race conditions and timing problems. We designate faults attributed to software aging as aging related faults. Aging related faults fall under Bohrbugs or Heisenbugs depending on whether the failure is deterministic (repeatable) or transient [3]. Foraging-related bugs, environment diversity can be particularly effective if utilized proactively in the form of software rejuvenation. Rejuvenation operation can be triggered either by time based (on deterministic intervals) or by using measurement and analysis of data of the system condition that undergoes software aging problems in various workstation environments. 1.3.2. Basic concepts of software aging and software rejuvenation: Software aging is defined as the state of the software that degrades with time. The primary causes of this degradation are the exhaustion of operating system resources, data corruption, and accumulation of numerical errors, which eventually may lead to performance degradation of the software, crash/hang failure, or both. A typical example of software aging is progressive increase in memory consumption which conclusively causes a memory leak. Since software aging can be observed only in the software execution, it is difficult to find aging related problems until the software is deployed and executed in a specific environment. This figure describes the threads which lead to aging related failure in the system.
  • 6. Figure 1.2 Aging Related failure The accumulation of AR (aging related) errors may tend to AR failure or fault. Aging effects can also be classified into volatile and non-volatile effects. They are considered volatile if they are isolated by re-initialization of the system or process affected, for example via a system reboot. In contrast, non-volatile aging effects still exist after reinitializing of the system/process. Physical memory division and OS resource outflow are examples for volatile aging effects. File system schema and database metadata fragmentation are examples for non- volatile aging effects [4]. The fault tolerance technique which is used to mitigate the aging effects of system is known as software rejuvenation. Software rejuvenation is defined as occasionally stopping the running software, cleaning its internal state or its environment and restarting it. Such a technique known as software rejuvenation was proposed by Huang which counteract the aging phenomenon in a proactive‖ manner by removing the accumulated error conditions and freeing up of operating system resources. Garbage collection, flushing operating system kernel tables, and reinitializing internal data structures are some examples by which the internal state or the environment of the software can be cleaned. There are basically two approaches followed for Software rejuvenation and for finding the optimum rejuvenation schedule: first is by analytic model and measurement based rejuvenation. The analytic modelling approach assumes failure and repair time distributions of a system and obtains optimal rejuvenation Schedule to maximize the availability, or minimize the loss probability or downtime of cost. Measurement-based rejuvenation approach is based on monitoring of resource consumption in a computer system and analysis of that data to determine the point of time when a resource will be completely exhausted, thereby causing the system to hang/crash. 
[Figure 1.2 diagram labels: AR bugs, aging factors, system internal environment, AR error, AR failure; arrow labels: activates, propagates]
  • 7. Measurement-based rejuvenation can follow any of the following policies: Purely Time based Software Rejuvenation Policy (PTSRP) or Purely Prediction based Software. Figure 1.3 Rejuvenation Scheduling [Figure 1.3 diagram labels: software rejuvenation scheduling; time-based; inspection-based; threshold based; prediction based; mixed approach; statistical; structural models and statistical; machine learning; each marked online or offline] 1.3.3 Rejuvenation technique In this section, we review the three VMM rejuvenation techniques. When VMM rejuvenation needs to be performed on a host, the hosted VMs also need to be controlled because the execution environments of the VMs are cleared by the VMM rejuvenation. Prior to VMM rejuvenation, we can perform VM shutdown (i.e., Cold-VM rejuvenation), VM suspend (i.e., Warm-VM rejuvenation), or VM migration (i.e., Migrate-VM rejuvenation). These approaches are presented in the next three subsections. 1.3.3.1 Cold-VM rejuvenation The easiest way to deal with the hosted VMs before triggering rejuvenation of the VMM is to shut down all the hosted VMs regardless of their execution states. The VMs are then
  • 8. restarted in clean states after the VMM rejuvenation. This approach is called Cold-VM rejuvenation. All the transactions running on the VMs are lost during Cold-VM rejuvenation [6]. An advantage of Cold-VM rejuvenation, however, is that the rejuvenation action cleans all the aging states of the VMs in addition to the aging states of the VMM. 1.3.3.2 Warm-VM rejuvenation Instead of being shut down, the hosted VMs are suspended before VMM rejuvenation is triggered, and their execution is resumed on completion of the VMM rejuvenation. We call this technique Warm-VM rejuvenation [5]. Since the execution states of the hosted VMs are saved prior to VMM rejuvenation, the transactions running on the VMs are not lost due to the VMM rejuvenation. However, Warm-VM rejuvenation retains the aging states of the VMs, because they are preserved by the VM suspend. The aging states in the hosted VMs are not cleared by VMM rejuvenation, and hence we need to rely on separate VM rejuvenation to clear them. 1.3.3.3 Migrate-VM rejuvenation Live VM migration is a technique for moving a running VM to another host while incurring only a short service interruption, and it is supported in most modern VMM implementations such as Xen and VMware. Although a shared storage system is required to store the VM image, the downtime overhead caused by a VM migration is small. Using live VM migration, hosted VMs are moved to another host prior to VMM rejuvenation and returned to the original hosting server after the rejuvenation of the VMM completes, by a reverse live VM migration. We call this combined method Migrate-VM rejuvenation [6]. The VM continues execution even while the VMM on the original host is being rejuvenated. However, as in the case of Warm-VM rejuvenation, the aging states in the hosted VMs are not cleared by the VMM rejuvenation. Live VM migration works only when the migration target server is running and has the capacity to accept the migrated VM. 
Table 1.1 compares the different software rejuvenation policies.
Table 1.1 Comparison of different software rejuvenation policy
Policy | Aging Condition | Analysis Tool | Threshold value | Availability | Methodology Or Model
  • 9. Constrained Element Based Software Rejuvenation Policy In Embedded Environment (CESRP) | To detect aging, CESRP uses CPU frequency | - | W (Shapiro-Wilk detection) = 0.9781; P-value = 0.8453 | Probability density µ = 8450.16; stack result σ = 1830.731 | Constrained path and constrained element with mathematical model
Gray’s classification of software faults | Here aging is detected on a time basis, depending on the state of operating system resources | SNMP (Simple Network Management Protocol) based distributed resource monitoring tool | Mean time to recover from a failure = 4 hours; mean time to rejuvenate the system = 1 hour; mean time to failure = 41.38 days; cost of failure = $5000/hour; cost of rejuvenation = $500/hour | σ∗ (optimal availability) = 36.12 days; σ (down time) = 5.60 days | Semi-Markov reward model based on workload and resources
POMDP (Partially Observable Markov Decision Process) | Aging detected based on the degradation level of the system | - | - | POMDP K = 1: 0.9951; K = 4: 0.9932; K = 9: 0.9901; K = 99: 0.9901 | CTMC (Continuous-Time Markov Chain) model
  • 10. Software rejuvenation based on automated self-healing techniques | Aging can be detected based on 1. online transaction processing (OLTP) servers, 2. middleware applications and web/application servers | SAN (Stochastic Activity Network) | - | Basic steady-state availability = 0.824673; tolerance availability = 0.983678 | This policy consists of six elements: system under test; fault model; fault-remediation relationship; micro-measurements; macro-measurements; workload and metric collectors
Component-Dependency based Micro-Rejuvenation Scheduling Policy | Aging can be detected based on utilization of system resources, such as memory | SAN model | - | - | Micro-rejuvenation scheduling
1.4 Problem Statement: • Performance degrades in complex systems that run for a long time. • Such systems are susceptible to crashes because of data corruption, numerical error accumulation and exhaustion of OS resources. • This leads to downtime and non-optimal performance.
  • 11. • Based on variation in workload, the rejuvenation time is optimized to reduce downtime and increase the availability of the system. 1.5 Objectives of the project: • The main objective of this project is to reduce software failure rates, avoid downtime and improve system availability using a software rejuvenation policy based on time and a load balancing scheme using the ITL algorithm. • Availability of the system for various rejuvenation techniques is analyzed. • The different rejuvenation techniques are analyzed based on values obtained from SPNP. 1.6 Scope of the work: 1.6.1 Limitations of the project: • Hardware compatibility is required. • The same hardware configuration is required on end systems. • Only open source tools and packages were used. 1.6.2 Constraints of the project: • Rejuvenation time and memory peak value are set based on the machine learning studies. • Hardware virtualization must be supported. • Systems must support NFS (Network File System). 1.7 Methodology: • Implementation of a proactive approach to software rejuvenation using time and load balancing scheme based techniques. • SRN-modelled graphs were used to analyze the algorithm on all modules. • Physical memory utilization is considered for implementing the time and workload based approaches.
  • 12. • The designed ITL algorithm uses the timer and variable workload policies to perform time-based rejuvenation, dynamically adapting the rejuvenation timer to the workload conditions. • The ITL algorithm is used to optimize the user-defined rejuvenation time when the workload is variable. • Availability of the system for the different modules is derived from parameters obtained from SPNP. • Live migration of virtual machines is done using KVM/QEMU. • NFS is configured on two servers to migrate the VM. Figure 1.4 Optimization of rejuvenation time for variable workload Figure 1.4 describes how the ITL algorithm optimizes the rejuvenation time (VT) with respect to the system workload (WL). When the workload changes to WL+1, the rejuvenation time has to be optimized to VT+1. If the system workload returns to the normal condition WL+2, the optimizer has to optimize the rejuvenation time to VT+2.
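The adaptation of VT to WL shown in Figure 1.4 can be illustrated with a small sketch. The report does not give the ITL update rule, so the rule below is purely an assumption made for illustration: the function name `optimize_rejuvenation_time`, the inverse-proportional scaling, and the sample values are all hypothetical.

```python
def optimize_rejuvenation_time(base_interval_s, workload_pct, normal_workload_pct):
    """Return an adapted rejuvenation interval in seconds.

    Assumed rule (not taken from the report): at or below the normal
    workload the user-defined interval is kept; above it, the interval
    shrinks in inverse proportion to the workload.
    """
    if workload_pct <= normal_workload_pct:
        return base_interval_s
    return base_interval_s * normal_workload_pct / workload_pct


# Hypothetical workload trace WL, WL+1 (overload), WL+2 (back to normal),
# expressed as percentage of RAM in use.
trace = [40, 80, 40]
intervals = [optimize_rejuvenation_time(3600, wl, normal_workload_pct=40) for wl in trace]
print(intervals)
```

With this rule the timer is brought forward while the workload is high and falls back to the user-defined value once the workload returns to normal, which mirrors the VT, VT+1, VT+2 sequence in the figure.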
  • 13. 1.8 Organization of the Report: This report contains 8 chapters. • The first chapter is the introduction to the project. It covers the purpose, motivation and scope of the project, the methodology adopted and the literature survey undertaken. • The second chapter deals with the theory and concepts of software rejuvenation in complex systems. • The third chapter deals with the software and hardware requirements specification for the project. It is the software requirements document. • The fourth chapter explains the design of the system proposed in the project. It gives a detailed overview of the components used in the project. • The fifth chapter describes the implementation of the project. It discusses the difficulties encountered while coding the project and the coding standards used. • The sixth chapter focuses on testing the application so that it is a robust product. Various tests are conducted on all modules. • The seventh chapter presents the results of the classification and comparison. Analysis of all modules is done here. • The eighth chapter gives the conclusion of the project. It deals with the limitations of the product and the possible future enhancements.
  • 14. CHAPTER 2 THEORY AND CONCEPTS OF SOFTWARE REJUVENATION IN COMPLEX SYSTEM 2.1 Introduction Software rejuvenation has become a new horizon for increasing system reliability and availability in the long run. With time, system outages tend to increase due to software aging, which may be caused by numerous factors such as memory leaks, unreleased locks and file descriptor leaks. Time-based rejuvenation periodically rolls back a continuously running application to prevent failures in the future. The time factor is set to a particular value, after which the software is restarted. Thus a better way to avoid software failure and to increase the availability and reliability of the system is to find the failure-probable state and rejuvenate the software before it reaches the failed state. This project investigates time-based rejuvenation policies for maintaining high reliability of software systems. Software rejuvenation is the act of gracefully terminating a running application and restarting it. The main motive behind rejuvenation is to prevent unexpected errors caused by aging-related issues in the software. The idea is therefore to stop the application and restart it before it suffers any error. The rejuvenation strategy is primarily intended for servers where applications are meant to run incessantly for days without failure. Software aging involves the gradual degradation of application performance over time, which may lead to untimely termination of the program. The main objective of rejuvenation is to maintain higher system reliability and availability by cleaning internal system states before the application reaches its failure state. 2.2 Software Rejuvenation Techniques Review Software rejuvenation takes different types of approaches into account. 
Broadly these are classified as: Standard rejuvenation, Delayed rejuvenation and Mixed rejuvenation.
  • 15. 2.2.1 Standard Rejuvenation In standard rejuvenation, rejuvenation occurs as soon as the triggering interval is reached. This policy does not take workload into consideration: it ignores whether the system is in a peak or off-peak period, and rejuvenation happens at the triggered time. 2.2.2 Delayed Rejuvenation In delayed rejuvenation, if the rejuvenation time is reached during a peak period, the nodes are only scheduled for rejuvenation; the actual rejuvenation starts as soon as the next off-peak period begins. 2.2.3 Mixed Rejuvenation The mixed rejuvenation policy is a combination of the standard and delayed rejuvenation strategies. If the rejuvenation time falls early in the peak period, the application is rejuvenated immediately; otherwise rejuvenation is delayed until the next off-peak period starts. 2.2.4 Erlang Approximation Based on workload, i.e., peak load and off-peak load, different timer policies are established to find the interval needed for scheduling. In standard rejuvenation, rejuvenation occurs at the triggered time regardless of peak or off-peak load. In delayed rejuvenation, as defined above, the delay time is obtained from an Erlang distribution. With this approximation the DSPN becomes a Markovian stochastic Petri net, to which the solution techniques for Markov chains can be applied. The deterministic switching time between peak and off-peak periods is kept as it is, hence this model is a DSPN (deterministic and stochastic Petri net). Rejuvenation is triggered at regular time intervals and is modelled by a deterministic transition with constant firing time. When the deterministic transition fires, and if the immediate transition is enabled at that time, the token is moved to another place, indicating the beginning of rejuvenation activity. 
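The three policies of Sections 2.2.1 to 2.2.3 can be summarised as a small decision function. This is a minimal sketch under stated assumptions: the function name `rejuvenation_start`, the single peak window, and the `grace` parameter (how long into the peak period a trigger still counts as "early" for the mixed policy) are hypothetical, since the report does not quantify "early".

```python
def rejuvenation_start(trigger_time, peak_start, peak_end, policy, grace=0):
    """Return the time at which rejuvenation actually starts.

    policy: "standard", "delayed" or "mixed".
    grace:  assumed length of the initial part of the peak period during
            which the mixed policy still rejuvenates immediately.
    All times share one unit (for example hours of the day).
    """
    in_peak = peak_start <= trigger_time < peak_end
    if policy == "standard" or not in_peak:
        # Standard ignores the workload; off-peak triggers are never delayed.
        return trigger_time
    if policy == "delayed":
        # Wait for the next off-peak period.
        return peak_end
    if policy == "mixed":
        early = trigger_time < peak_start + grace
        return trigger_time if early else peak_end
    raise ValueError(f"unknown policy: {policy}")


# Peak period from 9 to 17; the timer expires at 10 or at 14.
for policy in ("standard", "delayed", "mixed"):
    print(policy, rejuvenation_start(10, 9, 17, policy, grace=2),
          rejuvenation_start(14, 9, 17, policy, grace=2))
```

The mixed policy behaves like the standard policy for the trigger at 10 and like the delayed policy for the trigger at 14, which is the combination described above.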
For standard rejuvenation the timer is always enabled; for delayed rejuvenation the timer is disabled during the peak period; and for mixed rejuvenation the timer is enabled for an initial duration of a certain length and disabled thereafter during the peak period. After the
  • 16. rejuvenation finishes, the reset transition fires to return a token to its place, beginning the next rejuvenation cycle. To make the model solvable by SPNP, the deterministic transition is approximated by an r-stage Erlang distribution. This is achieved by storing r tokens in another place and replacing the deterministic timer with an exponentially distributed timed transition with firing rate r/timer. At the same time, the multiplicities of the output arc of reset timer and the input arc of timer policy are changed to r. 2.3 Stochastic Reward Nets Model for Time based Software Rejuvenation in Virtualized Environment Here we focus mainly on unplanned software outages due to the software aging problem. We present a comprehensive availability model for both the VM clustering software rejuvenation model and the VM migration based software rejuvenation model. The model, built analytically as a stochastic reward net (SRN), captures the software aging states of the VM and VMM as well as their failures caused by aging. We describe our proposal to offer a high availability mechanism using a time-based software rejuvenation methodology. First we present the ways of using virtualization to improve software rejuvenation for addressing the software aging issue. In the proposed system, virtualization technology and software rejuvenation are used to provide availability of the services. Clustering supports two or more servers running duplicate VMs. Failover technologies also allow a failed VM to load from a storage snapshot and start up on another server. To counteract software and hardware failures, the rejuvenation schedules for the VM and VMM need to be determined properly for VM availability, since VMM rejuvenation affects the VMs running on the VMM. The following two scenarios are studied in this report. 2.3.1 VM Clustering Software Rejuvenation (2vms1pm)
  • 17. A physical machine hosts the virtual machines. One monitoring VM and other operational VMs are created on top of the virtualization layer (VMM). The main application server runs on one VM and the remaining VM is used as the standby server. Software modules responsible for detecting software aging are installed in the monitoring VM, which triggers the rejuvenation operation. If the active VM is about to be rejuvenated, the standby VM is started and all new requests and sessions are switched from the active VM to the standby VM. In this scenario the physical machine itself is a SPOF (single point of failure). 2.3.2 VM Migration Based Software Rejuvenation (2vms2pms) In this scenario an active-standby virtualized clustering architecture is employed. A highly available cluster is built between two or more virtual machines, each running on a different physical machine (2 PMs). The two PMs consist of an active physical server and a standby physical server. Both physical servers can access shared storage. A heartbeat keep-alive system is used to monitor the interaction of the VMs and the physical servers. On the active physical server, a monitoring VM, an active VM and a standby VM are created, and likewise on the standby physical server. Time-based rejuvenation of both the VM and the VMM is considered in this scenario. The time-based rejuvenation policy for the VM is the same as for active-standby VMs hosted on 1 PM. Live VM migration enables a running VM on a host server to move onto another host server with very little interruption of execution. When the VMM needs to be rejuvenated, the hosted VMs can move onto the other physical server. They can return to the original host after completion of the VMM rejuvenation, again by live VM migration. In the event of an active physical server outage, the virtualized recovery server on the standby physical server can be activated to take over the workload immediately using live migration. 
The down time of a VM caused by live VM migration is very small and the VM continues the execution even while the original host is down. CHAPTER 3 SOFTWARE REQUIREMENT SPECIFICATION OF SOFTWARE REJUVENATION IN COMPLEX SYSTEM
  • 18. 3.1 Project Description Software Rejuvenation in Complex System has six different modules, namely OS cold reboot, OS warm reboot, VM cold reboot, VM warm reboot, VMM reboot and VM migration. Each module has its own working method, which is explained below. 3.2 Module Description There are six main modules: OS cold reboot, OS warm reboot, VM cold reboot, VM warm reboot, VM migration and VMM reboot. 3.2.1 Module for OS Cold rejuvenation: In the OS cold reboot process, the system is rebooted immediately at the rejuvenation point. The rejuvenation point is the point at which the system's memory consumption reaches a threshold value or a predetermined time is reached. When the system consumes a large amount of RAM, the OS must be rebooted, clearing all internal states. Memory may be consumed by applications or error-prone code that runs for a long time and uses a large amount of RAM, or by the OS itself. In this process the free memory left is compared with our predetermined threshold value. If the free memory is greater than the threshold value, the system is allowed to run in its normal state, i.e. the system has not reached the threshold point of consumption. If it is less, i.e. the system has consumed more memory than the threshold point, the OS is restarted immediately. The amount of free memory left is extracted and compared with the predetermined threshold free memory value; based on the result of the comparison, further processing is handled by the ITL algorithm. 3.2.2 Module for OS warm rejuvenation: In the OS warm reboot process, the kernel state is saved before rebooting, including the states of all applications running on the kernel. The kernel state is saved by creating a complete image of the kernel. The OS warm reboot process is divided into two stages: 1) Suspend, 2) Resume. In the Suspend stage the kernel is called to create a snapshot of the current system state, the snapshot data is then written to disk, and finally the system is rebooted. In the Resume stage, when the system is turned on, the GRUB loader runs from initrd
  • 19. before mounting any partitions; then all the snapshot data is read from disk and loaded into the kernel, the kernel restores the image, and the system runs from the same state in which it was suspended. 3.2.3 Module for VM cold reboot: In the VM cold reboot process [9], the VM is rebooted immediately at the rejuvenation point and the hypervisor is untouched. The rejuvenation point is the point at which the system's memory consumption reaches a threshold value or a predetermined time is reached. When the VM consumes a large amount of RAM, the VM must be rebooted, clearing all internal states. Memory may be consumed by applications or error-prone code that runs for a long time and uses a large amount of RAM, or by the OS itself. The ITL algorithm compares the free memory left with our predetermined threshold value. If the free memory is greater than the threshold value, the system is allowed to run in its normal state, i.e. the system has not reached the threshold point of consumption. If it is less, i.e. the system has consumed more memory than the threshold point, the rejuvenation time is optimized and the predetermined time is updated; when the rejuvenation time equals the system time, the VM is restarted immediately without saving any state of the running VM. 3.2.4 Module for VM warm reboot: In the VM warm reboot process, the kernel state of the particular failing VM is saved before rebooting, including the states of all applications running on the kernel. The kernel state is saved by creating a complete image of the kernel. The VM warm reboot process is divided into two stages: 1) Suspend, 2) Resume. In the Suspend stage the kernel is called to create a snapshot of the current system state, the snapshot data is then written to disk, and finally the system is rebooted. In the Resume stage, when the system is turned on, the GRUB loader runs from initrd before mounting any partitions; then all the snapshot data is read from disk and loaded into the kernel, the kernel restores the image, and the system runs from the same state in which it was suspended. 
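The free-memory comparison used by the reboot modules above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes a Linux host, where `/proc/meminfo` reports a `MemFree:` line in kB, and the helper names `free_memory_kb` and `needs_optimization` as well as the sample values are hypothetical. A sample string is parsed here so the sketch runs anywhere; on a real system the text would be read from `/proc/meminfo`.

```python
def free_memory_kb(meminfo_text):
    """Extract the MemFree value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemFree:"):
            return int(line.split()[1])
    raise ValueError("MemFree entry not found")


def needs_optimization(free_kb, threshold_kb):
    """True when less free memory is left than the user-set threshold,
    i.e. the system has consumed more than the threshold point."""
    return free_kb < threshold_kb


# Hypothetical snapshot in the /proc/meminfo layout.
SAMPLE = (
    "MemTotal:        4046236 kB\n"
    "MemFree:          512000 kB\n"
    "MemAvailable:    1024000 kB\n"
)

free_kb = free_memory_kb(SAMPLE)
print(free_kb, needs_optimization(free_kb, threshold_kb=1000000))
```

When `needs_optimization` returns false the system keeps running normally; when it returns true the rejuvenation time would be handed to the optimizer, as the module descriptions state.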
This module reduces request failures and provides high availability for the VM. The ITL algorithm compares the free memory left with our predetermined threshold value. If the free memory is greater than the threshold value, the system is allowed to run in its normal state, i.e. the system has not reached the threshold point of consumption. If it is less, i.e. the system has consumed more memory than the threshold point, the rejuvenation time is optimized and updated to
  • 20. predetermined time; when the rejuvenation time equals the system time, the VM is restarted immediately after saving the state of the running VM. 3.2.5 Module for VMM reboot In the VMM cold reboot process, the VMM is rebooted immediately at the rejuvenation point, and all the VMs running on the VMM are shut down before the VMM is rebooted. The rejuvenation point is the point at which the system's memory consumption reaches a threshold value or a predetermined time is reached. When the VMM consumes a large amount of RAM, the VMM must be rebooted, clearing all internal states. Memory may be consumed by applications or error-prone code that runs for a long time and uses a large amount of RAM. In this process the ITL algorithm compares the free memory left with our predetermined threshold value. If the free memory is greater than the threshold value, the system is allowed to run in its normal state, i.e. the system has not reached the threshold point of consumption. If it is less, i.e. the system has consumed more memory than the threshold point, the rejuvenation time is optimized and the predetermined time is updated; when the rejuvenation time equals the system time, the VM is restarted immediately without saving any state of the running VM. If VMM memory consumption reaches its peak point, i.e. the VMM is likely to crash soon, the VMM is restarted even if all VMs are running in their normal state, and no state or data is saved; however, the user is given a period of one minute in which to cancel the reboot or shut down the VMM completely. 3.2.6 Module for VM Migration [10] [11] [12] In this module, the VM on the failing server is transferred to a preconfigured secondary server before the VM fails; the complete data and the applications running on the main server are transferred to the secondary server with no interruption to the running applications. When the complete VM has been transferred to the other server and loaded, all the applications that were running on the main server are in the same state after the transfer, with no loss of application data. 
All of this is achieved by initially configuring NFS on both servers and configuring the virtual manager and virsh packages. In our project this concept is applied when the server receives a heavy load of requests or consumes a large amount of memory, which may lead to a hang, crash or failure of the system. Once the user has set the rejuvenation time and the threshold memory value, the rejuvenation manager checks for aging problems in the system; if an aging problem is detected, the rejuvenation time predetermined by the user is optimized by the ITL algorithm and the system is rejuvenated at the rejuvenation
  • 21. time. For rejuvenation we use the migration technique to migrate the running VM and reboot the server, which provides high availability and fewer request failures. 3.3 Software requirements: Table 3.1 Software requirements
  • 22. Minimum Requirements
OS: CentOS, Ubuntu OS
Other: KVM/QEMU must be installed on both servers. NFS must be configured on both systems to migrate the VM. Note: KVM is a hypervisor or Virtual Machine Monitor; NFS (Network File System) is a distributed file system protocol.
Language: C
3.4 Hardware Requirements: Table 3.2 Hardware Requirements
Minimum Requirements
Processor: Intel Pentium or better
Memory: 4 GB RAM
Hard Disk: 100 GB of hard disk space required
Display: 1024 x 768 or higher-resolution display with 16-bit colors
3.5 Performance Requirements: • Availability The system shall achieve 100 percent availability at all times. • Portability
  • 23. The system should be implemented in Java so that it can move easily from one system to another, because Java is platform independent. • Scalability The system shall support multiple rejuvenation approaches. • Maintainability The system should be optimized for supportability, or ease of maintenance, as far as possible. This may be achieved through documentation of coding standards, naming conventions, class libraries and abstraction. 3.6 Functional requirements: As per the functional requirement specifications, the project shall provide the following facilities: • The system collects the current status of the workload based on the RAM utilized by the running application. • The system checks for aging factors that degrade the availability of the application. If any aging factor is detected, it sends a notification. • The system collects the status of the system periodically. • The system keeps track of the system time and compares it with the fixed rejuvenation schedule. If the tracked time equals the fixed rejuvenation schedule, the system is rejuvenated. • The system stores the current status of the process, which is used to resume the process after system rejuvenation takes place. 3.7 Project Effort Estimation: Assumptions: Average Labor Cost : $680/month Average Line of Code (LOC) : 450 LOC/month Average cost for a line of code : $1.5/LOC (680 / 450 ≈ 1.5)
  • 24. Modules Details: • The project contains 6 modules; each module contains around 490 LOC, of which implementation accounts for 320 LOC/module and analysis for 170 LOC/module. • Total Project Size = 490 * 6 = 2940 LOC Cost Estimation: • For one module, cost = 490 * 1.5 = $735 • Total cost of Project = 2940 * 1.5 = $4410 Effort Estimation: • Effort = Total project cost / Average labor cost per month = 4410 / 680 = 6.4852 ≈ 7 person-months • 7 persons are required to complete this project in one month. 3.8 Project Schedule:
  • 25. Table 3.3 Required Schedules for each Task Figure 3.1 Gantt chart of Project Schedule
  • 26. CHAPTER 4 HIGH LEVEL DESIGN OF SOFTWARE REJUVENATION IN COMPLEX SYSTEM A software product is a complex entity. Its development usually follows what is known as the Software Development Life Cycle (SDLC). The second stage in the SDLC is the Design stage. The objective of the design stage is to produce the overall design of the software. The design stage involves two sub-stages, namely: • High-Level Design • Detailed-Level Design In the High-Level Design, the proposed functional and non-functional requirements of the software are studied, and an overall solution architecture that can handle those needs is developed. 4.1 Development Methods: The development method used in this software design is the modular/functional development method. In this method, the system is broken down into different modules, with a certain amount of
  • 27. dependency among them. The input-output data that flows from one module to another shows the dependency. Data flow diagrams have been used in the modular design of the system. 4.2 Data Flow Diagrams: Data-flow models are an intuitive way of showing how data is processed by a system. At the analysis level, they should be used to model the way in which data is processed in the existing system. The notation used in these models represents functional processing, data stores and data movements between functions. Data-flow models are used to show how data flows through a sequence of processing steps. The data is transformed at each step before moving on to the next stage. These processing steps or transformations are program functions when data-flow diagrams are used to document a software design. 4.3 Data Flow Diagram: 4.3.1 Data Flow Diagram For rejuvenation Manager: Level 0 Figure 4.1 DFD: Level 0: module for Rejuvenation system [Figure 4.1 diagram labels: System, 1.0 Rejuvenation Manager, Rejuvenation process]
  • 28. Figure 4.1, the Level 0 module for rejuvenation, describes the main rejuvenation process with the variable time and workload policy implemented. The module to run is selected first, and then the threshold time and threshold memory are set. 4.3.2 Data Flow Diagram for FTR and FTM: Level 1 Figure 4.2 DFD: Level 1 module for rejuvenation manager In Figure 4.2, the Level 1 data flow diagram describes the working of the rejuvenation manager. The rejuvenation manager has two modules, namely the aging detector and the optimizer. The aging detector detects the aging factor and invokes the optimizer to optimize the rejuvenation time. If aging is not detected, the system is rejuvenated at the rejuvenation time. [Figure 4.2 diagram labels: System, 1.1 Aging detector, 1.2 Optimizer, User set Threshold values, Rejuvenation Process]
  • 29. 4.3.3 Data Flow Diagram: Level 2 Figure 4.3 Data Flow Diagram: Level 2 module for aging detector [Figure 4.3 diagram labels: System, 1.1.1 Call Meminfo(), 1.1.2 Checking the aging factor, Rejuvenation process] [Diagram labels of the following figure: System, 1.2.1, 1.2.2, 1.2.3, 1.2.4, Analyze Current FTR, Calculate memory factor, Compare FTR with STM, Optimizing FTR value, Rejuvenation Process]
  • 30. Figure 4.4 Data Flow Diagram: Level 2 module for time optimizer The aging detector determines the free memory left by calling meminfo() and gives the aging result to the rejuvenation manager. The rejuvenation manager compares the free memory left with the threshold value given by the user, and calls the optimizer to optimize the rejuvenation time if the comparison result is positive. The optimizer fetches the FTR and STM (System Time) and checks the free memory left. Based on the threshold value, the time is optimized, i.e. either increased or decreased. 4.4 Sequence Diagram A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. Sequence diagrams are sometimes called event-trace diagrams, event scenarios, and timing diagrams. A sequence diagram shows, as parallel vertical lines ("lifelines"), different processes or objects that live simultaneously, and, as horizontal arrows, the messages exchanged between them, in the order in which they occur. This allows the specification of simple runtime scenarios in a graphical manner. Figure 4.5 depicts the policy used in this project, i.e. the variable time and workload policy. The rejuvenation manager requests the status of the workload applied to the system. The system replies to the rejuvenation manager with the workload applied to it; the rejuvenation manager then calls the aging detector [13] to compare it with the predetermined threshold value. If any variation is observed, the result is returned to the rejuvenation manager, the optimizer is then invoked to optimize the rejuvenation time, and the system is rejuvenated at the optimized time. If no variation is observed, the system is rejuvenated at the predetermined rejuvenation time.
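The interaction between the rejuvenation manager, the aging detector and the optimizer can be sketched as one periodic check. This is an illustrative sketch only: `ManagerState`, `manager_step` and the `pull_forward` factor are hypothetical names, and the rule that halves the remaining time is a placeholder assumption, because the report states only that the time is increased or decreased.

```python
from dataclasses import dataclass


@dataclass
class ManagerState:
    ftr: int      # rejuvenation time currently scheduled (seconds since start)
    ftm_kb: int   # user-set free-memory threshold in kB


def manager_step(state, stm, free_kb, pull_forward=0.5):
    """Run one periodic check and return the action taken.

    stm is the current system time in the same unit as state.ftr.
    Returns "rejuvenate", "optimized" or "wait".
    """
    if stm >= state.ftr:
        # System time has reached the scheduled rejuvenation time.
        return "rejuvenate"
    if free_kb < state.ftm_kb:
        # Aging detected: bring the scheduled time forward.
        # Placeholder rule (assumption): keep only a fraction of the
        # remaining time.
        remaining = state.ftr - stm
        state.ftr = stm + int(remaining * pull_forward)
        return "optimized"
    return "wait"


state = ManagerState(ftr=1000, ftm_kb=500000)
print(manager_step(state, stm=200, free_kb=800000))  # enough free memory
print(manager_step(state, stm=200, free_kb=300000))  # below threshold
print(state.ftr)
print(manager_step(state, stm=600, free_kb=300000))  # scheduled time reached
```

Separating the check from the action keeps the sketch independent of which rejuvenation module (cold reboot, warm reboot or migration) is finally triggered.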
  • 31. Figure 4.5 Sequence Diagram for rejuvenation manager 4.5 Detailed Design 4.5.1 Detailed System Design The main aim of the project is to build a simulator for the time and prediction based rejuvenation approaches. In this section, the individual modules that form the building blocks of the system are identified and a complete design is presented for each of them. The design details for each module consist of the following elements: • The purpose of the module • A description of its functionality • A description of the types and number of inputs it accepts • A description of the types and number of outputs it generates 4.5.2 Module 1: OS cold reboot
  • 32. This module covers OS cold reboot. In the cold reboot process the rejuvenation time is entered by the user and compared with the system time. If the workload varies from the threshold value given by the user, the time is optimized and the system is rejuvenated at the optimized time without saving the current state.
Input
The inputs for the module are the rejuvenation time and the threshold value of memory.
Output
The output of the module is rejuvenation of the system at the rejuvenation point.
Figure 4.6 Functioning of OS cold reboot (flowchart nodes: START, Declare and retrieve time, Compare, Mem_c, Reboot, STOP; decision branches: Yes/No)
  • 33. The functioning of the cold reboot is described in the above flow diagram. Figure 4.6 shows how the OS cold reboot process works. Initially the user sets the rejuvenation time and the threshold memory value. The system time is then compared with the rejuvenation time given by the user; if the two are equal, the system is rejuvenated immediately. If they are not equal, the memory usage is compared with the threshold memory value in block mem_c: if the result is negative the system is rejuvenated, and if the result is positive the time is optimized and the rejuvenation time is updated.
4.5.3 Module 2: Module for OS warm reboot process
This module covers OS warm reboot.
Input
The inputs for the module are the predetermined rejuvenation time and the threshold memory value.
Output
The output of the module is the state of the kernel saved as an image on the hard disk, followed by rejuvenation at the rejuvenation time; afterwards the system must start from its previous reboot state.
The functioning of OS warm reboot is described in the following flow diagram.
(Flowchart nodes: Start, Set FTR & FTM, Check FTR, Check FFM, Optimize, Save, Rejuvenation, Stop; decision branches: Yes/No)
  • 34. Figure 4.7 Module of OS warm reboot
Figure 4.7 shows how OS warm reboot works. First the user sets the predetermined rejuvenation time and the threshold value of memory. The system time is then compared with the rejuvenation time given by the user; if the two are equal, the kernel state is saved to the hard disk and the system is rejuvenated. If they are not equal, the memory usage is compared with the threshold memory value in block Check FFM (Fixed Free Memory). If the result is negative, the system time is checked against the rejuvenation time again. If the result is positive, the time is optimized and the rejuvenation time is updated. The system is rejuvenated at the rejuvenation time.
4.5.4 Module 3: VM cold reboot
This module covers VM cold reboot.
  • 35. Input
The inputs for the module are the predetermined rejuvenation time and the threshold memory value.
Output
The output of the module is rejuvenation of the VM at the rejuvenation time.
The functioning of the VM cold reboot module is described in the following flow diagram.
(Flowchart nodes: START, Declare and retrieve time, Compare, Mem_C, Reboot, STOP; decision branches: Yes/No)
  • 36. Figure 4.8 Module for VM cold reboot
Figure 4.8 shows the module for VM cold reboot. Initially the user sets the rejuvenation time and the threshold memory value. Next, the system time is compared with the rejuvenation time given by the user; if the two are equal, the VM is rejuvenated immediately. If they are not equal, the memory usage is compared with the threshold memory value in block mem_c: if the result is positive the system is rejuvenated, and if the result is negative the time is optimized and the rejuvenation time is updated.
4.5.5 Module 4: Module for VM warm reboot
Input
The inputs for the module are the predetermined rejuvenation time and the threshold memory value.
Output
The output of the module is the state of the faulty VM's kernel saved as an image on the hard disk, followed by rejuvenation at the rejuvenation time; afterwards the VM must start from its previous reboot state.
The functioning of the VM warm reboot module is described in the following flow diagram.
  • 37. Figure 4.9 VM warm reboot Module (flowchart nodes: START, Set FTR and FTM, Check FTR, Check FTM, Optimize, Save, Resume, Rejuvenation, STOP; decision branches: Yes/No)
  • 38. Figure 4.9 shows the VM warm reboot process. First the user sets the predetermined rejuvenation time and the threshold value of memory. The system time is then compared with the rejuvenation time given by the user; if the two are equal, the kernel state is saved to the hard disk and the system is rejuvenated. If they are not equal, the memory usage is compared with the threshold memory value in the block named Check FTM (Fixed Threshold Memory). If the result is negative, the system time is checked against the rejuvenation time again. If the result is positive, the memory usage is compared with the peak memory value in the second Check FTM block: if that result is positive, the kernel state is saved to the hard disk and the system is rejuvenated; if it is negative, the time is optimized and the rejuvenation time is updated. The system is rejuvenated at the rejuvenation time.
4.5.6 Module 5: Module for VM migration
This module covers VM migration.
Input
The inputs for the module are the predetermined rejuvenation time and the threshold memory value.
Output
The output of the module is migration of the VM from the server that is tending to fail to another server at the rejuvenation time.
The functioning of the VM migration module is described in the following flow diagram.
  • 39. Figure 4.10 VM migration module (flowchart nodes: START, Set FTR and FTM, Check FTR, Check FTM, Optimize, Migrate, STOP; decision branches: Yes/No)
  • 40. The flow chart in Figure 4.10 depicts how VM migration works. Initially the administrator sets the rejuvenation time and the threshold memory value at which the VM must be migrated. The applications running in the VM and the dynamic data entered in it are migrated to the configured secondary server, so this module provides the highest availability for the server. If a heavy workload is applied to the server after the rejuvenation time has been set, the rejuvenation time is optimized so that the server is protected from hang/crash failures. When the rejuvenation time is reached, the complete VM is migrated to the other configured server. Because the data and the running applications are not corrupted, this module causes no request failures and provides high availability, which is much needed in the current corporate world.
4.5.7 Module 6: Module for VMM reboot
This module covers VMM reboot.
Input
The inputs for the module are the predetermined rejuvenation time and the threshold memory value.
Output
The output of the module is rejuvenation of the VMM at the rejuvenation time.
The functioning of the VMM reboot module is described in the following flow diagram.
(Flowchart nodes: START, Set FTR and FTM, Compare, Mem_C, Reboot, STOP; decision branches: Yes/No)
  • 41. Figure 4.11 VMM reboot model
Figure 4.11 shows the process of VMM reboot. Initially the user sets the rejuvenation time and the threshold memory value. The system time is then compared with the rejuvenation time given by the user; if the two are equal, the system is rejuvenated immediately. If not, the system optimizes the rejuvenation time depending on the workload, and the VMM reboot takes place at the optimized time. In this module, all running VMs are shut down before the VMM is rebooted.
CHAPTER 5
IMPLEMENTATION OF SOFTWARE REJUVENATION IN COMPLEX SYSTEM
The implementation phase of any project is the most important phase, as it yields the final solution to the problem at hand. It involves the actual materialization of the ideas that were expressed in the analysis document and developed in the design phase.
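As one concrete illustration of materializing the design, the VMM reboot rule from Section 4.5.7 (shut down every running VM, then reboot the VMM) can be sketched in C. This is a hedged sketch rather than the project's implementation: the function names are hypothetical, the VM names are assumed to be trusted input supplied by the caller, and the connection URI qemu:///system is an assumption about the host setup. Only the standard virsh shutdown command and the system() call are used.

```c
#include <stdio.h>
#include <stdlib.h>

#define VIRSH_URI "qemu:///system"   /* assumed libvirt connection URI */

/* Build "virsh -c <uri> shutdown <vm>" into buf.
 * Returns 0 on success, -1 if the command does not fit in buf. */
int build_vm_shutdown_cmd(char *buf, size_t len, const char *vm_name)
{
    int n = snprintf(buf, len, "virsh -c %s shutdown %s", VIRSH_URI, vm_name);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Request shutdown of every listed VM, then reboot the host (the VMM).
 * Returns the number of failed shutdown requests (the reboot is skipped
 * in that case), 0 if the reboot command was issued, -1 if it failed. */
int vmm_reboot(const char *const *vm_names, size_t count)
{
    char cmd[256];
    size_t i;
    int failures = 0;

    for (i = 0; i < count; i++) {
        if (build_vm_shutdown_cmd(cmd, sizeof cmd, vm_names[i]) != 0 ||
            system(cmd) != 0)
            failures++;
    }
    if (failures > 0)
        return failures;
    return system("reboot") == 0 ? 0 : -1;   /* requires root privileges */
}
```

Note that virsh shutdown only requests a guest shutdown and returns before the guest has stopped, so a real implementation would also have to wait until the VMs are down before rebooting the host.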
  • 42. The project has six modules: OS cold reboot, OS warm reboot, VM cold reboot, VM warm reboot, VM migration and VMM reboot, based on the time and workload rejuvenation policies. All the modules are also analyzed using SPNP (Stochastic Petri Net Package).
5.1 Platform Selection:
5.1.1 KVM/QEMU:
KVM (Kernel-based Virtual Machine) is a full virtualization solution for Linux on x86 hardware containing virtualization extensions (Intel VT or AMD-V). It consists of a loadable kernel module, kvm.ko, that provides the core virtualization infrastructure, and a processor-specific module, kvm-intel.ko or kvm-amd.ko. KVM also requires a modified QEMU, although work is underway to get the required changes upstream. Using KVM, one can run multiple virtual machines running unmodified Linux or Windows images. Each virtual machine has private virtualized hardware: a network card, disk, graphics adapter, etc. The kernel component of KVM is included in mainline Linux as of 2.6.20. KVM is open source software.
A wide variety of guest operating systems work with KVM, including many flavours of Linux, BSD, Solaris, Windows, Haiku, ReactOS, Plan 9, and AROS Research Operating System. In addition, Android 2.2, GNU/Hurd (Debian K16), Minix 3.1.2a, Solaris 10 U3, Darwin 8.0.1 and more OSs, as well as some newer versions of these, are known to work with limitations. A modified version of QEMU can use KVM to run Mac OS X.
5.1.2 SPNP (Stochastic Petri Net Package) [12][13][14]:
This package was developed by Ciardo et al. The model type used for input is an SRN (Stochastic Reward Net). SRNs incorporate several structural extensions to GSPNs, such as marking dependencies (marking-dependent arc cardinalities, guards, etc.), and allow reward rates to be associated with each marking. The reward function can be marking dependent as well. SRNs are specified using CSPL (C based SRN Language), which is an extension of the C programming language with additional constructs for describing SRN models.
SRN specifications are automatically converted into a Markov reward model, which is then solved to compute a variety of transient, steady-state, cumulative, and sensitivity measures.
  • 43. For SRNs with absorbing markings, the mean time to absorption and the expected accumulated reward until absorption can be computed. The graphical interface increases the power of SPNP (Stochastic Petri Net Package) [15] by providing a means of rapidly developing stochastic reward nets (SRNs), the model type used for input. Input to SPNP is specified using CSPL (C based SPN Language), but the interface removes this burden from the user by providing a graphical representation of the model. The first interface was implemented with Tcl/Tk. Java was then used to develop the new version and its look and feel.
5.1.3 CentOS (OS selection)
The CentOS Linux distribution is a stable, predictable, manageable and reproducible platform derived from the sources of Red Hat Enterprise Linux (RHEL). The process delivered has a clear governance model, increased transparency and access. Since March 2004, CentOS Linux has been a community-supported distribution derived from sources freely provided to the public by Red Hat. As such, CentOS Linux aims to be functionally compatible with RHEL. CentOS changes packages to remove upstream vendor branding and artwork. CentOS Linux is no-cost and free to redistribute.
CentOS Linux is developed by a small but growing team of core developers. In turn, the core developers are supported by an active user community including system administrators, network administrators, managers, core Linux contributors, and Linux enthusiasts from around the world.
We adopted this OS because it is highly compatible and stable, and because KVM is very easy to install and configure on it. Moreover, configuring NFS is easy for beginners and ports can be resolved properly. The forums for this OS had solutions to all the problems we faced on other OSs such as Ubuntu and Fedora. It is also open source, and its code is available online.
5.2 Programming Language Used (Language Selection):
C is a general-purpose programming language initially developed by Dennis Ritchie. C is an imperative (procedural) language. It was designed to be compiled using a relatively straightforward compiler, to provide low-level access to memory, to provide language constructs that map efficiently to machine instructions, and to require minimal run-time support.
  • 44. C was therefore useful for many applications that had formerly been coded in assembly language, such as system programming.
Despite its low-level capabilities, the language was designed to encourage cross-platform programming. A standards-compliant and portably written C program can be compiled for a very wide variety of computer platforms and operating systems with few changes to its source code. The language has become available on a very wide range of platforms, from embedded microcontrollers to supercomputers.
Table 5.1 Methods used in code
Method | Description
void meminfo(void) | Checks the system free memory status.
FILE_TO_BUF(meminfo.file, memif_id) | Stores intermediate results of the memory status.
strtoul() | Converts a string to an unsigned long integer.
time() | Gets the current system time; returns a value of type time_t.
memcpy() | Copies the time_t variable into the tm_d struct variable.
localtime() | Fetches the system local time.
sizeof | Gets the size of an object.
system() | Invokes a system command.
fprintf() | Writes to a file.
fopen() | Creates (opens) a file.
5.3 Installing and configuring KVM on CentOS
  • 45. 5.3.1 Check Hardware Virtualization support
KVM requires hardware virtualization support such as Intel VT or AMD-V, which are instruction set extensions for hardware-assisted virtualization. Check whether hardware virtualization support is available on the CentOS host machine:
$ egrep -i 'vmx|svm' --color=always /proc/cpuinfo
If the CPU flags contain "vmx" or "svm", hardware virtualization support is available.
5.3.2 Configure FQDN for local host
Configure an FQDN (Fully Qualified Domain Name) for the local host. Otherwise, you may get warnings while launching the libvirtd daemon, such as "getaddrinfo failed for 'myhost': Name or service not known". To configure the FQDN, edit the following configuration file:
$ sudo -e /etc/sysconfig/network
HOSTNAME=xxx.yyy
5.3.2.1 Disable SELinux
Before installing KVM, be aware that several SELinux Booleans can affect the behavior of KVM and libvirt. Here we set SELinux to permissive mode (0) for demonstration purposes. If you do not wish to change the SELinux mode, skip this step. To set SELinux to permissive mode on CentOS:
$ sudo -e /etc/selinux/config
SELINUX=permissive
5.3.2.2 Reboot the machine for the change to take effect.
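The CPU flag check from Section 5.3.1 could also be done from C, for example as a start-up check in the rejuvenation tool. The sketch below is illustrative only and its function names are hypothetical. It takes the contents of /proc/cpuinfo as a string and, unlike the egrep command (which matches substrings case-insensitively), it accepts "vmx" or "svm" only as whole, lower-case, whitespace-delimited tokens.

```c
#include <string.h>

/* Returns 1 if word appears in text as a whitespace-delimited token. */
static int has_token(const char *text, const char *word)
{
    size_t wlen = strlen(word);
    const char *p = text;

    while ((p = strstr(p, word)) != NULL) {
        int starts = (p == text) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == '\n';
        int ends = p[wlen] == '\0' || p[wlen] == ' ' || p[wlen] == '\t' || p[wlen] == '\n';

        if (starts && ends)
            return 1;
        p += wlen;   /* skip this partial match and keep searching */
    }
    return 0;
}

/* Returns 1 if the cpuinfo text lists the Intel VT ("vmx") or
 * AMD-V ("svm") flag, 0 otherwise. */
int has_hw_virtualization(const char *cpuinfo_text)
{
    return has_token(cpuinfo_text, "vmx") || has_token(cpuinfo_text, "svm");
}
```

A caller would read /proc/cpuinfo into a buffer with fopen() and fread() and pass it to has_hw_virtualization() before attempting to start any VM.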
  • 46. 5.4 Install KVM, QEMU and user-space tools
To install KVM, QEMU and the user-space tools, use the following steps:
Step1: Install KVM and virtinst (a tool to create VMs) as follows:
$ sudo yum install kvm libvirt python-virtinst qemu-kvm
Step2: Start the libvirtd daemon and set it to auto-start:
$ sudo service libvirtd start
$ sudo chkconfig libvirtd on
Step3: Check that KVM has been installed successfully. You should see no errors, as follows:
$ sudo virsh -c qemu:///system list
Id Name State
----------------------------------------------------
Step4: Configure Linux Bridge for VM networking
Installing KVM alone does not allow VMs to communicate with each other or to access external networks. You need to configure VM networking separately. Here, we set up "bridged networking" via Linux Bridge.
• Install the package needed to create and manage bridge devices:
$ sudo yum install bridge-utils
• Disable the NetworkManager service if it is enabled, and switch to the default network service as follows:
$ sudo service NetworkManager stop
$ sudo chkconfig NetworkManager off
$ sudo chkconfig network on
$ sudo service network start
  • 47. To configure a new bridge, pick an active network interface (e.g., eth0) and enslave it to the bridge. Depending on whether the network interface is assigned an IP address via DHCP or statically, there are two different ways to configure a new bridge.
• To configure bridge br0 via DHCP:
$ sudo -e /etc/sysconfig/network-scripts/ifcfg-eth0
• Modify the file ifcfg-eth0 as shown below:
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=yes
BRIDGE=br0
$ sudo -e /etc/sysconfig/network-scripts/ifcfg-br0
• Modify the file ifcfg-br0 as shown below:
DEVICE=br0
NM_CONTROLLED=yes
ONBOOT=yes
TYPE=Bridge
BOOTPROTO=dhcp
• You should now see the br0 bridge interface with a proper IP address:
$ ifconfig
Step5: Install VirtManager
The final step is to install a desktop UI called VirtManager for managing virtual machines (VMs) through libvirt. To install VirtManager:
  • 48. $ sudo yum install virt-manager libvirt qemu-system-x86 openssh-askpass libcanberra-devel
5.5 Setup a minimal CentOS 6 NFS configuration
To set up an NFS (Network File System) configuration for two systems, one system acts as the server and the other as the client. The following steps show the NFS server configuration:
5.5.1 SERVER CONFIGURATION:
Step1: Check for yum updates and install the NFS utilities
• The server to set up: 172.16.30.48/255.255.255.254
• Before setting up the server, update the packages:
$ yum update
• Once the update is completed, reboot the system:
$ shutdown -r now
• Install the nfs-utils, rpcbind and system-config-firewall-tui packages:
$ yum install nfs-utils rpcbind system-config-firewall-tui
• Modify the selinux file to disable SELinux: set SELINUX=disabled in the file, then run setenforce 0:
$ vi /etc/sysconfig/selinux
$ setenforce 0
Step2: Make a folder to be shared
In NFS sharing we have to create a folder that is shared between the server and the client. This folder holds all the data that is transferred between the server and the client.
  • 49. Here we create and share a folder called images; the command below gives the path at which the folder is created.
$ mkdir /var/lib/libvirt/images
Step3: Enable the nfs, nfslock and rpcbind services at boot:
$ chkconfig nfs on
$ chkconfig nfslock on
$ chkconfig rpcbind on
Step4: Configure the firewall settings:
$ system-config-firewall-tui
Step5: Modify the /etc/exports file to add the shared storage, which makes live migration from the source to the destination system possible:
/var/lib/libvirt/images 172.16.30.48/255.255.255.254(rw,sync,no_root_squash)
Step6: Modify the hosts.allow file by adding the following line:
$ sudo -e /etc/hosts.allow
mountd: 172.16.30.46/255.255.255.254
Step7: Modify the hosts.deny file by adding the following lines:
portmap:ALL
lockd:ALL
mountd:ALL
rquotad:ALL
statd:ALL
Step8: Restart the following services on the server machine once all the above steps are completed:
  • 50. $ sudo service rpcbind restart
$ sudo service nfs restart
$ sudo service nfslock restart
Once the server configuration is finished, continue immediately with the client configuration. To configure the NFS client, follow the steps below:
5.5.2 CLIENT CONFIGURATION:
Step1: The client to set up: 172.16.30.46/255.255.255.254
• Before setting up the client, update the packages:
$ sudo yum update
• Once the update is completed, reboot the system:
$ sudo shutdown -r now
• Install the nfs-utils, rpcbind and system-config-firewall-tui packages:
$ sudo yum install nfs-utils rpcbind system-config-firewall-tui
• Modify the selinux file to disable SELinux: set SELINUX=disabled in the file, then run setenforce 0:
$ sudo gedit /etc/sysconfig/selinux
$ setenforce 0
Step2: Make a folder to be the mount point
In NFS sharing we have to create a sharable folder that is shared between the server and the client. This folder holds all the data that is transferred between the server and the client. Here we create a folder called images; the command below gives the path at which the folder is created.
$ sudo mkdir /var/lib/libvirt/images
Step3: Enable the following services at boot:
  • 51. $ sudo chkconfig nfs on
$ sudo chkconfig nfslock on
$ sudo chkconfig rpcbind on
Step4: Restart the following services on the client machine once all the above steps are completed:
$ sudo service rpcbind restart
$ sudo service nfs restart
$ sudo service nfslock restart
Once both the server and the client NFS configuration are finished, the folder created during the configuration has to be mounted. To mount the folder, use the following steps:
Step1: Append the following line to the fstab file:
$ sudo gedit /etc/fstab
<shared directory> <mount point> <type> <auto> 0 0
172.16.30.48:/var/lib/libvirt/images /var/lib/libvirt/images nfs auto 0 0
where
172.16.30.48: server name
172.16.30.48:/var/lib/libvirt/images: shared directory on the server
/var/lib/libvirt/images: mount point on the client machine (172.16.30.46)
nfs: type
Step2: Mount the shared NFS directory on the client machine:
$ sudo mount -t nfs 172.16.30.48:/var/lib/libvirt/images /var/lib/libvirt/images
5.6 ITL Algorithm.
  • 52. The ITL algorithm is designed to optimize the rejuvenation time predefined by the user when the workload is variable. The rejuvenation time is decreased when the workload increases and increased when the workload decreases. The working of the algorithm is described in the steps below.
Step 1: Begin
Step 2: Set variable FTR (Fixed Time Rejuvenation)
Step 3: Fetch the system free memory and assign it to variable FM (Free Memory)
Step 4: Set the threshold free memory value in variable FFM (Fixed Free Memory)
Step 5: If (FTR == SystemCurrentTime) then Reboot
        Else if (FM < FFM) then reset FTR = FTR - (1 * (FFM - FM))
Step 6: Go to Step 5
Step 7: End
CHAPTER 6
TESTING OF SOFTWARE REJUVENATION IN COMPLEX SYSTEM
6.1 Testing
There are essentially three main domains and six modules in our project. In this section all six modules are tested with different OSs, VMs or VMMs.
  • 53. The purpose of this section is to ensure that the resulting system meets the system requirements and that data flows seamlessly through each of the systems as well as between them. These tests also provide a sort of "living document": clients and other developers who want to learn how to use a module can look at the tests to see how to fit it to their needs and to gain a basic understanding of the modules.
6.1.1 Testing Strategy
The following points indicate the unit testing strategy followed in the project.
• Review the design specifications and source code for the modules to be tested.
• Perform a peer review on the module Test Plan.
• Create any test "stubs" required to provide input to or receive output from the code module.
• When it is time to test particular modules, compile the code in the test environment to check for any missing files required for test plan execution.
• Execute the tests. Compare the information/values received from the tested software with those expected, as documented in the Test Plan.
• Retest the code when an updated version is available. Record the results on the module Test Report Form.
• When the module is considered to have passed all tests, archive the final Report Form(s).
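The stub-based strategy above can be sketched for the time optimizer of Section 5.6. The code below is an illustration under stated assumptions, not the project's test code: itl_step and run_itl_cases are hypothetical names, times are in seconds and memory in kB, and one second is removed per kB of deficit (the factor 1 in Step 5). The step uses the deficit FFM - FM so that a memory shortfall moves the rejuvenation time earlier, as the algorithm description states, fires on ">=" rather than "==" so that a missed tick still triggers the reboot, and never schedules a time in the past. The clock and memory readings are passed in as parameters, which is what lets a test stub them.

```c
#include <stdio.h>
#include <stddef.h>
#include <time.h>

typedef enum { ITL_WAIT, ITL_REBOOT } itl_action;

/* One pass of Step 5. now and *ftr are times in seconds;
 * fm_kb is the free memory, ffm_kb the threshold, both in kB. */
itl_action itl_step(time_t now, time_t *ftr, long fm_kb, long ffm_kb)
{
    if (now >= *ftr)
        return ITL_REBOOT;
    if (fm_kb < ffm_kb) {
        /* memory shortfall: pull the rejuvenation time earlier */
        time_t shortened = *ftr - (time_t)(ffm_kb - fm_kb);
        *ftr = (shortened > now) ? shortened : now;
    }
    return ITL_WAIT;
}

/* A test case supplies stubbed clock and memory readings. */
struct itl_case {
    const char *id;
    time_t now, ftr;
    long fm_kb, ffm_kb;
    itl_action want_action;
    time_t want_ftr;
};

/* Runs every case, prints the failing ones, returns the failure count. */
int run_itl_cases(const struct itl_case *cases, size_t n)
{
    size_t i;
    int failed = 0;

    for (i = 0; i < n; i++) {
        time_t ftr = cases[i].ftr;
        itl_action got = itl_step(cases[i].now, &ftr,
                                  cases[i].fm_kb, cases[i].ffm_kb);

        if (got != cases[i].want_action || ftr != cases[i].want_ftr) {
            printf("FAIL %s\n", cases[i].id);
            failed++;
        }
    }
    return failed;
}
```

A table of cases can then mirror the test plan, for example one case where only the time condition fires and one where a memory shortfall shortens the rejuvenation time, with the expected values taken from the Test Plan rather than from the code under test.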
  • 54. Table 6.1: Cold reboot based on Time
Test Case ID: T-1
Purpose: The system should rejuvenate at the given rejuvenation time (TTR)
Pre-Conditions: System time
Inputs: Time to Rejuvenate (TTR)
Expected Output: Reboot
Post-Conditions: After rebooting the system, the current state should not be saved
Execution History:
Date | Result | Version | Remark
17-02-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
15-03-2014 | Pass | 1.0 | Testing passed in CentOS operating system
27-03-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
04-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
09-04-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
09-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
  • 55. Table 6.2: Cold reboot based on Workload
Test Case ID: T-2
Purpose: The system should rejuvenate at the given memory threshold value
Pre-Conditions: System free memory
Inputs: Memory threshold value
Expected Output: Reboot
Post-Conditions: After rebooting the system, the current state should not be saved
Execution History:
Date | Result | Version | Remark
17-02-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
15-03-2014 | Pass | 1.0 | Testing passed in CentOS operating system
27-03-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
04-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
09-04-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
09-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
  • 56. Table 6.3: Cold reboot based on both Time and Workload
Test Case ID: T-3
Purpose: The system optimizes the rejuvenation time based on the workload and then rejuvenates at the optimized time
Pre-Conditions: System time and free memory
Inputs: Time to Rejuvenate (TTR) and memory threshold value
Expected Output: Reboot
Post-Conditions: After rebooting the system, the current state should not be saved
Execution History:
Date | Result | Version | Remark
17-02-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
15-03-2014 | Pass | 1.0 | Testing passed in CentOS operating system
27-03-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
04-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
09-04-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
09-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
  • 57. Table 6.4: Warm reboot based on Time
Test Case ID: T-4
Purpose: The system should rejuvenate at the given rejuvenation time (TTR)
Pre-Conditions: System time
Inputs: Time to Rejuvenate (TTR)
Expected Output: Reboot
Post-Conditions: After rebooting the system, the current state should be saved
Execution History:
Date | Result | Version | Remark
27-02-2014 | Failed | 1.0 | Testing failed in CentOS operating system because the OS is not compatible
28-03-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
27-03-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
04-04-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
09-04-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
09-04-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
  • 58. Table 6.5: Warm reboot based on Workload
Test Case ID: T-5
Purpose: The system should rejuvenate at the given memory threshold value
Pre-Conditions: System free memory
Inputs: Memory threshold value
Expected Output: Reboot
Post-Conditions: After rebooting the system, the current state should be saved
Execution History:
Date | Result | Version | Remark
27-02-2014 | Failed | 1.0 | Testing failed in CentOS operating system because the OS is not compatible
28-03-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
27-03-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
04-04-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
09-04-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
09-04-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
  • 59. Table 6.6: Warm reboot based on both Time and Workload
Test Case ID: T-6
Purpose: The system optimizes the rejuvenation time based on the workload and then rejuvenates at the optimized time
Pre-Conditions: System time and free memory
Inputs: Time to Rejuvenate (TTR) and memory threshold value
Expected Output: Reboot
Post-Conditions: After rebooting the system, the current state should be saved
Execution History:
Date | Result | Version | Remark
27-02-2014 | Failed | 1.0 | Testing failed in CentOS operating system because the OS is not compatible
28-03-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
27-03-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
04-04-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
09-04-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
09-04-2014 | Pass | 1.0.1 | Testing passed in Ubuntu 12.04
  • 60. Table 6.7: VM cold reboot based on Time
Test Case ID: T-7
Purpose: The Virtual Machine (VM) should rejuvenate at the given rejuvenation time (TTR)
Pre-Conditions: System time
Inputs: Time to Rejuvenate (TTR)
Expected Output: Virtual Machine (VM) reboot
Post-Conditions: After rebooting the Virtual Machine (VM), the current state should not be saved
Execution History:
Date | Result | Version | Remark
27-03-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
04-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
09-04-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
09-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
  • 61. Table 6.8: VM cold reboot based on Workload
Test Case ID: T-8
Purpose: The Virtual Machine (VM) should rejuvenate at the given memory threshold value
Pre-Conditions: System free memory
Inputs: Memory threshold value
Expected Output: Virtual Machine (VM) reboot
Post-Conditions: After rebooting the Virtual Machine (VM), the current state should not be saved
Execution History:
Date | Result | Version | Remark
27-03-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
04-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
09-04-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
09-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
Table 6.9: VM cold reboot based on both Time and Workload
Test Case ID: T-9
Purpose: The system optimizes the rejuvenation time based on the workload and then the Virtual Machine (VM) rejuvenates at the optimized time
Pre-Conditions: System time and free memory
Inputs: Time to Rejuvenate (TTR) and memory threshold value
Expected Output: Virtual Machine (VM) reboot
Post-Conditions: After rebooting the Virtual Machine (VM), the current state should not be saved
Execution History:
Date | Result | Version | Remark
27-03-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
04-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
09-04-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
09-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
  • 62. Table 6.10: VM warm reboot based on Time
Test Case ID: T-10
Purpose: The Virtual Machine (VM) should rejuvenate at the given rejuvenation time (TTR)
Pre-Conditions: System time
Inputs: Time to Rejuvenate (TTR)
Expected Output: Virtual Machine (VM) reboot
Post-Conditions: After rebooting the Virtual Machine (VM), the current state should be saved
Execution History:
Date | Result | Version | Remark
27-03-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
04-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
09-04-2014 | Pass | 1.0 | Testing passed in Ubuntu 12.04 operating system
09-04-2014 | Pass | 1.0 | Testing passed in CentOS operating system
  • 63. Table 6.11: VM warm reboot based on Workload Test Case ID T-11 Purpose The Virtual Machine(VM) should rejuvenate at given Memory threshold value Pre- Conditions System Free memory Inputs Memory threshold value Expected Output Virtual Machine(VM) Reboot Post- Conditions After rebooting the Virtual Machine(VM) the current state should be saved Execution History Date Result Version Remark 27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system 04-04-2014 Pass 1.0 Testing passed in CentOS operating system 09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system 09-04-2014 Pass 1.0 Testing passed in CentOS operating system Table 6.12: VM warm reboot based on both Time and workload
  • 64. Test Case ID T-12 Purpose The system optimize the rejuvenation time based on the workload and then Virtual Machine(VM) rejuvenates at an optimized time Pre- Conditions System time and Free memory Inputs Time to Rejuvenate(TTR) and Memory threshold value Expected Output Virtual Machine(VM) Reboot Post- Conditions After rebooting the Virtual Machine(VM) the current state should be saved Execution History Date Result Version Remark 27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system 04-04-2014 Pass 1.0 Testing passed in CentOS operating system 09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system 09-04-2014 Pass 1.0 Testing passed in CentOS operating system Table 6.13: VMM reboot based on Time Test Case ID T-13
  • 65. Purpose The Virtual Machine Monitor(VMM) should rejuvenate at given rejuvenation time(TTR) Pre- Conditions System time Inputs Time to Rejuvenate(TTR) Expected Output Virtual Machine Monitor(VMM) Reboot Post- Conditions After rebooting the Virtual Machine Monitor(VMM) connection between VMM and VM’s should loss Execution History Date Result Version Remark 27-03-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system 04-04-2014 Pass 1.0 Testing passed in CentOS operating system 09-04-2014 Pass 1.0 Testing passed in Ubuntu 12.04 operating system 09-04-2014 Pass 1.0 Testing passed in CentOS operating system Table 6.14: VMM reboot based on Workload Test Case ID T-14 Purpose The Virtual Machine Monitor(VMM) should rejuvenate at given Memory threshold value Pre- Conditions System Free memory
  • 66. Inputs: Memory threshold value
    Expected Output: Virtual Machine Monitor (VMM) reboot
    Post-Conditions: After the Virtual Machine Monitor (VMM) reboots, the connection between the VMM and its VMs should be lost.
    Execution History (Date | Result | Version | Remark):
    27-03-2014 | Pass | 1.0 | Testing passed on the Ubuntu 12.04 operating system
    04-04-2014 | Pass | 1.0 | Testing passed on the CentOS operating system
    09-04-2014 | Pass | 1.0 | Testing passed on the Ubuntu 12.04 operating system
    09-04-2014 | Pass | 1.0 | Testing passed on the CentOS operating system
    Table 6.15: VMM reboot based on both Time and Workload
    Test Case ID: T-15
    Purpose: The system optimizes the rejuvenation time based on the workload, and the Virtual Machine Monitor (VMM) then rejuvenates at the optimized time.
    Pre-Conditions: System time and free memory
    Inputs: Time to Rejuvenate (TTR) and memory threshold value
    Expected Output: Virtual Machine Monitor (VMM) reboot
  • 67. Post-Conditions: After the Virtual Machine Monitor (VMM) reboots, the connection between the VMM and its VMs should be lost.
    Execution History (Date | Result | Version | Remark):
    27-03-2014 | Pass | 1.0 | Testing passed on the Ubuntu 12.04 operating system
    04-04-2014 | Pass | 1.0 | Testing passed on the CentOS operating system
    09-04-2014 | Pass | 1.0 | Testing passed on the Ubuntu 12.04 operating system
    09-04-2014 | Pass | 1.0 | Testing passed on the CentOS operating system
    Table 6.16: VM migration based on Time
    Test Case ID: T-16
    Purpose: The Virtual Machine (VM) should migrate from one Physical Machine (PM1) to another Physical Machine (PM2) at the given rejuvenation time (TTR).
    Pre-Conditions: System time
    Inputs: Time to Rejuvenate (TTR)
    Expected Output: The Virtual Machine (VM) should migrate from one Physical Machine (PM1) to another Physical Machine (PM2).
  • 68. Post-Conditions: After the migration, the Virtual Machine (VM) from Physical Machine (PM1) should reboot.
    Execution History (Date | Result | Version | Remark):
    27-02-2014 | Failed | 1.0 | Testing failed on the Ubuntu 12.04 operating system because the OS is not compatible
    05-03-2014 | Failed | 1.0 | Testing failed on the Ubuntu 12.04 operating system because the OS is not compatible
    10-03-2014 | Failed | 1.0 | Testing failed on the Ubuntu 12.04 operating system because the OS is not compatible
    30-03-2014 | Pass | 1.0.1 | Testing passed on the CentOS operating system
    04-04-2014 | Pass | 1.0.1 | Testing passed on the CentOS operating system
    09-04-2014 | Pass | 1.0.1 | Testing passed on the CentOS operating system
    Table 6.17: VM migration based on Workload
    Test Case ID: T-17
    Purpose: The Virtual Machine (VM) should migrate from one Physical Machine (PM1) to another Physical Machine (PM2) at the given memory threshold value.
    Pre-Conditions: System free memory
    Inputs: Memory threshold value
    Expected Output: The Virtual Machine (VM) should migrate from one Physical Machine (PM1) to another Physical Machine (PM2).
    Post-Conditions: After the migration, the Virtual Machine (VM) from Physical Machine (PM1) should reboot.
  • 69. Execution History (Date | Result | Version | Remark):
    27-02-2014 | Failed | 1.0 | Testing failed on the Ubuntu 12.04 operating system because the OS is not compatible
    05-03-2014 | Failed | 1.0 | Testing failed on the Ubuntu 12.04 operating system because the OS is not compatible
    10-03-2014 | Failed | 1.0 | Testing failed on the Ubuntu 12.04 operating system because the OS is not compatible
    30-03-2014 | Pass | 1.0.1 | Testing passed on the CentOS operating system
    04-04-2014 | Pass | 1.0.1 | Testing passed on the CentOS operating system
    09-04-2014 | Pass | 1.0.1 | Testing passed on the CentOS operating system
    Table 6.18: VM migration based on both Time and Workload
    Test Case ID: T-18
    Purpose: The system optimizes the rejuvenation time based on the workload, and the Virtual Machine (VM) should then migrate from one Physical Machine (PM1) to another Physical Machine (PM2) at the optimized time.
    Pre-Conditions: System time and system free memory
    Inputs: System time and memory threshold value
    Expected Output: The Virtual Machine (VM) should migrate from one Physical Machine (PM1) to another Physical Machine (PM2).
    Post-Conditions: After the migration, the Virtual Machine (VM) from Physical Machine (PM1) should reboot.
    Execution History (Date | Result | Version | Remark):
    27-02-2014 | Failed | 1.0 | Testing failed on the Ubuntu 12.04 operating system because the OS is not compatible
  • 70. 05-03-2014 | Failed | 1.0 | Testing failed on the Ubuntu 12.04 operating system because the OS is not compatible
    10-03-2014 | Failed | 1.0 | Testing failed on the Ubuntu 12.04 operating system because the OS is not compatible
    30-03-2014 | Pass | 1.0.1 | Testing passed on the CentOS operating system
    04-04-2014 | Pass | 1.0.1 | Testing passed on the CentOS operating system
    09-04-2014 | Pass | 1.0.1 | Testing passed on the CentOS operating system
    CHAPTER 7
    RESULTS OF SOFTWARE REJUVENATION IN COMPLEX SYSTEM
    7.1 Results
    The ITL algorithm implemented in all modules is analyzed using SPNP, which yields the values of MTTR and MTTF. From these values we calculate the availability and downtime of the system for each algorithm. The availability value is computed for every module, and from these values we can analyze how long the system will be available for use without failure. In SPNP we develop a Petri net diagram for a particular algorithm and then code it in CSPL (C-based Stochastic Petri net Language) to define how tokens move from one place to another through timed or immediate transitions. Tokens are deposited in places and are moved from one place to another by timed or immediate transitions. To check that the Petri net diagram has a proper flow, SPNP provides an animation option, for which we write code that guides the token movements, that is, when and where a token moves from one place to another. When the code is executed, the animated Petri net diagram shows how the transitions take place; if an error occurs during the animation, then the algorithm, or the Petri net diagram for that algorithm, is faulty.
    Table 7.1: Symbol conventions.
  • 71. Symbol Conventions: Place, Timed transition, Immediate transition, Arc
    Figure 7.1: Memory model
    Figure 7.2: Clock model
    Labels in Figures 7.1 and 7.2: Tpolicy, Ttrigger, Pclock, Ptrigger, Ptpolicy, Tclock, Ptpolicy, Tmem, Tmemv, Pmem
    Table 7.2: Clock and Memory SRN model description
    Places & Transitions | Description
    Pclock | Place where the clock is initialized or reset.
    Ptpolicy | A token in this place indicates that the rejuvenation time has been reached.
  • 72. Ptrigger | This place is the point of rejuvenation.
    Pmem | Place where RAM utilization is compared with the predefined threshold value.
    Pmemv | Place which indicates that RAM utilization has reached its threshold.
    Tclock | Timed transition; enabled when the given time is reached.
    Tpolicy | Timed transition; enabled when a token is present in Ptpolicy.
    Ttrigger | Immediate transition; enabled when a token is present in Ptrigger and the given time is reached.
    Tmem | Timed transition; enabled when the given time is reached.
    Tmemv | Immediate transition; enabled when a token is present in Ptpolicy and the given time is reached.
    Figure 7.3: OS Cold SRN model (diagram labels: Trej, tarej, Working, Twork, Rej, Taging, aging)
    Table 7.3: OS Cold SRN model description
  • 73. Places & Transitions | Description
    Working | Place which indicates that the system is in its normal working state.
    Aging | Place which indicates that the system is suffering from the aging problem.
    Rej | Place which indicates that the system is under the rejuvenation process.
    Trej | Immediate transition; enabled when the given time is reached.
    Twork | Timed transition; enabled when a token is present in Rej.
    Taging | Timed transition; enabled when a token is present in Working and Pmemv.
    tarej | Immediate transition; enabled when a token is present in Aging.
    Figure 7.4: OS Warm SRN model (diagram labels: Tsave, Taging, Working, Toptiwork, optimize, Toptiaging, save, Trej, Rej, Tresume, Treworking, Resume)
    Table 7.4: OS Warm SRN model description
  • 74. Places & Transitions | Description
    Working | Place which indicates that the system is in its normal working state.
    Aging | Place which indicates that the system is suffering from the aging problem.
    Rej | Place which indicates that the system is under the rejuvenation process.
    Optimize | Place where the time is optimized based on the workload.
    Save | Place where an image of the kernel is created and saved.
    Resume | Place where the saved kernel image is retrieved.
    Taging | Timed transition; enabled when a token is present in Working and Pmemv.
    Topti | Immediate transition; enabled when a token is present in Aging.
    Toptiwork | Timed transition; enabled when a token is present in Optimize.
    Tsave | Timed transition; enabled when a token is present in Working and Ptpolicy.
    Trej | Immediate transition; enabled when a token is present in Save.
    Tresume | Timed transition; enabled when a token is present in Rej.
    Treworking | Immediate transition; enabled when a token is present in Resume.
    Figure 7.5: VM cold SRN model (diagram labels: T_opt, Optimize, T_aging, Aging, Rejuvenation, T_rej)
    Table 7.5: VM Cold SRN model description
  • 75. Places & Transitions | Description
    Working | Place which indicates that the system is in its normal working state.
    Aging | Place which indicates that the system is suffering from the aging problem.
    Rejuvenation | Place which indicates that the system is under the rejuvenation process.
    Optimize | Place where the time is optimized based on the workload.
    Taging | Timed transition; enabled when a token is present in Working and Pmemv.
    Trej | Timed transition; enabled when a token is present in Ptpolicy and the given time is reached.
    T_aging | Timed transition; enabled when a token is present in Aging.
    T_opt | Timed transition; enabled when a token is present in Optimize.
    T_rej | Timed transition; enabled when a token is present in Rejuvenation.
  • 76. Figure 7.6: VM warm SRN model (diagram labels: Tmem, Aging, Taging, Optimize, Topt, Working, Tsave, Tsave, Save, Trej, Rejuvenation, Resume, Tres)
    Table 7.6: VM Warm SRN model description
    Places & Transitions | Description
    Working | Place which indicates that the system is in its normal working state.
    Aging | Place which indicates that the system is suffering from the aging problem.
    Rejuvenation | Place which indicates that the system is under the rejuvenation process.
    Optimize | Place where the time is optimized based on the workload.
    Save | Place where an image of the VM is created and saved.
    Resume | Place where the saved VM image is retrieved.
    Memory | Place which indicates the variation in memory.
    Taging | Timed transition; enabled when a token is present in Aging.
    Tmem | Timed transition; enabled when a token is present in Memory.
    Tsave | Timed transition; enabled when a token is present in Ptpolicy and the given time is reached.
    T_save | Timed transition; enabled when a token is present in Memory and Pmemv == 2.
    Trej | Timed transition; enabled when a token is present in Save.
    Tres | Timed transition; enabled when a token is present in Rejuvenation.
    Twmem | Timed transition; enabled when a token is present in Pmemv and the given time is reached.
    Trevert | Timed transition; enabled when a token is present in Resume.
    Topt | Timed transition; enabled when a token is present in Optimize.
    Labels recovered from the Figure 7.7 diagram: Tafail, Tnormal, agingfailed, Tafail, aging, Taging, migrate, Tmigrate, Working1, Trej1, Trejre2, Rej, Trejre1, Trej2, Trevert, Working2, Thyper
  • 77. Figure 7.7: VM migration SRN model
    Table 7.7: VM Migration SRN model description
    Places & Transitions | Description
    Working1 | Place which indicates that the system is in its normal working state.
    Working2 | Place which indicates that the system is in its normal working state.
    Aging | Place which indicates that the system is suffering from the aging problem.
    Rej | Place which indicates that the system is under the rejuvenation process.
    Optimize | Place where the time is optimized based on the workload.
    migrate | Place which indicates that the VM is being migrated.
    Tmaging | Timed transition; enabled when a token is present in Aging.
    Tmigrate | Timed transition; enabled when a token is present in Working1.
    Trej1 | Timed transition; enabled when Working1 == 1 and the given time is reached.
    Trej2 | Timed transition; enabled when a token is present in Working1 and Pclock and the given time is reached.
    Trejre1 | Timed transition; enabled when a token is present in Working1 and Rej.
  • 78. Trejre2 | Timed transition; enabled when a token is present in Rej and Working2 == 0.
    Trevert | Timed transition; enabled when Working2 == 2.
    Tnormal | Timed transition; enabled when a token is present in Optimize.
    Taging | Timed transition; enabled when a token is present in Pmemv and the given time is reached.
    Tafail | Timed transition; enabled when a token is present in Aging.
    Thyper | Timed transition; enabled when a token is present in migrate.
    7.2 Discussion
    Developing the above Petri net diagrams in SPNP and coding the token transitions in CSPL lets us analyze the availability value of each module. Each transition is given a transition time, taken from the time the real implementation needs to move from one state to another; based on the values in Table 7.9, the MTTR and MTTF values can be calculated. From these values the availability of a module is calculated with the formula Availability = MTTF ÷ (MTTF + MTTR). The availability of every module is analyzed in this project, and the respective availability values are calculated as an average over 30 days (Table 7.8). From the 30-day availability value we can calculate the availability of the system for any number of years. For a token to move from one place to another, it must pass through a transition by satisfying all of its guard function conditions. Table 7.9 has three columns: Transition; Value, the time for a particular transition to take place; and Mean time, the same value expressed as a rate in 1/hour. The mean value is the standard format of time used in the SPNP analysis.
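    The availability formula can be illustrated with a minimal Python sketch. The function name and the MTTF and MTTR inputs below are assumptions chosen for illustration; they are not values measured in this project.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    if mttf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTTF and MTTR must be non-negative")
    total = mttf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTTF and MTTR cannot both be zero")
    return mttf_hours / total


# Hypothetical inputs, not results from this project:
# one failure per week on average, 12 minutes to recover.
mttf = 168.0  # mean time to failure, in hours
mttr = 0.2    # mean time to repair, in hours

a = availability(mttf, mttr)
print(f"availability   = {a:.6f}")
print(f"unavailability = {1.0 - a:.6f}")
```

    A long MTTF or a short MTTR both push the value toward 1, which is why the modules with shorter reboot times are expected to show higher availability.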
  • 79. Table 7.8: Availability values of rejuvenation methods
    Rejuvenation Method | Days | Steady-State Availability | Downtime
    Cold OS Rejuvenation | 30 | 0.998824 | 0.001176
    Warm OS Rejuvenation | 30 | 0.998983 | 0.001017
    Cold VM Rejuvenation | 30 | 0.998633 | 0.001367
    Cold VMM Rejuvenation | 30 | 0.998846 | 0.001154
    Warm VM Rejuvenation | 30 | 0.998799 | 0.001201
    VM Migration | 30 | 0.999219 | 0.000781
    We have considered many key parameters, such as aging rate, rejuvenation rate, failure rate, suspend rate, resume rate and restart rate, and assumed safety thresholds for each module as given in Table 7.9. Based on these values we determine the availability of the system under the time and variable workload policy. In all modules we set the rejuvenation time to a safe level at a certain interval and set the memory threshold value; if aging is detected, the rejuvenation time is optimized by the ITL algorithm and rejuvenation occurs at the optimized time.
    Table 7.9: Cold OS rejuvenation transition rates
    Transition | Value | Mean time (1/hour)
    OS aging rate | 1 week | 0.005952381
    OS rejuvenation rate | 1 month | 0.001388889
    OS failure rate | 1 week | 0.005952381
    OS suspend rate | 1 month | 0.001388889
    OS restart rate | 30 sec | 120
    VM resume rate | 15 sec | 240
    VM rejuvenation rate | 1 month | 0.001388889
    VM aging rate | 1 week | 0.005952381
    VM failure rate | 1 week | 0.005952381
    VM failure recovery rate | 1 min | 60
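    The rates in Table 7.9 are the reciprocals of the transition durations, expressed per hour. The following Python sketch reproduces that conversion; the helper name and the unit table are assumptions made for illustration, and a month is taken as 30 days, which is the length that matches the tabulated value 0.001388889.

```python
# Seconds in each unit; the 30-day month is an assumption that matches Table 7.9.
SECONDS_PER_UNIT = {
    "sec": 1,
    "min": 60,
    "hour": 3600,
    "week": 7 * 24 * 3600,
    "month": 30 * 24 * 3600,
}


def rate_per_hour(amount: float, unit: str) -> float:
    """Convert a mean transition duration into a rate in 1/hour."""
    seconds = amount * SECONDS_PER_UNIT[unit]
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return 3600.0 / seconds


# Durations copied from the Value column of Table 7.9.
TRANSITIONS = [
    ("OS aging rate", 1, "week"),
    ("OS rejuvenation rate", 1, "month"),
    ("OS restart rate", 30, "sec"),
    ("VM resume rate", 15, "sec"),
    ("VM failure recovery rate", 1, "min"),
]

for name, amount, unit in TRANSITIONS:
    print(f"{name:<26} {rate_per_hour(amount, unit):.9f}")
```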
  • 80. Graphs are plotted from the transition rates and availability values. Each graph depicts the availability value at a particular time, and all graphs are plotted over a thirty-day interval. All the graphs below have the availability value on the X-axis and time (1/hour) on the Y-axis. In general, in all modules, if the rejuvenation rate is high the system is rebooted repeatedly at short intervals, which leads to high downtime; hence the availability value is low initially in all the plotted graphs.
    Graph 1: OS Cold availability
    In the cold OS reboot process the availability is low because the system takes more time to reboot, so downtime is high. In this module the system is restarted normally at the rejuvenation time. The downtime of this process depends on the processor speed of the system; it normally takes one to three minutes on average to return to the normal working state. The availability value of this module is 0.998824 for thirty days.
  • 81. Graph 2: OS Warm reboot
    In the OS warm reboot module the availability value is higher than in the cold reboot module, because the complete kernel is saved as an image and stored on the hard disk; after the reboot, the GRUB loader extracts this image and the kernel image is loaded. Hence there is no loss of data and no interruption of running applications even after the reboot. The availability value of this module is 0.998983 for 30 days.
    Graph 3: OS Warm reboot and Cold reboot comparison
  • 82. The comparison graph shows how the availability value varies between cold and warm reboot of the OS. Initially both modules have the same availability value, which is low because of the high rejuvenation rate. The graph clearly shows that warm reboot of the OS has higher availability than cold reboot of the OS.
    Graph 4: VM Cold reboot
    In VM cold reboot the availability value decreases again, because rebooting the system takes a long time. It therefore provides low availability to the user of the system, and the chances of losing data and of request failures are high. This module has an availability value of 0.998633 for thirty days.
  • 83. Graph 5: VMM reboot
    This module has the same drawback as the other cold reboot processes, and hence it has an availability value of 0.998846 for thirty days.
    Graph 6: Comparison graph for VMM and VM reboot
    Comparing the cold reboot modules of the VM and the VMM, the VMM cold reboot module has the better availability value, although both modules give almost the same availability to the system.
    Graph 7: Comparison graph for OS and VM cold reboot
  • 84. Comparing the two cold reboot modules, the OS cold reboot module has the better availability value (0.998824, against 0.998633 for the VM cold reboot module in Table 7.8).
    Graph 8: Graph for VM warm reboot
    As with OS warm reboot, VM warm reboot has higher availability than the corresponding cold reboot module; the availability value of this module is 0.998799.
    Graph 9: Graph for VM migration
  • 85. The VM migration module has an availability value of 0.999219, which is the highest of all modules in this project. In this module there is no reboot and no image is saved; instead the complete virtual machine is migrated to another configured server. Hence it has no data loss, no request failures and no errors in running applications.
    Graph 10: Graph for VM comparison
    This graph gives the comparison result of all the VM modules. The VM migration module has the highest availability and the VM cold reboot module has the lowest.
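    The comparisons drawn from the graphs can be cross-checked against the steady-state values in Table 7.8. The Python sketch below ranks the modules and converts each unavailability into minutes over the 30-day window; treating the downtime column as the average fraction of time the system is down is an assumption made for this illustration, and the function name is hypothetical.

```python
# Steady-state availability values copied from Table 7.8.
TABLE_7_8 = {
    "Cold OS Rejuvenation": 0.998824,
    "Warm OS Rejuvenation": 0.998983,
    "Cold VM Rejuvenation": 0.998633,
    "Cold VMM Rejuvenation": 0.998846,
    "Warm VM Rejuvenation": 0.998799,
    "VM Migration": 0.999219,
}

PERIOD_MINUTES = 30 * 24 * 60  # the 30-day analysis window


def downtime_minutes(avail: float, period: float = PERIOD_MINUTES) -> float:
    """Expected downtime over the period, assuming downtime = (1 - availability) * period."""
    return (1.0 - avail) * period


# Highest availability first.
ranked = sorted(TABLE_7_8.items(), key=lambda item: item[1], reverse=True)

for name, avail in ranked:
    print(f"{name:<22} {avail:.6f} {downtime_minutes(avail):7.1f} min")
```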
  • 86. Graph 11: Comparison graph of all modules
    This is the main graph of our analysis. It compares the availability values of all modules with respect to the rejuvenation time. The comparison shows that the VM migration module has high availability and the OS cold reboot module has low availability.
    7.3 Snapshots
    7.3.1 Snapshots of OS Cold Reboot
  • 87. Snapshots 1 to 4
    7.3.2 Snapshots of OS Warm Reboot
  • 88. Snapshots 5 to 8
  • 89. 7.3.3 Snapshots of VM Cold Reboot
    Snapshots 9 to 12
  • 90. Snapshot 13
    7.3.4 Snapshots of VM Warm Reboot
    Snapshots 14 to 17
  • 91. Snapshot 18
    7.3.5 Snapshots of VM Migration
    Snapshots 19 to 22
  • 92. Snapshots 23 and 24
    7.3.6 Snapshots of VMM Reboot
    Snapshots 25 to 28
  • 93. Snapshot 29
    CHAPTER 8
    CONCLUSION
    8.1 Conclusion
    The Intelligent Time and Load (ITL) balancing policy accepts a time from the user and optimizes the rejuvenation time whenever the workload is variable; otherwise the system is rejuvenated at its rejuvenation point. The ITL policy avoids software failure and helps to achieve high availability of a complex system. The ITL policy is used in experiments on six modules, namely OS Cold Reboot, OS Warm Reboot, VM Cold Reboot, VM Warm Reboot, VM Migration and VMM Reboot. Over the course of the experiments, VM Migration achieves the best steady-state availability, as long as VM live migration is fast enough and the other server has the capacity to receive the migrated VM. In existing policies, rejuvenation is proposed based on various parameters such as hardware failures, memory leaks, CPU utilization, request failures and so on. The ITL policy considers physical memory as the primary factor for rejuvenation; hence it is a better way to avoid performance degradation and to increase the availability of the system.
    8.2 Limitations
    Some important limitations are as follows:
    • The project is restricted to the Linux platform.
    • OS warm reboot is compatible with Ubuntu but not with other operating systems.
    • VM migration is compatible with CentOS but not with other operating systems.
    • Complete (100%) availability is not provided in all modules.
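    The behaviour of the ITL policy summarized in Section 8.1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's ITL algorithm: the names `Sample`, `next_rejuvenation_time` and `should_rejuvenate` are hypothetical, and halving the remaining time when free memory falls below the threshold is a placeholder for the actual optimization rule, which is not given here.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    elapsed_hours: float   # time since the last rejuvenation
    free_memory_mb: float  # free physical memory reported by the system


def next_rejuvenation_time(ttr_hours: float, sample: Sample,
                           threshold_mb: float, shrink: float = 0.5) -> float:
    """Return the rejuvenation time, in hours since the last rejuvenation.

    If free memory is at or above the threshold, the user-supplied time to
    rejuvenate (TTR) is kept. Otherwise the remaining time is shortened by
    the factor `shrink` (hypothetical rule, assumed for this sketch).
    """
    if sample.free_memory_mb >= threshold_mb:
        return ttr_hours
    remaining = max(ttr_hours - sample.elapsed_hours, 0.0)
    return sample.elapsed_hours + remaining * shrink


def should_rejuvenate(ttr_hours: float, sample: Sample, threshold_mb: float) -> bool:
    """True once the elapsed time reaches the (possibly shortened) rejuvenation time."""
    return sample.elapsed_hours >= next_rejuvenation_time(ttr_hours, sample, threshold_mb)


# Normal workload: the user-supplied TTR of 720 hours is kept.
print(next_rejuvenation_time(720.0, Sample(100.0, 2048.0), threshold_mb=512.0))
# Memory below threshold: rejuvenation is pulled forward.
print(next_rejuvenation_time(720.0, Sample(100.0, 256.0), threshold_mb=512.0))
```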
  • 94. 8.3 Future Enhancement
    The basic idea is to continue request processing on the same node on which the rejuvenation is taking place. The combination of reboot and failover enables a system to continue processing requests during the reboot. Before an OS or an application running on one node of a clustered environment is rebooted, requests to that node are redirected to the other nodes of the system. This technique improves the availability of systems.
    Figure 8.1: Sharing of requests
    As Figure 8.1 describes, the simultaneous execution of request processing and rejuvenation on the same node requires an alternative request processing environment. The alternative environment takes over the processing of all requests from the original environment, and then the rejuvenation of the original environment is started. In this way 100% availability can be achieved.
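    The takeover by an alternative environment can be sketched in Python as below. The `Environment` and `Dispatcher` classes and their methods are hypothetical names introduced for illustration only; the sketch models request routing and does not perform a real reboot or state transfer.

```python
class Environment:
    """A request processing environment (hypothetical model for illustration)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.handled = []

    def handle(self, request: str) -> str:
        self.handled.append(request)
        return f"{self.name} handled {request}"


class Dispatcher:
    """Routes requests to the original environment, or to the alternative
    one while the original is being rejuvenated."""

    def __init__(self, original: Environment, alternative: Environment) -> None:
        self.original = original
        self.alternative = alternative
        self.rejuvenating = False

    def start_rejuvenation(self) -> None:
        # From here on, the alternative environment takes over all requests.
        self.rejuvenating = True

    def finish_rejuvenation(self) -> None:
        # The rejuvenated original environment resumes request processing.
        self.rejuvenating = False

    def dispatch(self, request: str) -> str:
        target = self.alternative if self.rejuvenating else self.original
        return target.handle(request)


dispatcher = Dispatcher(Environment("original"), Environment("alternative"))
print(dispatcher.dispatch("r1"))
dispatcher.start_rejuvenation()
print(dispatcher.dispatch("r2"))
dispatcher.finish_rejuvenation()
print(dispatcher.dispatch("r3"))
```

    No request is rejected in this model because one environment is always accepting work; a real system would also need to hand over in-flight requests and session state.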