2. GOOD ESTIMATES
• Process prediction
• It guides our decision making:
– before development begins,
– throughout the development process,
– during the transition of the product to the customer,
– and while the software is being maintained.
4. • A prediction is useful only if it is reasonably accurate.
• Predictions are not expected to be exact, but to be close enough to the eventual actual numbers that we can make sound judgments.
5. What is an estimate?
• A prediction is a range, or window, rather than a single number.
• An estimate is not a target; it is a probabilistic assessment, so the value produced as an estimate is really the center of a range.
7. • From the graph we can compute the probability that a project based on the given requirements will be completed within a time interval [t1, t2].
• The probability is simply the area under the curve between t1 and t2.
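A minimal sketch of this computation in Python, assuming a lognormal completion-time distribution (the deck does not fix a particular pdf; the distribution and parameters here are purely illustrative):

from scipy.stats import lognorm

# Assumed completion-time distribution, in months (illustrative only).
completion_time = lognorm(s=0.4, scale=12)

t1, t2 = 10, 15                                             # time window of interest
prob = completion_time.cdf(t2) - completion_time.cdf(t1)    # area under the pdf on [t1, t2]
print(f"P({t1} <= T <= {t2}) = {prob:.2f}")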
8. • Formally, the estimate is defined as the median of the (unknown) distribution.
• To indicate the window, an estimate should be presented as a triple:
– the most likely value (i.e. the median of the distribution),
– plus the lower and upper bounds of the value.
9. Evaluating estimate accuracy
• Suppose E is the estimated value and A is the actual value.
• The relative error in the estimate is
• RE = (A - E) / A
• If the estimate is greater than the actual value, the relative error is negative.
• If the estimate is less than the actual value, the relative error is positive.
11. The mean magnitude of relative error over n projects is
MMRE = (1/n) Σ |RE_i|
If this is small, our estimates are good.
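A minimal sketch of RE and MMRE in Python; the actual and estimated effort figures are invented purely for illustration:

# RE = (A - E) / A for each project, MMRE = mean of |RE|.
actuals = [120, 80, 200, 45]      # actual effort, person-months (assumed)
estimates = [100, 90, 150, 50]    # estimated effort, person-months (assumed)

relative_errors = [(a - e) / a for a, e in zip(actuals, estimates)]
mmre = sum(abs(re) for re in relative_errors) / len(relative_errors)

print([round(re, 2) for re in relative_errors])   # [0.17, -0.12, 0.25, -0.11]
print(round(mmre, 2))                             # 0.16 -- small MMRE suggests good estimates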
12. Cost Estimation problems and
approaches
• Novelty
• Politics
• Technology change
• Price to win.
13. Current approaches to cost estimation
• Expert Opinion.
• Analogy
• Decomposition
• Models
14. Bottom-up and top-down estimation
• Bottom-up estimation begins with the lowest-level parts of the product or tasks and provides an estimate for each.
• Top-down estimation begins with the overall process or product: a full estimate is made, and then the estimates for the components are calculated.
15. Models for Effort and Cost
• By deploying models in the process, estimators can examine the relationship between a model and its accuracy, so that they can fine-tune it to improve accuracy for future projects.
16. Two types of models for estimating effort
• Cost models: provide direct estimates of effort or duration. Most cost models are based on empirical data reflecting factors that contribute to overall cost.
• These models have one primary input (usually a measure of product size) and a number of secondary adjustment factors (cost drivers).
• Cost drivers are characteristics of the project, process, products or resources that are expected to influence effort and duration in some way.
17. • Constraint models: demonstrate the relationship over time among two or more parameters of effort, duration, or staffing level.
• The Rayleigh curve is used as a constraint model in several commercial products.
19. Regression-based Models
• By collecting data from previous projects and examining relationships among the attribute measures captured, software engineers hypothesized that some factors could be related by an equation.
21. • The regression is performed using the logarithm of project effort (measured in person-months) on the Y axis and the logarithm of project size (measured in thousands of lines of code) on the X axis.
• Transforming the linear equation
– log E = log a + b log S
22. From the log-log domain to the real domain, this yields an exponential relation: E = a S^b.
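A minimal sketch of such a regression in Python; the project data below is invented for illustration only:

import numpy as np

size_kloc = np.array([10, 25, 50, 100, 200])     # S: size in thousands of LOC (assumed)
effort_pm = np.array([30, 90, 210, 500, 1200])   # E: effort in person-months (assumed)

# Fit the linear relation log E = log a + b log S by least squares.
b, log_a = np.polyfit(np.log(size_kloc), np.log(effort_pm), 1)
a = np.exp(log_a)
print(f"E ~= {a:.2f} * S^{b:.2f}")               # back in the real domain: E = a * S^b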
23. • If size were a perfect predictor of effort, then every point of the graph would lie on the line of the equation, with a residual error of 0.
• In reality there is usually significant residual error.
• We must identify the factors that cause the variation between predicted and actual effort.
• A factor analysis can help to identify these additional parameters.
24. • The adjusted equation is E = a S^b F, where F is the effort adjustment factor, computed as the product of the cost driver values.
• This computation of F is valid only when the individual adjustment factors are independent.
25. COCOMO
• Constructive Cost Model
• The original COCOMO estimates effort.
• It is a collection of three models:
• 1. Basic model: can be applied when little about the project is known.
• 2. Intermediate model: used after requirements are specified.
• 3. Advanced model: used when design is complete.
26. • All three take the same form as above, E = a S^b F, where:
• E is effort in person-months,
• S is size measured in thousands of delivered source instructions (KDSI), and
• F is the adjustment factor, equal to 1 for the basic model.
27. • The values of a and b listed in the table depend on the development mode, which is determined by the type of software under construction.
28. • Organic systems: typically data-processing applications.
• Embedded systems: contain real-time software that is an integral part of a larger, hardware-based system.
• Semi-detached systems: somewhere in between organic and embedded.
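A minimal sketch of the basic COCOMO computation E = a * S^b (with F = 1). The (a, b) pairs below are the commonly cited values for the three development modes, not the deck's own table:

BASIC_COCOMO = {
    "organic":       (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (3.6, 1.20),
}

def basic_cocomo_effort(kdsi, mode):
    """Effort in person-months for `kdsi` thousand delivered source instructions."""
    a, b = BASIC_COCOMO[mode]
    return a * kdsi ** b

print(round(basic_cocomo_effort(32, "organic"), 1))    # ~91 person-months
print(round(basic_cocomo_effort(32, "embedded"), 1))   # ~230 person-months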
32. COCOMO 2.0
• In stage 1, when the project is exploring prototypes for high-risk issues (including user interfaces and software/system interaction), COCOMO 2.0 estimates size in object points.
• At stage 2, the designer considers alternative architectures and concepts of operation. There is not yet enough information to support fine-grained effort and duration estimates, so at this stage COCOMO 2.0 uses function points as the size measure.
33. • By stage 3, development has started and far more information is available. This stage matches the original COCOMO model, in that sizing can be done in terms of lines of code and many factors can be estimated with a certain degree of confidence. COCOMO 2.0 also uses models of reuse, and incorporates maintenance and breakage, among other refinements.
35. Putnam's SLIM model
• In 1978 the US Army required a method to estimate total effort and delivery time for massive projects.
• Putnam produced the SLIM model, intended for projects of more than 70,000 lines of code.
• The SLIM equation can be adapted for smaller projects.
36. • Putnam's model assumes that effort for software development is distributed similarly to a collection of Rayleigh curves, one for each major development activity.
• Putnam used some empirical observations about productivity levels to produce his software equation from the basic Rayleigh curve formula.
38. • The software equation, S = C * K^(1/3) * td^(4/3), relates size S (in lines of code) to:
– a technology factor, C,
– total project effort, K, measured in person-years,
– and elapsed time to delivery, td, measured in years.
39. • The relationship is expressed by the software equation above.
• In theory, td is the point where the Rayleigh curve reaches its maximum. In practice the technology factor C can take on up to 20 values, and the equation cannot be used unless size can be estimated, the technology factor is agreed on, and either K or td is held constant.
40. • Because the SLIM software equation includes a fourth power of delivery time, it has strong implications for resource allocation on large projects.
41. • D0 is a constant called the manpower acceleration, defined by D0 = K / td^3; it takes a value that depends on the type of project, e.g.:
– 12.3 for a new system that interacts with other systems,
– 15 for stand-alone systems,
– 27 for re-implementations of existing systems.
• Using these two equations we can find effort or duration, as sketched below.
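A minimal sketch of combining the two SLIM equations; the size and technology factor values are assumptions chosen only to show the algebra:

# Software equation:      S  = C * K**(1/3) * td**(4/3)
# Manpower acceleration:  D0 = K / td**3
# Eliminating K gives  td = (S / (C * D0**(1/3)))**(3/7)  and  K = D0 * td**3.

def slim_estimate(size_loc, tech_factor, d0):
    td = (size_loc / (tech_factor * d0 ** (1 / 3))) ** (3 / 7)   # duration, years
    k = d0 * td ** 3                                             # effort, person-years
    return k, td

effort, duration = slim_estimate(size_loc=100_000, tech_factor=10_000, d0=15)
print(f"effort ~= {effort:.0f} person-years, duration ~= {duration:.1f} years")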
42. • We can derive the equation above from these relationships.
• SLIM uses different Rayleigh curves for:
– Design and code
– Test and validation
– Maintenance
– Management
43. Problems with existing modeling
methods
• Model Structure
– Most researchers and practitioners agree that
product size is the key to establishing the effort
needed to create the product.
– But the exact association between size and effort
is not known.
– Most of the models suggest that effort is approximately proportional to size, but they include an adjustment for diseconomies of scale,
– so that larger projects are relatively less productive.
44. • Overly complex models
– An organization's specific features can affect its productivity.
– Many models incorporate adjustment factors, such as COCOMO's cost drivers and SLIM's technology factor, to provide the flexibility needed to capture these differences.
45. This generalized approach has fallen
short of its promises
• Using the COCOMO cost drivers does not
always improve the estimation accuracy.
• It is not simple to obtain an accurate
estimate of the technology factor, and
SLIM estimates are very sensitive to the
technology factor.
• Cost drivers are not independent but are
treated as such.
46. • Cost driver values are normally based on
subjective assessments of the influence of
some factor on overall project effort.
• Current models include various adjustment factors, enabling the estimator to cope with many kinds of projects. However, the projects undertaken by a single group are normally very similar, so only a few factors need to be considered.
47. Product size estimation
• Most models need an estimate of the size of the product.
• This variable is not measurable early in the life cycle.
• Models like COCOMO and SLIM need size in lines of code (LOC), but LOC cannot be derived from the requirements or from an invitation to bid for the project.
• Estimating LOC is often difficult.
48. Dealing with the Problems of Current Estimation Methods
• Local data definition:
– The first and most critical step in improving cost estimation in a particular environment is to use size and effort measures that are defined consistently across the environment.
– The measurements must be understood by all who must supply or use them, and two people measuring the same product should produce essentially the same number or rating.
49. Calibration
• Calibration significantly improves the accuracy of all models.
• The calibration process includes two activities:
– ensuring that the values supplied to the model are consistent with the model's needs and expectations;
– readjusting the model coefficients, using data from past projects, so that the model matches the basic productivity found in the new environment.
50. Independent estimation group
• DeMarco states that it is useful to assign estimating responsibility to a particular group of people.
• This group provides estimates for all projects, storing the outcomes of all data capture and analysis in a historical database.
51. Reduce input subjectivity
• The subjectivity of ratings and of early estimates of code size can add to the inaccuracy of estimates.
• There is interest in alternative size measurements that better match the likely size of the final product.
52. Preliminary estimates and re-estimates
• Early estimates involve using incomplete information.
• Estimation is likely to improve by performing two steps: basing preliminary estimates on measures of available products, and re-estimating as more information becomes available.
• The first estimates are likely to be founded on expert opinion and analogy. There are various approaches available to improve the outcomes of using expert opinion and analogy.
53. Group estimation
• One simple approach is to obtain the opinion of a group of experts instead of an individual.
• This allows the views of individuals with diverse project experience to be incorporated into the estimate.
56. • Putnam and Fitzsimmons suggest that group estimation should not be just a simple average of the individual estimates.
58. • Estimation by analogy
• Re-estimation
• Alternative size measures for cost measures
59. Locally developed cost measures
• Decompose cost elements
• Formulate a cost theory
• Collect data
• Analyze data and evaluate the model
• Check models
60. Basics of Reliability Theory
• The study of software reliability has its roots in the more general theory of systems and hardware reliability.
• The basic problem of reliability theory is to predict when a system will eventually fail.
• With hardware reliability, it is typically component failures due to physical wear that we are interested in.
• Such failures are probabilistic in nature, in that we do not know when a component will fail, but we know that it will fail.
61. • So we can attach a probability to the likelihood that the product will fail by a certain time.
• We can use the same idea with software: produce a basic model of component reliability and create a probability density function (pdf) f of t (written f(t)) that describes our uncertainty about when the component will fail.
62. • Suppose it is known that a component has a maximum life of 10 hours:
• it will fail within 10 hours of use.
• Suppose it is just as likely to fail in the first 2 minutes of use as in the last 2 minutes of the 10 hours.
63. The pdf f(t) for this behavior is shown
in the figure
64. • The function f(t) is defined to be 1/10 for any t between 0 and 10, and 0 for t > 10.
• In general, for any x we can define the uniform pdf over the interval [0, x] to be 1/x on [0, x] and 0 elsewhere.
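A minimal sketch of the same uniform failure-time model in Python, using scipy's uniform distribution over [0, 10] hours:

from scipy.stats import uniform

life = uniform(loc=0, scale=10)        # f(t) = 1/10 on [0, 10] hours, 0 elsewhere

# Probability the component fails somewhere in its first hour of use.
print(life.cdf(1) - life.cdf(0))       # 0.1, i.e. the area 1 * (1/10)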
65. Uniform pdf
• The uniform distribution has various limitations for reliability modeling.
• It only applies when the failure time is bounded.
• In many situations, no such bound exists, and we need a pdf that reflects the fact that there may be an arbitrarily long time to failure.
66. The figure below illustrates an unbounded pdf that reflects the notion that failure times occur purely at random. The function is the exponential pdf, f(t) = λe^(-λt).
67. • We want to know how long a component will behave correctly before it fails,
• that is, the probability of failure from time 0 to a given time t.
• The distribution function F(t) is the probability of failure between 0 and t, stated as
• F(t) = ∫[0,t] f(x) dx
68. • We say that a component survives until it fails for the first time, so we can think of survival as the opposite concept to failure.
• Thus, we define the reliability function R(t) as
• R(t) = 1 - F(t)
• This function gives the probability that the component will function properly up to time t.
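A minimal sketch of F(t) and R(t) for the exponential pdf of the earlier slide; the failure rate used here is an assumed value:

import math

lam = 0.01                         # assumed failure rate, failures per hour

def F(t):
    """Probability of failure by time t: F(t) = 1 - exp(-lam * t)."""
    return 1 - math.exp(-lam * t)

def R(t):
    """Reliability: probability of surviving beyond time t."""
    return 1 - F(t)

print(round(F(100), 3), round(R(100), 3))   # ~0.632 and ~0.368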
71. • The mean time to failure (MTTF) is the mean of the probability density function: MTTF = ∫ t f(t) dt.
• The median time to failure is the point in time t at which the probability of failure after t equals the probability of failure before t.
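For the exponential pdf these quantities have closed forms; a minimal sketch with the same assumed failure rate:

import math

lam = 0.01                        # assumed failure rate, failures per hour

mttf = 1 / lam                    # mean of the pdf: integral of t * f(t) dt = 1/lam
median = math.log(2) / lam        # F(median) = 0.5  =>  median = ln(2) / lam

print(mttf, round(median, 1))     # 100.0 hours and ~69.3 hours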
73. The reliability problem when we attempt to fix failures after each occurrence
• For each i, there is a new random variable, Ti, that represents the time of the ith failure.
• Each Ti has its own probability density function, fi.
• Here we would anticipate the probability density function fi+1 to be different from fi.
• We would expect Ti+1 to tend to be greater than Ti, as newer (just fixed) components are less likely to fail than older ones.
• In such a situation we have reliability growth: successive observed failure times tend to grow.
74. • A system runs successfully for a time and then it
fails.
• Once the failure occurs there is a need to repair
the fault.
• It is therefore useful to know the mean time to repair (MTTR) for the component that has failed.
• Combining this time with the mean time to failure (MTTF) tells us how long the system is unavailable to its users; this gives the mean time between failures (MTBF):
• MTBF = MTTF + MTTR
75. • Availability is the probability that a component is operating at a given point in time. Pressman defines availability as
• Availability = MTTF / (MTTF + MTTR) x 100%
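A minimal sketch tying MTBF and availability together; the MTTF and MTTR figures are assumed for illustration:

mttf = 480.0       # mean time to failure, hours (assumed)
mttr = 4.0         # mean time to repair, hours (assumed)

mtbf = mttf + mttr                            # MTBF = MTTF + MTTR
availability = mttf / (mttf + mttr) * 100     # percentage of time operational

print(f"MTBF = {mtbf} hours, availability = {availability:.1f}%")   # ~99.2%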
76. The software reliability problem
• There are many reasons for software to fail, but none involves wear and tear.
• Usually software fails because of design problems.
• Other failures arise as the code is written or as changes are introduced to a working system.
• These changes stem from new designs, changed requirements, revised designs, or corrections to existing problems.
• The resulting faults do not cause failures immediately.
• Failures are triggered only by certain states and inputs.
77. • When changes are implemented without introducing new faults, fixing the problem increases the overall reliability of the system.
• When hardware fails, the problem is fixed by replacing the failed component with a new or repaired one,
• and the system is restored to its previous reliability.
• Rather than growing, the reliability is simply maintained.
78. • The difference between hardware and software reliability is the difference between intellectual failure (due to design faults) and physical failure.
• Assumptions about software reliability:
– The software is operating in a real or simulated environment.
– When software failures occur, attempts are made to find and fix the faults that caused them.
79. • We are not computing an exact time for the next failure;
• we are using past history to help us make a prediction of the failure time.
• All attempts to measure reliability, however expressed, are examples of prediction.
80. Parametric Reliability Growth Models
• We are modeling the reliability of a program that operates in a real or simulated user environment, and whose faults are fixed after failures occur.
• We make two further assumptions about our program:
– Executing the program involves selecting inputs from some space I (the totality of all possible inputs).
– The program transforms the inputs into outputs (comprising a space O).
81. The Jelinski-Moranda model
The J-M model is the best known of the reliability growth models.
It assumes that, for each i, the time between the (i-1)th and ith failures is exponentially distributed, with a rate determined by the number of faults remaining.
83. • In the figure, N is the initial number of faults and Φ is the contribution of each fault to the overall failure rate.
• Thus the underlying model is the exponential model, so that the type 1 uncertainty is random and exponential.
• There is no type 2 uncertainty in this model: it assumes that fault detection and correction begin when a program contains N faults, and that fixes are perfect (in that they correct the fault causing the failure and they introduce no new faults).
• The model also assumes that all faults contribute equally to the failure rate.
84. • In the figure, the hazard rate for the exponential distribution is λ; it follows that the graph of the JM hazard rate looks like a step function.
• Between the (i-1)th and ith failures, the hazard rate is
• (N - i + 1) Φ
• The inference procedure for JM is maximum likelihood estimation.
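A minimal sketch of the JM step-function hazard rate; N and Φ (phi) are assumed values here, whereas in practice they would be fitted to the observed inter-failure times by maximum likelihood:

N = 30            # assumed initial number of faults
phi = 0.005       # assumed contribution of each fault to the failure rate

def jm_hazard(i):
    """Hazard rate between the (i-1)th and ith failures: (N - i + 1) * phi."""
    return (N - i + 1) * phi

for i in (1, 10, 30):
    print(i, round(jm_hazard(i), 4))   # the rate drops by phi after each perfect fix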
85. Criticisms of this model
• 1. The sequence of rates is considered by the model to be purely deterministic.
• This assumption is not realistic.
• 2. The model assumes all faults contribute equally to the hazard rate.
• Faults vary dramatically in their contribution to program unreliability.
• 3. We will show that the reliability predictions obtained from the model are poor;
• they are usually too optimistic.
86. Littlewood model
• The Littlewood model attempts to be a more realistic fault model than Jelinski-Moranda by treating the hazard rates as independent random variables.
• Whereas Jelinski-Moranda is represented with equal steps, Littlewood has steps of varying size.
• Both the Jelinski-Moranda and Littlewood models belong to a general class called exponential order statistic models.
87. • In this kind of model, the faults can be seen as competing risks: at any point in time, any of the remaining faults can cause a failure, but the chance that it will be a specific fault is determined by the hazard rate for that fault.
• It can be shown that the times at which the faults show themselves are independent, identically distributed random variables.
• For the J-M model, this distribution is exponential with parameter Φ.
88. For the Littlewood model, the corresponding distribution is a Pareto distribution: