Why bother with measurement and metrics? If you never use the data you collect, this is a valid question—and the answer is “Don’t bother, it’s a waste of time.” In that case, you’ll manage with opinions, personalities, and guesses—or even worse, misconceptions and misunderstandings. Based on his more than forty years of software and systems development experience, Ed Weller describes reasons for measurement, key measures in both traditional and agile environments, decisions enabled by measurement, and lessons learned from successful—and not so successful—measurement programs. Find out how to develop and maintain consistent data and valid measures so you can estimate reliably, deliver products with known quality, and have happy users and customers—the ultimate trailing indicator. Learn to manage projects dynamically with the support of current metrics and data from past projects to guide your management planning and control. Join Ed to explore how to invest in measurements that provide leading indicators to help you meet your company and customer goals.
Software Metrics: Taking the Guesswork Out of Software Projects
TJ
Half‐day Tutorial
6/4/2013 8:30 AM
"Software Metrics:
Taking the Guesswork Out of
Software Projects"
Presented by:
Ed Weller
Integrated Productivity Solutions, LLC
Brought to you by:
340 Corporate Way, Suite 300, Orange Park, FL 32073
888‐268‐8770 ∙ 904‐278‐0524 ∙ sqeinfo@sqe.com ∙ www.sqe.com
Ed Weller
Integrated Productivity Solutions, LLC
Ed Weller is the principal in Integrated Productivity Solutions, providing solutions to companies
seeking to improve productivity. Ed is internationally recognized as an expert in software
engineering and in particular software measurement. His focus on quality started with his work
on the Apollo program with General Electric; was reinforced during his work as a hardware, test,
software, and systems engineer and manager on mainframe systems at Honeywell and Groupe
Bull; and continued as the process group manager on the Motorola Iridium Project. For the past
fourteen years, Ed has worked to improve the quality and productivity in small to very large
companies that develop engineering and IT applications.
Demographics
How many of you:
Have a recognizable life cycle process?
• Agile/SCRUM?
• Waterfall or other life cycle?
• Don't know?
Are project managers/SCRUM Masters?
Are testers?
Are developers?
Are managers (one or more levels above project managers)?
Measurement analysts?
We All Know How to Count
We learned to count before starting school
We learned to multiply/divide in the 3rd or 4th grade
So arithmetic isn't the problem!
It is knowing why, what, and how to measure, and then knowing what to do with the results
WHY DO WE MEASURE?
What's Our Target?
All too often the end is measurement itself
"Measurement is good"
"We gotta measure something"
"Go forth and measure!"
Measurement Is an Input to Decision Making
Regardless of what we build, how we build it, or who builds it, someone somewhere is making decisions
Should we invest in product A or B?
Should we invest in company A or B?
Should we ship this product?
Should we cancel this project?
Do we have problems needing corrective action?
Will we have problems that need preventive action?
Today's measurement is used in tomorrow's estimates
An investment in future decision making
Information For Decision Making
Informed decision making requires
Understanding what's important for success
Relating what's important to indicators that
• Identify significant deviations
• Tell us we are on track
• Predict we will stay on track
Indicators are based on what we measure
Measurement needs to be
• Reasonably accurate
• Consistent
• Clearly defined
• Worth the cost
• Seen as valuable or useful by data providers
Measurement Allows Evaluation and Decisions
Subjective
Feels good
Looks right
Fun to use
Objective
Return on Investment
Fitness for use
• Performance
• Reliability
• Usability
Comparison, evaluation, tracking
We Measure to Enable Correct Decisions
Personal
Where to take vacations
What brand of ____ to buy
What airline to fly
Business
Which product to build
Staffing levels
Schedules
Project status
Product release
• Quality
• Time to Market
Subjective Measurement?
[Photo: Grand Canyon National Park]
Subjective Measurement
"Just a big hole in the ground" *
"I wanted my father to see this" (overseas tourist)
Opinions matter when value is subjective
* Context sensitive - could be objective if stated in cubic kilometers
Objective Measurement
Facts matter when value is objective
What product should we invest in?
How much will it cost?
When will it be ready?
Will it satisfy our customers (quality aspects)?
• Functionality
• Reliability
• Performance
• Security
• Cost
• Etc.
When Objective Measures Are Not Available
Opinions and loud voices are the basis for decisions
What's your opinion worth?
Compare that to opinions held by your boss, grandboss, or great-grandboss
Who wins?
In the absence of data, managers only have opinions, experience, intuition, and "gut feel" as a basis for decisions
Data is welcomed (most of the time)
Data will trump opinions (most of the time)
Business Imperatives
Businesses need to be profitable to survive in the long run
Cost to build the product includes
• Effort (developers, testers, managers, support, etc.)
• Development environment
• Test environment
• Others?
Must deliver to meet market demands
• How long will it take/When will we be finished?
• With sufficient functionality to create demand
• With sufficient quality to (at least) satisfy users
What will customers pay for the product?
Business Imperatives and Decisions
Decisions are made using a range of estimating inputs
Guesswork and intuition
Experience with similar products or services
Data from similar products or services
Have you ever faced this bargaining method:
"If we cannot deliver by xxx, we will go out of business"
"If you cannot do this project with this budget by this date, I will find someone who can"
When Opinion Trumps Data
A tale of two companies
Company 1 – Owned a market niche, but was facing new entrants
• Marketing demanded 6-month schedules in the face of one-year estimates from development
• 6 months in, faced with reality, the project was cancelled
• Repeat the above two steps for 18 months
• No new product delivery in 18 months, lost 50% of the market
Company 2 – Made customer commitments without regard to development estimates
• Similar cycle to above; the division was eventually closed down
How Can Measurement Help?
Historical data sets the bounds of reality
When reality and desires do not match, something has to give
Less functionality (prioritized functionality)
More time
Less waste
More effective and efficient development and test methods
ELEMENTS OF METRICS
Metrics
Base Measures = what we can count
Derived Measures = a relationship between two base measures
Indicators = base or derived measures that tell us something (useful)
How do we drill down from business objectives to indicators that identify the measures?
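To make the three levels concrete, here is a minimal Python sketch (all names, values, and the threshold are illustrative, not from the tutorial) of how a base measure, a derived measure, and an indicator relate:

```python
# Base measures: things we can count directly (illustrative values)
defects_found = 45      # base measure
size_kloc = 30.0        # base measure: size in thousands of lines of code

# Derived measure: a relationship between two base measures
defect_density = defects_found / size_kloc   # defects per KLOC

# Indicator: a measure compared against a threshold that tells us
# something useful (the threshold here is a hypothetical example)
DENSITY_THRESHOLD = 1.0
if defect_density > DENSITY_THRESHOLD:
    print(f"Indicator: {defect_density:.1f} defects/KLOC exceeds "
          f"{DENSITY_THRESHOLD}; investigate before release")
```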
Drilling Down From Business Imperatives/Objectives (1)
What are the elements of cost?
People cost = effort * rate (sometimes just person hours - rate is not used)
$$ cost for development and test environment
$$ cost for COTS or custom software
Overhead costs (vacation, sick leave, training)
What are the elements of value?
Volume and sale price (product)
Contribution to business (internal IT infrastructure) or cost of lost business
A small cost sketch follows.
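A minimal sketch of the people-cost element described above (all rates, hours, and dollar figures are hypothetical):

```python
# People cost = effort * rate; environment and COTS costs added on top.
# All figures are illustrative, not from the tutorial.
effort_person_hours = 2_500
hourly_rate = 85.0              # omit the rate to track person hours only

people_cost = effort_person_hours * hourly_rate
environment_cost = 30_000.0     # development and test environments
cots_cost = 12_000.0            # purchased COTS software

total_cost = people_cost + environment_cost + cots_cost
print(f"Total project cost: ${total_cost:,.0f}")
```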
Drilling Down From Business Imperatives/Objectives (2)
What are the elements of time/schedule?
Elapsed time
Schedule variation
What are the elements of quality?
Defects (pre-ship)
• Functional – easy to quantify
• Non-functional – Hard(er) to quantify as judgment is sometimes subjective
Failures (post-ship)
Customer surveys
• Level of subjective/objective evaluation varies
Providers and Users (1)
Base Measures are typically provided by the bottom of the pyramid
Users are distributed across the levels
Feedback is critical
[Pyramid diagram: Exec at the top, Management in the middle, Project at the bottom, with the data providers at the base]
Providers and Users (2)
What happens when the users forget to tell the providers how the data is used?
"Collecting all that data is a waste of time"
"You can't use that data for planning, we made it up"
Measurement becomes a standing joke at the provider level
Random number generator provides data
Data providers must see the value of time spent collecting and reporting the data
MEASURING COST
Why Track Cost?
To know what we have spent on a project
To know what is left of the budget
To know (estimate) whether or not we will finish within budget
Do we need to add resources?
Should we cancel the project?
To provide a basis for estimating future projects
The funding person or organization has the right to know if they are making a sound investment
If you cannot estimate, how can you make decisions?
Components of Cost
Effort in person hours/days/months
Usually the primary cost element
Functional organizations complicate logging
• Multitasking amongst multiple projects
• Inaccurate logging
Simplified in Agile/SCRUM (see the sketch below)
• Team size * Length of sprint
• Minus training, non-project activities, vacation, etc.
Most (?) companies track project cost – the minimum needed for financial accounting
But what is the effort spent on productive tasks?
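A minimal sketch of the Agile effort calculation above (team size and all deductions are hypothetical):

```python
# Sprint effort: team size * sprint length, minus non-project time.
# All figures are illustrative, not from the tutorial.
team_size = 7            # people on the team
sprint_days = 10         # working days in the sprint
hours_per_day = 8

gross_hours = team_size * sprint_days * hours_per_day    # 560 person-hours

# Deduct training, vacation, and other non-project activities
training_hours = 16
vacation_hours = 24
other_non_project_hours = 20

net_effort = gross_hours - (training_hours + vacation_hours + other_non_project_hours)
print(f"Net project effort this sprint: {net_effort} person-hours")   # 500
```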
Development vs Rework
Why do we need to track rework?
Cost of poor quality often/usually exceeds 50% of the total project or organization budget
• If you do not know what your ratio is, it is virtually certain rework is >50% of the total
• Cost of poor quality = effort spent on rework
Rework is waste
Where's the Money Going?
Rework is waste
[Diagram: the project budget split between development and defect rework]
Need to differentiate development costs and rework costs
Rework = SCRAP = LOSS
I started in hardware development
Defects resulted in scrap
Scrap was written off of inventory
Inventory was counted by finance
We paid attention to rework costs
Software Scrap = ?
How many of you measure your "software scrap"?
How do you define it?
How do you measure it?
What do you do with it?
Rework definition
Effort spent redoing something that should have worked
• Developer effort to fix defects found in reviews, test, or production
• Test effort to retest and regression test fixes
So what is a defect?
Identifying Defects and Rework Effort (1)
If there are formal test plans and activities, a defect is nonconformance to specification found in reviews or test
All effort spent on identifying, fixing, and retesting is rework
No formal test plans or activities
Total project effort spent in testing activities (estimate by headcount and months in test)
Subtract effort to complete one pass of all tests (cost of conformance) – a worked sketch follows
• This cost is usually less than 10% of the total in the absence of accurate data collection
• If you do not know this number, ignore it as the total cost is close enough
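A minimal sketch of the estimate described above, assuming (per the slide) that one clean pass of all tests runs about 10% of total test effort; headcount and months are hypothetical:

```python
# Estimate rework effort when no formal defect tracking exists.
# All figures are illustrative, not from the tutorial.
testers = 5
months_in_test = 4
hours_per_month = 160

total_test_effort = testers * months_in_test * hours_per_month   # 3,200 hours

# Cost of conformance: one clean pass of all tests, usually < 10% of
# the total; if unknown, it can be ignored (the total is close enough).
cost_of_conformance = 0.10 * total_test_effort

rework_effort = total_test_effort - cost_of_conformance
print(f"Estimated test rework effort: {rework_effort:.0f} hours")
```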
Identifying Defects and Rework Effort (2)
Agile/SCRUM development
Lots of disagreement on what is a defect
• In Test-Driven Development (TDD), tests may be run before functionality is complete; test failures are not defects
• However, if the functionality was "done", test failures should/could be classified as defects
Defects within a sprint will take care of themselves – no need to track separately
• A high defect rate requiring rework will lower velocity
Defects found later will result in user stories in a future sprint
• These need to be tagged as "rework points"
Tracking Agile Rework (1)
Rework points
If the defect pushes completion to the next sprint, velocity in the current sprint is reduced – "self correcting"
If system test or production defects are converted to story cards and points in future iterations, track these points as rework that lowers the "net velocity" (see the sketch below)
Defects found outside the sprint suggest more defects were found and fixed inside the sprint (inverting "buggy products continue to be buggy" to "if it is buggy now, it was buggier earlier")
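A minimal sketch of tracking rework points against velocity (the sprint data is hypothetical, chosen to match the 13% example on the next slide):

```python
# Track rework points alongside velocity to expose hidden rework.
# Sprint data is illustrative, not from the tutorial.
velocity = [20, 18, 21, 15, 19, 22, 16, 20, 18]   # completed points per sprint
rework_points = [2, 3, 2, 4, 2, 3, 1, 3, 2]       # points spent reworking defects

net_velocity = [v - r for v, r in zip(velocity, rework_points)]

share = sum(rework_points) / sum(velocity)        # ~13% here
print(f"Rework share of velocity: {share:.0%}")
for sprint, (v, n) in enumerate(zip(velocity, net_velocity), start=1):
    print(f"Sprint {sprint}: velocity {v}, net velocity {n}")
```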
Tracking Agile Rework (2)
If your velocity looks something like this:
[Bar chart: velocity in points (0-25) over nine sprints, varying noticeably from sprint to sprint]
You could have a rework problem
Impact of Rework
[Two bar charts over nine sprints, both on a 0-25 point scale: total velocity with the rework portion shown in red, and net velocity shown in green]
For the velocity shown, 13% of the velocity is due to rework (red)
If you do not measure this, you are losing productivity and don't know it (green = net velocity)
Points and Effort (1)
How should we measure and compare points across Agile teams (or teams regardless of methodology)?
Point "effort" between teams working in different domains, products, or languages will be different
Trying to make "points" between teams "equal" would jeopardize team estimating
• Consistent team velocity and size (point) estimating is critical to team success
Move any normalization outside the team
• Effort/Velocity by team will be more useful than forcing a measure across multiple teams
• Do not assign a "goodness" rating to velocity
Points and Effort (2)
How do you normalize?
No easy solution
Different technologies, complexity, etc.
"Traditional approaches"
• Function points
• Lines of code (only for identical languages and similar work)
• Product value
Not a pure numbers comparison
• If A > B, we have to evaluate what that really means
• Do not assume A is better than B
• Use differences to stimulate thinking about why there are differences
Points and Effort (3)
The real goal is to maximize productivity of the team
Upward trend in points until it consistently achieves a similar value
• Minimal rework
• Retrospectives focus on efficiency (or lack thereof)
• Product owner not available
• Manual test vs. automated test
• Annual budget commitment delays
• Multi-tasking
Free Time Is Not Free (e.g., Overtime)
"Free time" is unpaid/casual work over 40 hrs/week
Use of unpaid overtime has personnel impact we understand (but often ignore)
Business impact is rarely evaluated or understood
Someone, somewhere is deciding where to allocate resources for competing projects
Wrong decisions can be due to
Inaccurate estimating
Willful underestimating depending on "free time"
Let's look at two examples
Tale of Two Projects (1)
Same net return, same initial estimate, but one project uses 50% additional "free time"
[Bar chart comparing the two projects on: Estimate, Free Time, Actual, Value, Est "ROI", Actual "ROI"]
Tale of Two Projects (3)
Same return, one project hides free time or underestimates by 50%
Project 2 looks better, but another project might be better than both
[Bar chart comparing the two projects on: Estimate, Free Time, Actual Cost, Value, Est "ROI", Actual "ROI"]
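A minimal sketch of how hidden free time distorts the ROI comparison in these examples (all figures are hypothetical):

```python
# When unpaid "free time" is left out of cost, estimated ROI is inflated
# and the wrong project can win the resource-allocation decision.
# All numbers are illustrative.
def roi(value: float, cost: float) -> float:
    return (value - cost) / cost

value = 200.0
estimate = 100.0       # both projects estimate the same cost
free_time_p1 = 50.0    # project 1 quietly burns 50% extra unpaid effort
free_time_p2 = 0.0

print(f"Estimated ROI (both projects): {roi(value, estimate):.0%}")          # 100%
print(f"Actual ROI, project 1: {roi(value, estimate + free_time_p1):.0%}")   # 33%
print(f"Actual ROI, project 2: {roi(value, estimate + free_time_p2):.0%}")   # 100%
```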
No Free Time
Whether or not free time is counted, it is a resource used by projects
When it is ignored, poor estimating leads to poor decision making
True effort cost of the project is hidden
Other opportunities with better returns are not chosen
MEASURING SCHEDULE
Easy to Measure, Hard to Get "Right"
Of cost, schedule, and quality, schedule is the easiest to measure
If "right" means delivering on the date set at the project start, many forces conspire to make it hard
Market forces
• Annual dates
• Competition
• Regulatory agencies
Poor product planning
• Catch up with product features and applications
• No control over customer requests by marketing – everything is "#1" priority
Schedule Measures
Days, weeks, or months ahead of/behind schedule
"What is the probability of finishing late?"
• Project managers can answer this
"What is the probability of finishing early?"
• "What do you mean, finish early? This is a best case schedule" *
Can be combined with effort measures – Earned Value "Schedule Performance Index" (SPI)
For effort spent and tasks completed, where are we with respect to schedule, expressed as a value relative to 1 (<1 = behind, >1 = ahead); a small sketch follows
* From "Controlling Software Projects" by Tom DeMarco
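A minimal sketch of the earned-value SPI calculation referenced above (the values are hypothetical):

```python
# Schedule Performance Index: earned value / planned value.
# SPI < 1 means behind schedule; SPI > 1 means ahead.
# Figures are illustrative, not from the tutorial.
planned_value = 120.0   # budgeted cost of work scheduled to date
earned_value = 96.0     # budgeted cost of work actually completed

spi = earned_value / planned_value
status = "behind" if spi < 1 else "ahead of (or on)"
print(f"SPI = {spi:.2f}: {status} schedule")   # SPI = 0.80: behind schedule
```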
Critical Path Schedule Measures
Single-dimensional view of progress – only looks at tasks on the critical path
Ignores tasks not on the critical path
Often used with "Line of Balance" charts to hide problems
Following slide is a simple representation of tasks, with the critical path "on schedule" today
Conveniently ignores the impact of the two tasks that are behind
Usually get some mumbo-jumbo about the green offsetting the yellow
Gantt Chart Critical Path Fakery
[Gantt chart: ten task lines spanning JAN-DEC with a "Today" marker; the critical path shows "on schedule" while two non-critical tasks are incomplete and others are early. Legend: critical path, task not complete, task early]
Use Both Critical Path and SPI
Focus on critical path tasks
Do not overlook non-critical task backlog that could end up on the critical path
Any task slipping far enough will end up on the critical path
MEASURING DEFECTS
Why Is It So Difficult to Use Defect Data?
Nearly everyone has a defect tracking tool of some kind
How can we use the data effectively to understand, plan, and control the work?
Almost no one has any idea of the number of defects found and fixed in developer testing
Why? Two reasons:
What Do Defects Tell Us?
A measure of quality
Not perfect – relationship to production failures is not obvious; inconsistent measurement
• Disregarding severity when counting defects
• Counted only in some testing activities
• Unit testing numbers are rarely counted
• Integration testing defects may not be counted
• Production defect tracking often ends shortly (2-3 months) after delivery
• Small defects can cost a lot (can anyone top $1.4B?)
• Most defects lie dormant for a long time *
But there is a gold mine of information when used correctly
* Edward Adams, Optimizing Preventive Service of Software Products, IBM Systems Journal, Jan 1984
Using Historical Defect Data
What’s the choice?
Trailing indicators
• Production failures and high defect rates
• Loss of business
• Inefficient internal IT operations
Leading indicators
• Development and test defect data
• Defect removal effectiveness
• Defect injection rates
• Complexity
• Unit Test coverage
• Etc.
Defect Removal Effectiveness (1)
Use the historical system test to production defect ratio to predict the future
Need to track multiple releases
Need to track for the life of the release – 2-3 months will not suffice
System test effectiveness is typically near or below 50% (if you measure the lifetime of the release) *
Variability makes this measure unreliable
Dependent on uncontrolled development and early test activities
• Unit test variability from 20% to 80% - 4:1 more (or fewer) defects into System Test
* See any of Capers Jones's books on software quality
Defect Removal Effectiveness (2)
How do we extend to all testing and review activities?
Remember the two reasons?
Need to address both issues to get good data
Lost cause in "punishment-centric" organizations
Defect Removal Effectiveness (3)
When measurement turns sour
Early inspection data showed 15:1 differences in defect detection
• Equally difficult work
• Equally proficient development teams
• Confidential interviews identified the "cause"
• Six months later we identified the real cause
When measurement is done right
Collecting unit test data
• Customer focus
• Anonymous reporting
• Demonstrated "no harm" environment
Defect Removal Effectiveness (4)
Defect data from Inspections (formal peer reviews)
Historic defect removal effectiveness (DRE) related to individual preparation rate
Defect injection rate per unit of size (lines of code, function points, etc.)
Useful leading indicator to predict remaining defects
DRE = defects found divided by (defects found + defects found later)
• If the last 4 releases removed 50% of the defects in system test, then in the next release we can estimate the number to be found in production will equal those found in system test (better than guessing) – see the sketch below
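A minimal sketch of this prediction (the defect counts are hypothetical):

```python
# DRE = defects found / (defects found + defects found later).
# With a historical system test DRE of 50%, production defects should
# roughly equal system test defects. Counts are illustrative.
historical_system_test_dre = 0.50

defects_found_in_system_test = 120
predicted_total = defects_found_in_system_test / historical_system_test_dre
predicted_production = predicted_total - defects_found_in_system_test

print(f"Predicted production defects: {predicted_production:.0f}")   # 120
```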
Defect Clustering
So much to do, so little time
Defect history can identify the defect-prone parts of your software (see the sketch below)
Use this to focus defect removal effort via inspections and test
But testing isn't a bug hunt, so use appropriately!
Planning impact
More defects can mean longer test cycles
Match team skills to problem areas
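A minimal sketch of mining defect history for clusters (module names and counts are hypothetical):

```python
from collections import Counter

# Defect history: the module each fixed defect touched (illustrative data)
defect_modules = ["billing", "auth", "billing", "ui", "billing",
                  "auth", "billing", "reports", "billing", "auth"]

counts = Counter(defect_modules)
total = sum(counts.values())

# Focus inspections and test effort on the most defect-prone modules
for module, n in counts.most_common(3):
    print(f"{module}: {n} defects ({n / total:.0%} of history)")
```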
LEADING AND TRAILING INDICATORS
Trailing Indicators
After-the-fact indication things "did not go well"
Corrective action to fix
Cost to fix is usually high
Sometimes it is too late to fix
Examples
• Lost customer
• Product development cancelled
• Poor estimation discovered "late"
• High defect discoveries in system test
Leading Indicators
Before-the-fact indication that things will not go well
Preventive action to recover or prevent significant deviations
Usually costs less than corrective action
Examples
• Trend data showing early and consistent slips in effort applied or tasks completed
• Higher or lower defect detection in inspections
• Backlog growth (or slow reduction)
Leading Indicators in Agile
How can you predict quality as measured by test or production defects?
Review data is missing
Sketchy unit test data
First measured defect data may be in release testing
Product defect injection is largely a function of how well pairing works
How do you measure pairing for leading indicators?
Why Don't We Listen?
Leading indicators are often ignored – why?
Already up to our necks in alligators, new and future problems are not welcomed
Good trend analysts are often viewed as doomsayers or "not team players"
Prediction is a lot easier after the fact
• No matter how often you are right with predictions, one failure and you are busted
SUMMARY AND CLOSING REMARKS
Key Points
Measurement must meet the business needs of the organization
Project managers, support groups, line managers, executive management
Measurement needs to be simple, unambiguous, and used
Culture will trump reason – it can be a tough sell
Never assume – investigate both "good" and "bad" analysis to avoid shooting yourself in the foot
Implementation Tips (1)
Keep the collection overhead minimal
Units of measure must be well defined and understood - ambiguous or confusing definitions frustrate the providers
ALWAYS provide a "None of the above" or "Other" selection
• If it isn't clear, you can get anything as an answer, often the first or last selection in a list
• Otherwise you send a message that accuracy isn't as important as filling in the form
Implementation Tips (2)
Do not ask providers to do what is properly the work of the metrics analysts
If they use the analysis results of their data, it can be their job if
• Analysis is straightforward and quick
• A tool supports the analysis
If the data is used in project reviews, it may be the project manager's job – see the two sub-bullets above
Anything else is usually best done by the measurement specialists (until the analysis is automated)
The Corporate Metrics Mandate
"Measurement is good, therefore you shall measure (something)"
One-size-fits-all mandate
• You shall collect and report on xxx
• Reporting is more important than what is measured
Measurement becomes a standing joke
• "I need to create the monthly metrics report"
• "Since we are measuring, everything must be OK"
Usage Tips
FEEDBACK – FEEDBACK – FEEDBACK
Be sure the providers see
• Decisions based on the data they provide
• How the data they provide helps the organization become more efficient and effective
Never punish a person based solely on the data they provide, whether perception or reality
Guarantees future data will be flawed
Makes any measurement very difficult
Parting Thoughts
Lord Kelvin
"I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be." [PLA, vol. 1, "Electrical Units of Measurement", 1883-05-03]
"There is nothing new to be discovered in physics now; all that remains is more and more precise measurement."
Ed Weller
"Think about what your measurements mean"
Contact Information
Ed Weller
Integrated Productivity Solutions
Ed.weller@integratedproductivitysolutions.com
Or if you type like me ☺
efweller@aol.com
Defect Depletion
With assumed or actual defect injection and removal rates, it is possible to predict residual defects (see the sketch below)
Useful for "what-if" evaluations
Demonstrates the relative cost of removing defects
Alternatively, historical defect removal effectiveness can be used to predict residual defects
For relatively stable inspection and test processes, these numbers do not change significantly
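A minimal sketch of a phase-by-phase depletion model (injection counts and removal rates are hypothetical):

```python
# Predict residual defects from assumed injection and removal rates.
# All figures are illustrative, not from the tutorial.
phases = [
    # (phase, defects injected in phase, removal effectiveness of phase)
    ("design review", 200, 0.55),
    ("code review",   400, 0.60),
    ("system test",     0, 0.50),
]

remaining = 0.0
for name, injected, dre in phases:
    remaining += injected            # defects present entering this phase
    found = remaining * dre
    remaining -= found
    print(f"{name}: found {found:.0f}, {remaining:.0f} escape")

print(f"Predicted residual defects at release: {remaining:.0f}")
```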
Using Removal Effectiveness
If historical data shows that 60% of defects are removed prior to the start of test, use this number as a predictor
Requires defects found in reviews to be equivalent to defects discovered in test or use
Cannot count spelling, grammar, or maintainability defects
If inspections find 540 defects, then the total defects are 540/.60 = 900, so the residual defects total 360
Modest checking of the inspection process is required
• Individual preparation rates and coverage
• Team member selection
Defect Depletion "What-if" Analysis
Requires historical data for defects injected and removed in each activity or phase
Cost data for defect identification and repair in each stage
See "Managing the Software Process" by Watts Humphrey for a full discussion of this technique in Chapter 16