Bad Testing Metrics—and What To Do About Them
- 1. Bad Metrics and What You Can Do About It
Paul Holland
Test Consultant and Teacher at Testing Thoughts
- 2. My Background
• Independent S/W Testing consultant since Apr 2012
• 16+ years testing telecommunications equipment
and reworking test methodologies at Alcatel-Lucent
• 10+ years as a test manager
• Presenter at STAREast and CAST
• Keynote at KWSQA conference in 2012
• Facilitator at 25+ peer conferences and workshops
• Teacher of S/W testing for the past 5 years
• Teacher of Rapid Software Testing
– through Satisfice (James Bach): www.satisfice.com
• Military Helicopter pilot – Canadian Sea Kings
- 3. Attributions
• Over the past 10 years I have spoken with many
people regarding metrics. I cannot directly
attribute any specific aspects of this talk to any
individual but all of these people (and more)
have influenced my opinions and thoughts on
metrics:
– Cem Kaner, James Bach, Michael Bolton, Ross
Collard, Doug Hoffman, Scott Barber, John
Hazel, Eric Proegler, Dan Downing, Greg
McNelly, Ben Yaroch
- 4. Definitions of METRIC
(from http://www.merriam-webster.com, April 2012)
• 1 plural : a part of prosody that deals with metrical structure
• 2 : a standard of measurement <no metric exists that can
be applied directly to happiness — Scientific Monthly>
• 3 : a mathematical function that associates a real
nonnegative number analogous to distance with each pair of
elements in a set such that the number is zero only if the
two elements are identical, the number is the same
regardless of the order in which the two elements are
taken, and the number associated with one pair of elements
plus that associated with one member of the pair and a third
element is equal to or greater than the number associated
with the other member of the pair and the third element
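Definition 3 is the formal mathematical notion of a distance function. Restated compactly in standard notation (a restatement for readability, not part of the dictionary entry):

```latex
% d is a metric on a set X if, for all x, y, z in X:
\begin{align*}
  d(x,y) &\ge 0, \qquad d(x,y) = 0 \iff x = y && \text{(zero only for identical elements)} \\
  d(x,y) &= d(y,x)                            && \text{(same regardless of order)} \\
  d(x,z) &\le d(x,y) + d(y,z)                 && \text{(triangle inequality)}
\end{align*}
```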
- 5. Sample Metrics
• Number of Test Cases Planned (per release or
feature)
• Number of Test Cases Executed vs. Plan
• Number of Bugs Found per Tester
• Number of Bugs Found per Feature
• Number of Bugs Found in the Field
• Number of Open Bugs
• Lab Equipment Usage
- 6. Sample Metrics
• Hours between crashes in the Field
• Percentage Behind Plan
• Percentage of Automated Test Cases
• Percentage of Tests Passed vs. Failed (pass rate)
• Number of Test Steps
• Code Coverage / Path Coverage
• Requirements Coverage
- 7. Goodhart's Law
• In 1975, Charles Goodhart, a former advisor to
the Bank of England and Emeritus Professor at
the London School of Economics stated:
Any observed statistical regularity will
tend to collapse once pressure is
placed upon it for control purposes
Goodhart, C.A.E. (1975a) 'Problems of Monetary Management: The UK Experience' in
Papers in Monetary Economics, Volume I, Reserve Bank of Australia, 1975
- 8. Goodhart's Law
• Professor Marilyn Strathern FBA has re-stated
Goodhart's Law more succinctly and more
generally:
"When a measure becomes a
target, it ceases to be a good
measure."
- 9. Elements of Bad Metrics
• Measure and/or compare elements that are
inconsistent in size or composition
– Impossible to effectively use for comparison
– How many containers do you need for your
possessions?
– Test Cases and Test Steps
• Greatly vary in time required and complexity
– Bugs
• Can differ in severity and likelihood, i.e. in risk
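As a rough illustration of the point above (all bug data and severity weights are invented), counting unequal bugs can rank testers backwards:

```python
# Minimal sketch (not from the talk): why raw counts of unequal things mislead.
# Bug severities and the weighting scheme are invented for illustration.
bugs_found = {
    "tester_a": ["cosmetic"] * 9,          # 9 trivial bugs
    "tester_b": ["crash", "data loss"],    # 2 severe bugs
}
severity_weight = {"cosmetic": 1, "crash": 20, "data loss": 30}

for tester, bugs in bugs_found.items():
    count = len(bugs)
    risk = sum(severity_weight[b] for b in bugs)
    print(f"{tester}: {count} bugs, weighted risk {risk}")
# By count, tester_a "wins" 9 to 2; by risk addressed, tester_b found far more.
```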
- 10. Elements of Bad Metrics
• Create competition between individuals
and/or teams
– They typically do not result in friendly competition
– Inhibits sharing of information and teamwork
– Especially damaging if compensation is impacted
– Number of xxxx per tester
– Number of xxxx per feature
- 11. Elements of Bad Metrics
• Easy to "game" or circumvent the desired
intention
– Can be improved through undesirable behaviour
– Pass rate (percentage): execute more simple
tests that will pass, or break a long test case up
into many smaller ones
– Number of bugs raised: raise two similar bug
reports instead of combining them
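A quick arithmetic sketch (numbers invented) of the pass-rate gaming described above:

```python
# Minimal sketch (not from the talk): how splitting test cases "games" a pass
# rate without changing what was actually tested. Numbers are illustrative.
def pass_rate(passed, failed):
    return 100 * passed / (passed + failed)

# Before: 10 long test cases, 2 fail.
print(f"{pass_rate(passed=8, failed=2):.0f}%")   # 80%

# After: split each long test into 10 small steps-as-test-cases.
# The same 2 failures now hide among 100 "test cases".
print(f"{pass_rate(passed=98, failed=2):.0f}%")  # 98%
```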
- 12. Elements of Bad Metrics
• Contain misleading information or gives a
false sense of completeness
– Summarizing a large amount of information
into one or two numbers out of context
– Coverage (Code, Path)
• Misleading because a line counts as covered after
being executed just once
– Pass rate and number of test cases
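A minimal sketch (invented function and test, not from the talk) of how "touching the code once" yields 100% coverage while missing an obvious failure:

```python
# Minimal sketch: full line coverage from one execution can still miss a bug.
def average(values):
    return sum(values) / len(values)   # crashes on an empty list

def test_average():
    assert average([2, 4]) == 3        # executes every line: 100% coverage

test_average()
# Coverage reports 100%, yet average([]) still raises ZeroDivisionError:
# the metric says nothing about untried inputs.
```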
- 13. Impact of Using Bad Metrics
– Promotes bad behaviour:
• Testers may create many smaller test cases instead of
test cases that make sense
• Execution of ineffective testing to meet requirements
• Artificially inflating the numbers instead of doing
what makes sense
• Creation of tools that mask inefficiencies (e.g.: lab
equipment usage)
• Time wasted improving the "numbers" instead of
improving the testing
- 14. Impact of Using Bad Metrics
• Gives Executives a false sense of test coverage
– All they see is numbers out of context
– The larger the numbers, the better the testing appears
– The difficulty of good testing is hidden by large "fake"
numbers
• Dangerous message to Executives
– Our pass rate is at 96% so our product is in good
shape
– Code coverage is at 100% - our code is completely
tested
– Feature specification coverage is at 100% - Ship it!!!
• What could possibly go wrong?
- 15. Sample Metrics
• Number of Test Cases Planned (per release or
feature)
• Number of Test Cases Executed vs. Plan
• Number of Bugs Found per Tester
• Number of Bugs Found per Feature
• Number of Bugs Found in the Field – A list of Bugs
• Number of Open Bugs – A list of Open Bugs
• Lab Equipment Usage
- 16. Sample Metrics
• Hours between crashes in the Field
• Percentage Behind Plan – depends on whether the plan is flexible
• Percentage of Automated Test Cases
• Percentage of Tests Passed vs. Failed (pass rate)
• Number of Test Steps
• Code Coverage / Path Coverage – depends on usage
• Requirements Coverage – depends on usage
- 17. So … Now what?
• "I have to stop counting everything. I feel
naked and exposed."
• Track expected effort instead of tracking
test cases using:
– Whiteboard
– Excel spreadsheet
- 18. Whiteboard
• Used for planning and tracking of test
execution
• Suitable for use in waterfall or agile (as long
as you have control over your own team's
process)
• Use colours to track:
– Features, or
– Main Areas, or
– Test styles (performance, robustness, system)
- 19. Whiteboard
• Divide the board into four areas:
– Work to be done
– Work in Progress
– Cancelled or Work not being done
– Completed work
• Red stickies indicate issues (not just bugs)
• Create a sticky note for each half day of work (or
mark # of half days expected on the sticky note)
• Prioritize stickies daily (or at least twice/wk)
• Finish "on-time" with low priority work incomplete
- 20. Sticky Notes
• All of these items are optional – add your own
elements; use what makes sense for your situation
– Charter Title (or Test Case Title)
– Estimated Effort
– Feature area
– Tester name
– Date complete
– Effort (# of sessions or half days of work)
• Initially, estimated -> replace with actual
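For teams that prefer an electronic board, here is a minimal sketch (not from the talk) of the sticky-note fields above as a Python record, with the four whiteboard areas as simple lists; the names and structure are illustrative only:

```python
# Minimal sketch: sticky-note fields from this slide as a record.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Sticky:
    charter_title: str
    estimated_effort: int                 # in half days (or sessions)
    feature_area: str = ""
    tester: str = ""
    date_complete: Optional[str] = None
    actual_effort: Optional[int] = None   # replaces the estimate when done

# The four whiteboard areas from the previous slide.
board = {
    "to_do": [Sticky("ARQ Verification under different RA Modes", 2, "ARQ")],
    "in_progress": [],
    "cancelled": [],
    "done": [],
}
```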
- 23. Reporting
• An Excel Spreadsheet with:
– List of Charters
– Area
– Estimated Effort
– Expended Effort
– Remaining Effort
– Tester(s)
– Start Date
– Completed Date
– Issues
– Comments
• Does NOT include pass/fail percentage or number of
test cases
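As a minimal sketch (not from the talk), assuming the spreadsheet is exported to CSV with the column names above, per-area effort can be summarized without ever computing a pass rate; the two inline rows reuse figures from the sample report that follows:

```python
# Minimal sketch: summarize charter effort per area from a CSV export of the
# tracking spreadsheet. Inline rows are included so the sketch runs stand-alone.
import csv
import io
from collections import defaultdict

csv_export = io.StringIO(
    "Charter,Area,Estimated Effort,Expended Effort,Remaining Effort\n"
    "ARQ Verification under different RA Modes,ARQ,2,2,0\n"
    "Investigation for high QLN spikes on EVLT,H/W Performance,0,20,0\n"
)

totals = defaultdict(lambda: {"Estimated": 0, "Expended": 0, "Remaining": 0})
for row in csv.DictReader(csv_export):
    for kind in ("Estimated", "Expended", "Remaining"):
        totals[row["Area"]][kind] += int(row[f"{kind} Effort"] or 0)

for area, t in sorted(totals.items()):
    print(area, t)   # per-area effort in half days; no pass/fail percentages
```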
- 24. Sample Report
Charter | Area | Estimated Effort | Expended Effort | Remaining Effort | Tester | Date Started | Date Completed | Issues Found | Comments
Investigation for high QLN spikes on EVLT | H/W Performance | 0 | 20 | 0 | acode | | | ALU01617 | Lots of investigation. Problem was on 2-3 out of 48 ports which just happened to be 2 of the 6 ports I tested.
ARQ Verification under different RA Modes | ARQ | 2 | 2 | 0 | ncowan | 12/14/2011 | 12/15/2011 | |
POTS interference | ARQ | 2 | 0 | 0 | --- | 01/08/2012 | 01/08/2012 | | Decided not to test as the H/W team already tested this functionality and time was tight.
Expected throughput testing | ARQ | 5 | 5 | 0 | acode | 01/10/2012 | 01/14/2012 | | To translate the files properly, had to install Python solution from Antwerp. Some overhead to begin testing (installation, config test) but was fairly quick to execute afterwards.
INP vs. SHINE | ARQ | 6 | 6 | 0 | ncowan | 12/01/2011 | 12/04/2011 | |
INP vs. REIN | ARQ | 6 | | | | | | |
INP vs. REIN + SHINE | ARQ | 12 | | | | | | |
Traffic delay and jitter from RTX | ARQ | 2 | | | | | | |
Attainable Throughput | ARQ | 1 | 7 | 5 | jbright | 12/10/2011 | | | Took longer because was not behaving as expected and I had to make sure I was testing correctly. My expectations were wrong based on virtual noise not being exact.
- 25. Sample Report
[Bar chart: "Awesome Product" Test Progress as of 02/01/2012. Effort (person half days, 0 to 90) per Feature (ARQ, SRA, Vectoring, Regression, H/W Performance), with three bars per feature: Original Planned Effort, Expended Effort, and Total Expected Effort.]
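For reference, a minimal sketch (not from the talk) of how a progress chart like this could be generated with matplotlib; all effort numbers below are placeholders, not the data from the slide:

```python
# Minimal sketch: grouped bar chart of per-feature effort, in the style of the
# "Awesome Product" progress chart above. All numbers are placeholders.
import matplotlib.pyplot as plt
import numpy as np

features = ["ARQ", "SRA", "Vectoring", "Regression", "H/W Performance"]
planned  = [36, 50, 80, 30, 20]   # Original Planned Effort (hypothetical)
expended = [30, 25, 40, 10, 20]   # Expended Effort so far (hypothetical)
expected = [40, 55, 85, 30, 25]   # Total Expected Effort (hypothetical)

x = np.arange(len(features))
width = 0.25
fig, ax = plt.subplots()
ax.bar(x - width, planned, width, label="Original Planned Effort")
ax.bar(x, expended, width, label="Expended Effort")
ax.bar(x + width, expected, width, label="Total Expected Effort")
ax.set_xticks(x)
ax.set_xticklabels(features, rotation=15)
ax.set_xlabel("Feature")
ax.set_ylabel("Effort (person half days)")
ax.set_title('"Awesome Product" Test Progress as of 02/01/2012')
ax.legend()
plt.tight_layout()
plt.show()
```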