1. Code coverage.
The pragmatic approach.
Alexander Ilyin (Александр Ильин)
Java Quality architect
Oracle
2. What is it about?
Should testing stop at 100% coverage?
Should 100% be the goal?
How (else) to use code coverage information?
What is it not about?
Tools
4. What is the code coverage data for?
To measure the extent to which source code is covered
during testing.
consequently …
Code coverage is
A measure of how much source code is covered
during testing.
finally …
Testing is
A set of activities aimed at proving that the system under
test behaves as expected.
5. CC – how to get
• Create a template
A template is a list of all the code there is to cover
• “Instrument” the source/compiled code/bytecode
Insert instructions that write coverage data to a file, the network, etc.
• Run testing, collect data
May require changes to the environment
• Generate report
HTML, DB, etc.
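The four steps above can be sketched by hand in a few lines. This is only an illustration under stated assumptions: the block ids, the `probe` helper and the in-memory `HITS` set are hypothetical, and a real tool inserts the probes automatically and writes the data to a file or the network rather than keeping it in memory.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ManualProbes {
    // Step 1, the template: every block there is to cover (hypothetical block ids).
    static final List<String> TEMPLATE =
            Arrays.asList("abs:entry", "abs:negative", "abs:return");

    // Step 3, the collected data: blocks hit while tests were running.
    static final Set<String> HITS = new LinkedHashSet<>();

    // The instruction a tool would insert; here it only records the block id.
    static void probe(String blockId) {
        HITS.add(blockId);
    }

    // Step 2, the "instrumented" code under test: one probe per block.
    static int abs(int v) {
        probe("abs:entry");
        if (v < 0) {
            probe("abs:negative");
            v = -v;
        }
        probe("abs:return");
        return v;
    }

    // Step 4, the report: template entries marked as covered or uncovered.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (String block : TEMPLATE) {
            sb.append(HITS.contains(block) ? "covered   " : "uncovered ")
              .append(block).append('\n');
        }
        sb.append(HITS.size()).append('/').append(TEMPLATE.size())
          .append(" blocks covered");
        return sb.toString();
    }
}
```

Calling `ManualProbes.abs(5)` and then `ManualProbes.report()` would list `abs:negative` as uncovered, which is the kind of gap the analysis step looks for.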
6. CC – kinds of
• Block / primitive block
• Line
• Condition/branch/predicate
• Entry/exit
• Method
• Path/sequence
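A small sketch of why the kind of coverage matters. The `clamp` method is a hypothetical example, and exactly how lines, blocks and branches are counted differs between tools.

```java
class CoverageKinds {
    // Limits v to at most max.
    static int clamp(int v, int max) {
        int result = v;
        if (v > max) {
            result = max; // executed only when the condition is true
        }
        return result;
    }

    public static void main(String[] args) {
        // This single call executes every line and every block of clamp,
        // but only the "true" outcome of the condition.
        System.out.println(clamp(10, 5));
        // A second call is needed to take the "false" outcome as well.
        System.out.println(clamp(3, 5));
    }
}
```

With only the first call, method, entry/exit, line and block coverage are all complete, while branch coverage is not.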
7. CC – how to use
for testbase improvement
• 1: Measure (prev. slide)
Performed repeatedly, so resource-efficiency is really important
• Perform analysis
Find what code you need to cover.
Find what tests you need to develop.
• Develop more tests
• Find dead code
• GOTO 1
9. CC – how not to use
misuses
• Must get to 100%
Maybe not.
• 100% means no more testing
No, it does not.
• CC does not mean a thing
It does mean a fair amount if it is used properly.
• There is a tool which will generate tests
for us, and then we're done
Nope.
11. Test generation
“We present a new symbolic execution tool, ####,
capable of automatically generating tests that
achieve high coverage on a diverse set of complex
and environmentally-intensive programs.”
#### tool documentation
13. Test generation cont.
if ( b != 3 ) {
double a = 1 / (b - 3);
} else {
…
}
Reminder: testing is ...
A set of activities aimed at proving that the system under
test behaves as expected.
14. Test generation - conclusion
Generated tests cannot verify that the code works
as expected: they only know how the code works,
not how it is expected to work, because the only
thing they have is the code, which may already
not work as expected. :)
Hence …
Code coverage from generated tests should not be
mixed with code coverage from regular functional tests.
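A minimal sketch of the problem, assuming a generator that runs the code and records whatever it observes as the expected result. The `max` method, its assumed specification and its defect are all hypothetical.

```java
class GeneratedTestDemo {
    // Assumed specification: return the larger of the two arguments.
    // Defect: the comparison is reversed, so the smaller one is returned.
    static int max(int a, int b) {
        return a < b ? a : b;
    }

    // What a generator can produce: the observed result becomes the expectation.
    static boolean generatedTestPasses() {
        return max(1, 2) == 1;
    }

    // What a test developer writes from the specification.
    static boolean functionalTestPasses() {
        return max(1, 2) == 2;
    }

    public static void main(String[] args) {
        System.out.println("generated test passes:  " + generatedTestPasses());
        System.out.println("functional test passes: " + functionalTestPasses());
    }
}
```

Both tests execute the same code and so contribute the same coverage, but only the functional test can fail because of the defect.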
15. Who watches the watchmen?
• Test logic gotta be right
• No way to verify the logic
• No metrics
• No approaches
• No techniques
• Code review – the only way
• Sole responsibility of test developer
21. 100% sequence coverage
Inputs:  (-1,-1)  (-1,1)  (1,-1)  (1,1)  (0,0)
b:       1        -1      -1      1      NaN
But … isPositive(float) has a defect!
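The code this slide refers to is not part of the transcript. The sketch below is a hypothetical stand-in, not the original: it is merely one program that produces the same values of b for the five inputs. The names `sign` and `b`, and the assumption that the problem lies in how zero is handled, are illustrative guesses.

```java
class SequenceCoverageDemo {
    // Hypothetical: zero is reported as "not positive".
    static boolean isPositive(float v) {
        return v > 0;
    }

    // The caller assumes that "not positive" means "negative".
    static float sign(float v) {
        if (isPositive(v)) {
            return 1f;
        }
        return v / Math.abs(v); // -1 for negative values, NaN for zero
    }

    // Two conditions in sequence give four possible paths.
    static float b(float x, float y) {
        return sign(x) * sign(y);
    }

    public static void main(String[] args) {
        float[][] inputs = {{-1, -1}, {-1, 1}, {1, -1}, {1, 1}, {0, 0}};
        for (float[] in : inputs) {
            System.out.println("(" + in[0] + "," + in[1] + ") -> " + b(in[0], in[1]));
        }
    }
}
```

The first four inputs already take all four paths, so sequence coverage is complete without them revealing anything wrong. The input (0,0) follows the same path as (-1,-1) and is the only one that exposes the problem.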
22. 100% sequence coverage
• Has conceptual problems
• Code semantics
• Loops
• One of the two
• Assume libraries have no errors
• Done in depth – with the libraries
• Very expensive
• A lot of sequences: 2^(# of branches), generally speaking
• Very hard to analyze data
23. 100% coverage - conclusion
100% block/line/branch/path coverage, even if
reachable, does not prove much.
Hence …
No need to try to get there unless ...
25. CC target value - cost
Test Dev. Effort by Code Block Coverage.
[Figure: Relative Test Dev. Effort (1 at 50% code block coverage) vs. Code Block Coverage (%)]
Industry data indicates that effort increases exponentially with coverage.
We scale to make effort relative to the effort of getting 50% coverage.
$f(x) = k e^{r x}$
$k = e^{-50 r} \Rightarrow f(50) = 1$
Intuition: the effort needed to get more coverage is proportional to the total effort needed to get current coverage.
$\frac{df}{dx} = r f(x)$
Model not reliable below 50% coverage, except maybe very big projects.
26. CC target value - effectiveness
Defect Coverage by Code Block Coverage
[Figure: Defect Coverage (%) vs. Code Block Coverage (%)]
Defect coverage by code block coverage, defined in terms of effort per code coverage and defect coverage by effort.
$H(x) = h(f(x))$
$f(x) = k e^{r x}$
$h(y) = B \left(1 - e^{-\frac{s y}{B}}\right)$
Intuition: discovery rate is proportional to the percentage of bugs remaining and the effort needed to get current coverage.
$\frac{dH}{dx} = s \left(1 - \frac{H(x)}{B}\right) \frac{df}{dx}$
Model not reliable below 50% coverage, except maybe very big projects.
27. CC target value - ROI
Cost-Benefit Analysis
[Figure: Benefit ($/size), Cost ($/size), ROI (%) vs. Code Block Coverage (%)]
Benefit(c) = DC(c) * DD * COD, where
DC(c): Defect Coverage
DD: Defect Density. Example: 50bug/kloc
COD: Cost Of Defect. Example: $20k/bug
Cost(c) = F + V * RE(c), where
RE(c): Relative Effort, RE(50%) = 1
F: Fixed cost of test. Example: $50k/kloc
V: Variable cost of test. Example: $5k/kloc
ROI = Benefit(c) / Cost(c) - 1
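The cost, effectiveness and ROI formulas can be put together in a short sketch. DD, COD, F and V are the example values from the slide. The slides do not give r, s or B, so the values of `R`, `S` and `B` below are assumptions picked only to make the code runnable, and the resulting numbers carry no meaning beyond illustration.

```java
class CoverageRoi {
    // Assumed model parameters (not from the slides).
    static final double R = 0.08;  // growth rate of effort per coverage point
    static final double S = 10.0;  // initial defect discovery rate per unit of relative effort
    static final double B = 100.0; // upper bound of defect coverage, in percent

    // Example values from the slide.
    static final double DD = 50;      // defect density, bugs per kloc
    static final double COD = 20_000; // cost of defect, dollars per bug
    static final double F = 50_000;   // fixed cost of test, dollars per kloc
    static final double V = 5_000;    // variable cost of test, dollars per kloc

    // RE(c) = k * e^(r*c) with k = e^(-50*r), so that RE(50) = 1.
    static double relativeEffort(double c) {
        return Math.exp(R * (c - 50));
    }

    // DC(c) = B * (1 - e^(-s * RE(c) / B)), in percent.
    static double defectCoverage(double c) {
        return B * (1 - Math.exp(-S * relativeEffort(c) / B));
    }

    // Benefit(c) = DC(c) * DD * COD, with DC converted to a fraction.
    static double benefit(double c) {
        return defectCoverage(c) / 100.0 * DD * COD;
    }

    // Cost(c) = F + V * RE(c).
    static double cost(double c) {
        return F + V * relativeEffort(c);
    }

    // ROI = Benefit(c) / Cost(c) - 1.
    static double roi(double c) {
        return benefit(c) / cost(c) - 1;
    }

    public static void main(String[] args) {
        for (int c = 50; c <= 100; c += 10) {
            System.out.printf("coverage=%d%% RE=%.1f DC=%.1f%% ROI=%.0f%%%n",
                    c, relativeEffort(c), defectCoverage(c), 100 * roi(c));
        }
    }
}
```

Running `main` prints one line per coverage level. Changing the assumed parameters shifts where ROI peaks, which is part of why the target value is hard to justify.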
28. 100% coverage - conclusion
100% block/line/branch/path coverage, even if
reachable, does not prove much.
Hence …
No need to try to get there unless …
100% is the target value.
Which could happen if the cost of a bug is really high
and/or the product is really small.
29. Target value - conclusion
The true target value for block/line/branch/path coverage
comes from ROI, which is really hard to calculate and
justify.
31. CC – how to use
• Test base improvement.
Right. How to select which tests to develop first
• Dead code.
Barely an artifact
• Metric
Better have a good metric.
• Control over code development
• Deep analysis
33. What makes a good metric
Simple to explain
So that you can explain to your boss why it is important
to spend resources on it
Simple to work towards
So that you know what to do to improve
Has a clear goal
So that you can tell how far along you are.
34. Is CC a good metric?
Simple to explain +
Is a metric of quality of testing.
Simple to work towards +
(Relatively) easy to map uncovered code to missed tests.
Has a clear goal -
Nope. ROI – too complicated.
Need to filter the CC data
so that only the code which must be covered is left
35. Public API*
Is a set of program elements suggested for usage by
public documentation.
For example: all functions and variables which are
described in documentation.
For a Java library: all public and protected methods and
fields mentioned in the library javadoc.
For Java SDK: … of all public classes in java and javax
packages.
(*) Only applicable to a library or an SDK
37. True Public API (c)
Is a set of program elements which could be accessed
directly by a library user
Public API
+
all extensions of public API in non-public classes
39. True Public API – how to get
• Get public API with interfaces
• Filter template so that it only contains implementations
and extensions of the public API (*)
• Filter the data by template
(*) This assumes that you either
• Use a tool which allows this kind of filtering
or
• Have the data in a parseable format and develop the
filtering on your own
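If the data is available in a parseable format, the "filter the data by template" step could look roughly like this. It is a sketch under assumptions: coverage is taken to be a map from method signature to a covered/not covered flag, the template is a set of signatures, and the signatures in the usage note are made up.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class TruePublicApiFilter {
    // Keeps only those coverage entries whose signature is listed in the template.
    static Map<String, Boolean> filterByTemplate(Map<String, Boolean> coverage,
                                                 Set<String> template) {
        Map<String, Boolean> filtered = new TreeMap<>();
        for (Map.Entry<String, Boolean> entry : coverage.entrySet()) {
            if (template.contains(entry.getKey())) {
                filtered.put(entry.getKey(), entry.getValue());
            }
        }
        return filtered;
    }

    // Percentage of entries marked as covered; an empty map is reported as 0.
    static double percentCovered(Map<String, Boolean> coverage) {
        if (coverage.isEmpty()) {
            return 0.0;
        }
        int covered = 0;
        for (boolean hit : coverage.values()) {
            if (hit) {
                covered++;
            }
        }
        return 100.0 * covered / coverage.size();
    }
}
```

For example, with four methods in the raw data and a template that lists only `org.myorg.Api.open()` (covered) and `org.myorg.impl.ApiImpl.open()` (not covered), `percentCovered(filterByTemplate(data, template))` reports 50% for the true public API regardless of how the internal methods are covered.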
40. UI coverage
In a way, equivalent to public API but for a UI product
• % of UI elements shown – display coverage
• % of user actions performed – action coverage
Only “action coverage” could be obtained from CC data (*).
(*) For UI toolkits which the presenter is familiar with.
41. Action coverage – how to get
• Collect CC
• Extract all implementations of
javax.swing.Action.actionPerformed(ActionEvent)
or
javafx.event.EventHandler.handle(Event)
• Inspect all the implementations
org.myorg.NodeAction.actionPerformed(ActionEvent)
• Add to the filter:
org.myorg.NodeAction.nodeActionPerformed(Node myNode)
• Extract, repeat
42. “Controller” code coverage
Model
Contains the domain logic
View
Implements user interaction
Controller
Maps the two. Only contains code which is called as a
result of view actions and model feedback.
Controller has very little boilerplate code. A good
candidate for 100% block coverage.
43. “Important” code
• Development/SQE marks class/method as important
• We use an annotation @CriticalForCoverage
• A list of the methods marked as important is obtained
• We do that with an annotation processor, right during the main
compilation
• CC data is filtered by the method list
• Goal is 100%
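A rough sketch of the marking step. Only the annotation name `@CriticalForCoverage` comes from the slide. Its definition below is an assumption, and the method list is collected with runtime reflection instead of an annotation processor, purely to keep the example short and self-contained. `PaymentService` is a hypothetical class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CriticalMethods {
    // Assumed definition; RUNTIME retention is needed only because this sketch uses reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.TYPE})
    @interface CriticalForCoverage {}

    // Hypothetical product class with one method marked as important.
    static class PaymentService {
        @CriticalForCoverage
        void charge() {}

        void log() {}
    }

    // Returns "class.method" for every method marked directly or via its class.
    static List<String> criticalMethods(Class<?>... classes) {
        List<String> result = new ArrayList<>();
        for (Class<?> c : classes) {
            boolean wholeClass = c.isAnnotationPresent(CriticalForCoverage.class);
            for (Method m : c.getDeclaredMethods()) {
                if (m.isSynthetic()) {
                    continue; // skip compiler-generated methods
                }
                if (wholeClass || m.isAnnotationPresent(CriticalForCoverage.class)) {
                    result.add(c.getName() + "." + m.getName());
                }
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

The resulting list plays the role of the filter: CC data is restricted to these methods, and the goal for them is 100%.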
44. Examples of non-generic metrics
• BPEL elements
• JavaFX properties
• A property in JavaFX is something you could set, get and bind
• Insert your own.
45. CC as a metric - conclusion
There are multiple ways to filter CC data down to the
code which needs to be covered in full.
There are generic metrics, and it is possible
to introduce product-specific metrics.
Such metrics are easy to use, although not always
so straightforward to obtain.
47. Test prioritization
100500 uncovered lines of code!
“OMG! Where do I start?”
Metric
• Develop tests to close the metric
• Pick another metric
“Metrics for managers. Me no manager! Me write code!”
Consider mapping CC data to a few other source code
characteristics.
48. Age of the code
New code had better be tested before it gets to the customer.
Improves bug escape rate, BTW
Old code is more likely to be tested by users
or
Not used by users.
49. What's a bug escape metric?
Ratio of defects that sneaked out unnoticed
In theory: (# defects not found before release) / (# defects in the product)
Practical: (# defects found after release) / (# defects found after + # defects found before)
50. Number of changes
The more times a piece of code was changed, the more atomic
improvements/bugfixes were implemented in it.
Hence …
Higher risk of introducing a regression.
51. Number of lines changed
The more lines changed, the more testing is needed.
Best of all: the number of uncovered lines which were
changed in the last release.
52. Bug density
Assuming all the pieces were tested equally well …
Many bugs means there are, probably, even more
• Hidden behind the known ones
• Fixing existing ones may introduce yet more as regressions
53. Code complexity
Assuming the same engineering talent and the same
technology …
The more complex the code is, the more bugs are likely to be there.
Any complexity metric would work: from class size to
cyclomatic complexity
54. Putting it together
A formula
(1 – cc) * (a1*x1 + a2*x2 + a3*x3 + ...)
Where
cc – code coverage (0 - 1)
xi – a risk of bug discovery in a piece of code
ai – a coefficient
55. Putting it together
(1 – cc) * (a1*x1 + a2*x2 + a3*x3 + ...)
Pieces of code with higher values are the first to cover
• Fix the coefficients
• Develop tests
• Collect statistics on bug escape
• Adjust the coefficients
• Continue
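The formula can be sketched as a scoring function plus a ranking. The class and method names are hypothetical, and the coefficients and risk values in the usage note are placeholders that a team would tune from its own bug escape statistics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TestPriority {
    // (1 - cc) * (a1*x1 + a2*x2 + ...), with cc in the range 0..1.
    static double score(double cc, double[] a, double[] x) {
        if (a.length != x.length) {
            throw new IllegalArgumentException("a and x must have the same length");
        }
        double risk = 0;
        for (int i = 0; i < a.length; i++) {
            risk += a[i] * x[i];
        }
        return (1 - cc) * risk;
    }

    // Names of code pieces ordered by score, highest first.
    static List<String> rank(Map<String, Double> scores) {
        List<String> names = new ArrayList<>(scores.keySet());
        names.sort((p, q) -> Double.compare(scores.get(q), scores.get(p)));
        return names;
    }
}
```

For instance, a class with 50% coverage, coefficients {1, 2} and risk values {0.2, 0.1} (say, code age and number of changes) scores 0.5 * (0.2 + 0.2) = 0.2, while any fully covered class scores 0 and drops to the end of the list.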
56. Test prioritization - conclusion
CC alone may not give enough information.
Need to accompany it with other characteristics of
the tested code to make a decision.
Several other characteristics can be used
simultaneously.
58. Decrease test execution time
Exclude tests which do not add coverage (*).
But be careful! Remember that CC is not everything, and even
100% coverage does not mean a lot.
When excluding tests, use some orthogonal measurement
as well, such as specification coverage.
(*) Requires “test scales”
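A sketch of excluding tests which do not add coverage, assuming that "test scales" means per-test coverage data, represented here as one `BitSet` of covered block ids per test. The selection is a simple one-pass greedy filter, so the result depends on the order in which tests are listed.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

class TestSetReducer {
    // Keeps a test only if it covers at least one block
    // that no previously kept test covers.
    static List<String> selectTests(Map<String, BitSet> coverageByTest) {
        List<String> kept = new ArrayList<>();
        BitSet coveredSoFar = new BitSet();
        for (Map.Entry<String, BitSet> entry : coverageByTest.entrySet()) {
            BitSet added = (BitSet) entry.getValue().clone();
            added.andNot(coveredSoFar); // blocks only this test contributes
            if (!added.isEmpty()) {
                kept.add(entry.getKey());
                coveredSoFar.or(added);
            }
        }
        return kept;
    }
}
```

Passing a `LinkedHashMap` keeps the iteration order predictable. A test dropped this way may still check behavior that the remaining tests do not, which is why an orthogonal measurement is needed before deleting anything.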
59. Deep analysis
Study the coverage report, see which test code exercises
which code (*).
Recommended for developers.
(*) Also requires “test scales”
60. Controlled code changes
Do not allow commits unless all the new/changed code is
covered.
Requires simultaneous commits of tests and the
changes.
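The check behind such a commit rule can be sketched as a set difference. How the changed lines are extracted from the version control system and how the covered lines are read from the coverage tool are left out. The line-number sets and the `CommitGate` name are assumptions for illustration, and the check is shown for a single file.

```java
import java.util.Set;
import java.util.TreeSet;

class CommitGate {
    // Changed lines which no test executed, in ascending order.
    static Set<Integer> uncoveredChangedLines(Set<Integer> changedLines,
                                              Set<Integer> coveredLines) {
        Set<Integer> missing = new TreeSet<>(changedLines);
        missing.removeAll(coveredLines);
        return missing;
    }

    // The commit is allowed only when every changed line is covered.
    static boolean allowCommit(Set<Integer> changedLines, Set<Integer> coveredLines) {
        return uncoveredChangedLines(changedLines, coveredLines).isEmpty();
    }
}
```

Since the new tests have to be run to produce the covered set, the tests and the change need to arrive together.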
61. Code coverage - conclusion
100% CC does not guarantee that the code is working right
100% CC may not be needed
It is possible to build good metrics with CC
CC helps with prioritization of test development
Other source code characteristics could be used with CC
62. Coverage data is not free
• Do just as much as you can consume *
• Requires infrastructure work
• Requires some development
• Requires some analysis
(*) The rule of thumb
63. Coverage data is not free
• Do just as much as you can consume
• Requires infrastructure work
• Requires some development
• Requires some analysis
• Do just a little bit more than you can consume *
• Otherwise how do you know how much you can consume?
(*) The rule of thumb