A case study answering the question "Are there statistical correlations between statement coverage and the number of failures detected?" and comparing different reliability growth models
3. PROJECT SELECTION
• Selected JFreeChart as the open source project.
• JFreeChart is an open source Java chart
library that makes it easy for developers to
display professional quality charts in their
applications
• This project was chosen for its high flexibility in
maintainability and testing, as mentioned in
several blogs. In particular, Testability Explorer
( http://www.testabilityexplorer.org/report )
allowed us to find some useful metrics.
• Among the different modules we chose org.jfree.data, a base package for classes that
represent different types of data.
• The size of the module is 3167 LOC, computed using an automatic tool
5. Pt1: Coverage
MOTIVATION
“85% is a common number. People seem to pick it because that's the
number other respectable companies use.
I once asked someone from one of those other respectable companies
why they used 85%. He said, "When our division started using
coverage, we needed a number. Division X has a good reputation, so
we thought we'd use the number they use."
I didn't follow the trail back through Division X.
I have the horrible feeling that, if I traced it all the way back to the
Dawn of Time, I'd find someone who pulled 85% out of a hat ...”
Brian Marick - 1999
Read the original article on: http://www.exampler.com/testing-com/writings/coverage.pdf
6. Pt1: Coverage
RESEARCH QUESTION
Are there statistical correlations between
statement coverage and the number of
failures detected?
Goal:
investigate whether there is a relation between the total number of
failures and the percentage of coverage.
Purpose:
evaluate the influence of coverage over the software project.
Quality focus:
number of failures.
Perspective:
project manager willing to better understand the evolution of
failure reports.
7. Pt1: Coverage
METHODS.1
Hypothesis Formulation
H0 : Null Hypothesis
There is no difference in the number of failures detected according to
the percentage of test coverage
Ha : Alternative Hypothesis
There is a difference in the number of failures detected according to
the percentage of test coverage
Variable selection
Independent variable:
the main factor of the experiment is the
statement coverage percentage of the test suite (from 85.1% to 0)
Dependent variable:
number of failures detected
8. Pt1: Coverage
METHODS.2
Code coverage after our modifications: 85.1%
Sub-test suites built to cover different
percentages of statements
9. Pt1: Coverage
RESULTS.1
• The analysis shows a strong correlation between the coverage percentage and the total number of failures.
The null hypothesis can be rejected, since the p-value is almost zero, with a correlation coefficient of 0.93 and a
95% confidence interval that runs from 0.83 to 0.97. As we will also see in the regression analysis, when we
increase the percentage of code coverage, the number of failures detected increases almost linearly.
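The test behind these numbers can be sketched as follows. The authors ran their analysis in R; this Java sketch computes Pearson's r and a 95% confidence interval via Fisher's z-transformation, and the data points are hypothetical, not the study's measurements:

```java
// Sketch of the correlation analysis; data values are illustrative only.
public class CoverageCorrelation {
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // 95% CI via Fisher's z-transformation (z-critical value 1.96)
    static double[] fisherCi(double r, int n) {
        double z = 0.5 * Math.log((1 + r) / (1 - r));
        double se = 1.0 / Math.sqrt(n - 3);
        return new double[] { Math.tanh(z - 1.96 * se), Math.tanh(z + 1.96 * se) };
    }

    public static void main(String[] args) {
        double[] coverage = { 10, 25, 40, 55, 70, 85 };  // hypothetical sub-suite coverage %
        double[] failures = { 2, 5, 9, 11, 14, 16 };     // hypothetical failure counts
        double r = pearson(coverage, failures);
        double[] ci = fisherCi(r, coverage.length);
        System.out.printf("r = %.2f, 95%% CI [%.2f, %.2f]%n", r, ci[0], ci[1]);
    }
}
```

A low p-value for r (as reported on the slide) is what licenses rejecting H0.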
10. Pt1: Coverage
RESULTS.2
Regression line - Coefficients
Slope: ~ 0.60
R2: ~ 0.85
Non-constant error variance: the plot shows that the linear regression
fits better at low coverage percentages than at high ones.
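The coefficients above come from an ordinary-least-squares fit; a minimal sketch, again on hypothetical data rather than the study's measurements:

```java
// Minimal ordinary-least-squares fit returning slope, intercept and R^2.
public class RegressionFit {
    /** Returns {slope, intercept, r2} for the model y = intercept + slope * x. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
        }
        double slope = sxy / sxx;
        double intercept = my - slope * mx;
        double ssRes = 0, ssTot = 0;
        for (int i = 0; i < n; i++) {
            double pred = intercept + slope * x[i];
            ssRes += (y[i] - pred) * (y[i] - pred);
            ssTot += (y[i] - my) * (y[i] - my);
        }
        return new double[] { slope, intercept, 1 - ssRes / ssTot };
    }

    public static void main(String[] args) {
        double[] coverage = { 10, 25, 40, 55, 70, 85 };  // hypothetical coverage %
        double[] failures = { 2, 5, 9, 11, 14, 16 };     // hypothetical failure counts
        double[] f = fit(coverage, failures);
        System.out.printf("slope=%.2f intercept=%.2f R2=%.2f%n", f[0], f[1], f[2]);
    }
}
```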
12. Pt1: Coverage
RESULTS.4
Number of Unit Tests vs. Failures Detected
“... if we increase the number of unit tests, the number of detected failures increases by less than one half“
13. Pt1: Coverage
RESULTS.5
Number of Tests vs. Statement Coverage
logarithmic shape
15. Pt2. Reliability
MOTIVATION.1
By analyzing just one release we can focus on a given time interval of the whole life cycle to understand how
the number of defects behaves over testing time.
16. Pt2. Reliability
RESEARCH QUESTION
What is the reliability behaviour of the
selected software project?
Goal:
predict the reliability according to reliability growth models.
Purpose:
evaluate the reliability of the project over time.
Quality focus:
number of failures.
Perspective:
project manager willing to better understand the evolution of reliability.
17. Pt2. Reliability
METHODS.1
Seeding Faults
• 23 faults seeded
• 7.23 faults/kLOC
• Proportionally distributed among classes
• Different types of faults
• Random positioning inside classes
• Easy to enable/disable faults
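The proportional-distribution step can be sketched as a largest-remainder allocation. The class LOC figures below are hypothetical, chosen only to sum to the module's 3167 LOC:

```java
import java.util.Arrays;

// Distribute a fixed number of seeded faults across classes in
// proportion to their size, using largest-remainder rounding.
public class FaultAllocator {
    static int[] allocate(int totalFaults, int[] classLoc) {
        int totalLoc = Arrays.stream(classLoc).sum();
        int[] alloc = new int[classLoc.length];
        double[] remainder = new double[classLoc.length];
        int assigned = 0;
        for (int i = 0; i < classLoc.length; i++) {
            double exact = (double) totalFaults * classLoc[i] / totalLoc;
            alloc[i] = (int) Math.floor(exact);
            remainder[i] = exact - alloc[i];
            assigned += alloc[i];
        }
        // hand out the leftover faults to the largest fractional remainders
        Integer[] order = new Integer[classLoc.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(remainder[b], remainder[a]));
        for (int k = 0; k < totalFaults - assigned; k++) alloc[order[k]]++;
        return alloc;
    }

    public static void main(String[] args) {
        int[] loc = { 900, 700, 500, 400, 367, 300 };  // hypothetical class sizes, 3167 LOC total
        System.out.println(Arrays.toString(allocate(23, loc)));
    }
}
```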
18. Pt2. Reliability
METHODS.2
Failure detection
• Step 1: we created two Java classes to help us automate the work
• Step 2: we inserted at the beginning of the JUnit test suite (with 85.1%
coverage) the instruction to start the timer: TestController.start();
• Step 3: we started executing tests with all faults activated, and
progressively decreased the number of faults for each version; each
detected failure is reported via: /** XXX FAILURE DETECTED */ FailureController.intercept("failure description");
• Step 4: we reported the times of failure occurrence in a table
• Step 5: we ran the statistical test analysis with R
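Only the calls TestController.start() and FailureController.intercept(...) appear on the slides; the bodies below are a guessed reconstruction, assuming intercept simply logs the elapsed time of each failure for the table in Step 4:

```java
import java.util.ArrayList;
import java.util.List;

// Guessed reconstruction of the two helper classes named on the slides.
public class TestController {
    private static long startNanos;
    static final List<String> failureLog = new ArrayList<>();

    /** Called once at the beginning of the JUnit suite to start the timer. */
    public static void start() {
        startNanos = System.nanoTime();
        failureLog.clear();
    }

    static long elapsedMillis() {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }
}

class FailureController {
    /** Records the time of failure occurrence for the reliability table. */
    public static void intercept(String description) {
        TestController.failureLog.add(TestController.elapsedMillis() + " ms: " + description);
    }
}
```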
19. RESULTS.1
•Some faults are never executed during tests
•Not every fault executed caused a failure of the system (6 out of 22)
20. Pt2. Reliability
RESULTS.2
Comparison among models
Among the different models we chose the S-shaped Weibull
model, since it obtained the best R2 coefficient.
The parameter values computed are:
• a: 40
• b: 0.146
• c: 0.281
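A sketch of the model with the reported parameters, assuming the common three-parameter Weibull mean-value function mu(t) = a(1 - e^(-b*t^c)); whether this matches the exact parameterisation used by the authors' tool is an assumption:

```java
// S-shaped Weibull reliability growth model (assumed parameterisation).
public class WeibullModel {
    /** Expected cumulative number of failures detected by time t. */
    static double mu(double t, double a, double b, double c) {
        return a * (1 - Math.exp(-b * Math.pow(t, c)));
    }

    public static void main(String[] args) {
        double a = 40, b = 0.146, c = 0.281;  // parameter values reported on the slide
        for (double t : new double[] { 500, 1000, 2500, 5000 }) {
            System.out.printf("mu(%.0f) = %.1f expected failures%n", t, mu(t, a, b, c));
        }
    }
}
```

Under this assumed form, mu(t) grows monotonically and saturates at a = 40 total expected failures as t grows.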
22. CONCLUSIONS
•We believe that the right amount of code coverage is strictly dependent on
the project type
•According to the prediction made with the model used in the case study,
the number of failures after 5000 ms ( double the final time ) increases
by only ~18%
•In the case of JFreeChart, which is a non-critical system, this means that the
product is almost ready to be shipped
•This analysis helps us plan the appropriate level of support required for
defect correction after the software is released.
•With our tool it is easy to make comparisons among different models.
23. Pt2. Reliability
JModel
•A tool we developed to help engineers analyze data
•Produces a final report that shows how well the selected model fits real data,
•Computes the model's parameters and allows the user to make predictions.
•Integrates different technologies ( Java, HTML+CSS+JS, R, bash scripting )
DEMO
24.
THANK YOU!
Time for questions
paternoster.nicolo@gmail.com
carmine.giardino@gmail.com
Editor's notes
Whether the relationship is positive or negative; the strength of the relationship.
To reveal a statistically significant difference between the percentage of test coverage and the number of faults, we should reject the null hypothesis.
We improved testing coverage up to 85.1% to provide more data to analyse.
A critical aspect in developing software is reliability. In order to provide information about it, software reliability models should be analysed.
Focusing on one release, we can estimate the residual defects to tell whether the software is reliable before it is shipped to the customer, and how much support should be arranged for, with the relative costs.
The theory for this model is described by finite-type categorisation, where a finite amount of code should have a finite number of defects. Moreover, these models assume that the defect detection rate is proportional to the number of defects in the code. There are two main representations: concave and S-shaped.
With the concave model, the predicted number of failures at 5000 ms is 19, up from the 16 detected.