Defect Prediction Model Using Order Statistics

https://iaeme.com/Home/journal/IJARET 2573 editor@iaeme.com
International Journal of Advanced Research in Engineering and Technology (IJARET)
Volume 11, Issue 11, November 2020, pp. 2573-2583 Article ID: IJARET_11_11_256
Available online at https://iaeme.com/Home/issue/IJARET?Volume=11&Issue=11
ISSN Print: 0976-6480 and ISSN Online: 0976-6499
DOI: https://doi.org/10.17605/OSF.IO/PQXJW
© IAEME Publication Scopus Indexed
DEFECT PREDICTION USING ORDER STATISTICS
Lavanya Adusumilli
Research Scholar, Dept of CSE, Acharya Nagarjuna University,
Guntur, Andhra Pradesh, India
Dr. R. Satya Prasad
Professor, Dept of CSE, Acharya Nagarjuna University,
Guntur, Andhra Pradesh, India
ABSTRACT
Over the past many years, many software defect prediction models have been developed
to address various issues in software project development. Software reliability is a
significant aspect of software quality; it is evaluated and predicted on the basis of
defect prediction. Many software companies are trying to improve software quality while
also reducing the cost of software development. The Rayleigh model is one of the
significant models for analyzing software defects based on the generated data. Analysis
of means (ANOM) is a statistical technique that supports quality assurance by comparing
group means. In this paper, improved software defect prediction models (ISDPM) are used
to predict the defects that occur during five phases: analysis, planning, design, testing
and maintenance. To improve the performance of the proposed methodology, order statistics
are adopted for better prediction. The experiments are conducted on three synthetic
projects that are used to analyze the defects.
Key words: ANOM, Software defect prediction, Order statistics, Rayleigh model.
Cite this Article: Lavanya Adusumilli and R. Satya Prasad, Defect Prediction using
Order Statistics, International Journal of Advanced Research in Engineering and
Technology (IJARET), 11(11), 2020, pp. 2573-2583.
https://iaeme.com/Home/issue/IJARET?Volume=11&Issue=11
1. INTRODUCTION
Software reliability models (SRM) are widely used to predict software failures based on the
code. An SRM is used to understand the reasons for failures in the software and to quantify
software reliability. An SRM is also called a numerical model; it improves the detection of
errors after the code is changed [1]. These models mainly focus on deciding the effort
required for testing [2]. The developers test and debug the system until the required
reliability levels are reached. Reliability metrics are widely used for every software
product [3]. In every software project, various failures occur in the development phase and
also in the execution phase [4]. Calculating software reliability is a very difficult task
when the type of software is not well understood. Finding an accurate reliability estimate
depends mostly on the reliability model chosen. There is no proper definition of the
estimated software reliability [5]. If reliability cannot be measured directly, the features
related to reliability can be measured instead.
2. LITERATURE SURVEY
K. S. Kumari et al. [6] proposed the Pareto type II distribution model with an order
statistics approach and applied statistical process control (SPC) to analyze the failures.
The model is compared with HLD on time-domain data based on NHPP. The approach analyzes the
failure data with both models and shows the performance by using control charts. Y. Wu et
al. [7] proposed software reliability accelerated testing (SRAT), an approach that gives more
effective and reliable testing. This approach also tests the programs and operations that are
given in the code. M. Nafreen et al. [8] proposed an approach that combines the software
reliability growth model (SRGM) with software defect tracking. The SRGM is applied to a NASA
project to find the various defects in that project and to analyze the resolution of the
defects. The proposed model covers the 13 unique stages of the NASA software defect
lifecycle. Results show that the proposed approach achieved better performance compared with
other models. K. K. San et al. [9] proposed a new SRGM technique that uses features collected
from past projects to estimate the bugs in currently running projects. In the earlier
literature, training is done on the same project for effective bug detection. For prediction
on new data, an RNN-based DLSTM model is developed. C. Guo et al. [10] proposed a novel
software reliability growth model that takes the severity levels of faults into account. The
fault severity is analyzed by level, and the levels are shown with a logistic curve. Y. Liu
et al. [11] proposed a fault removal framework for software reliability with half-grouped
datasets that improves the accuracy of time-delay methods. The parameters are estimated with
the maximum likelihood estimation method. J. Yang et al. [12] analyzed the failure processes
by testing multiple software releases, considering the delays in fault repair time based on a
time-delay model. The approach is applied to two test datasets, and the software is released
three times. M. Cinque et al. [13] proposed the debugging-workflow-aware software reliability
growth method (DWA-SRGM), which addresses the bugs in a software project. The proposed model
analyzed the project bugs between 17 to 13 months. The approach shows an improved debugging
process for projects. H. Okamura et al. [14] presented a framework that finds the correlation
between the fault detection and correction times. This new fault detection model uses
hyper-Erlang distributions, and its parameter estimation algorithm is developed via the EM
(expectation-maximization) algorithm. H. Sukhwani et al. [15] presented an SRGM for flight
management software launched for a space mission. All the reports are collected from flight
software development. The approach works well on real-time software applications.
3. ORDER STATISTICS
The probability that a given software product works properly for a given amount of time in a
specified environment is known as software reliability. Equivalently, reliability is the
capability of a software product to maintain a specified level of performance when used under
specified conditions. Software reliability testing may also reveal issues in the design and
functionality of the software. Software reliability is also a key factor affecting system
reliability.

When we deal with ordered random variables and their functions, we turn to order statistics.
The importance of order statistics is seen when the inter-failure time is significantly small
or when failures occur at a repetitive rate.
Consider a variate x with probability density function f(x) and cumulative distribution
function F(x). Let x1, x2, …, xn be a random sample of size n representing n inter-failure
times of a product governed by the probability model of the continuous random variable x. A
subgroup, or sample, is a small set of observations on a process parameter or its output,
taken at one time. The foremost issue in choosing a subgroup is its size and the frequency of
sampling. The variation depends on the size of the subgroup chosen: the smaller the subgroup,
the less variation is seen within it, and the larger the sample size, the more sensitive it
is in detecting change.
The software failure process is monitored by means of failure control charts based on the
cumulative inter-failure data. The failure data are grouped into subgroups of size 4 or 5 and
are then cumulated after the transformation is completed. The time lapse between every two
successive failures is the inter-failure time. If the waiting time for a failure is not a
significant issue, the inter-failure time data can be grouped into non-overlapping successive
subgroups of size 4 or 5, adding the failure times within each subgroup so that fewer groups
remain. For example, 200 inter-failure times can be grouped into 40 disjoint subgroups, each
of size 5. The sum total of each subgroup represents the time lapse between every 5th
failure. It is also referred to as the 5th order statistic in a sample of size 5.
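The grouping step described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data are hypothetical, and dropping an incomplete trailing subgroup is an assumption that the text does not state.

```python
from typing import List, Sequence


def order_statistic_sums(inter_failure_times: Sequence[float], r: int = 5) -> List[float]:
    """Sum consecutive, non-overlapping subgroups of size r.

    Each sum is the time lapse between every r-th failure.
    Assumption: an incomplete trailing subgroup is dropped.
    """
    if r <= 0:
        raise ValueError("subgroup size r must be positive")
    usable = len(inter_failure_times) - len(inter_failure_times) % r
    return [sum(inter_failure_times[i:i + r]) for i in range(0, usable, r)]


def cumulate(values: Sequence[float]) -> List[float]:
    """Running total of the subgroup sums (cumulative time to every r-th failure)."""
    total = 0.0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


# Hypothetical data: 200 inter-failure times, purely for illustration.
times = [float(1 + (i % 7)) for i in range(200)]
groups = order_statistic_sums(times, r=5)   # 40 disjoint subgroups of size 5
cumulative_times = cumulate(groups)
```

With 200 inter-failure times and r = 5 the function returns 40 subgroup sums, matching the example in the text.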
4. RAYLEIGH MODEL
To estimate the number of defects accurately, the Rayleigh model is used throughout testing.
Testing is done according to the stage of the project. The defect prediction is represented
as:
$$m(t) = a\left[1 - e^{-(bt)^{2}}\right] \qquad (1)$$
For rth order statistics, the mean value function is expressed as
$$m^{r}(t) = \left(a\left[1 - e^{-(bt)^{2}}\right]\right)^{r} \qquad (2)$$
The failure intensity function is given as:
To estimate 'a' and 'b' for a sample of n units, the likelihood function must be obtained
first.
The likelihood function,
Taking the natural logarithm on both sides, the log-likelihood function is given as:

The parameter 'a' is estimated by taking the partial derivative w.r.t. 'a' and equating it to 0.
The parameter 'b' is estimated by the iterative Newton-Raphson method using
$$b_{k+1} = b_{k} - \frac{g(b_{k})}{g'(b_{k})}$$
which is substituted in finding 'a', where g(b) and g'(b) are expressed as follows.
Taking the partial derivative w.r.t. 'b' and equating it to 0.
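As an illustration of this estimation procedure, the following Python sketch works through the first-order case under stated assumptions, since the paper's own expressions for the likelihood, g(b) and g'(b) are not reproduced here. It assumes the standard NHPP log-likelihood for cumulative failure times t_1 < ... < t_n observed up to T = t_n, with intensity $\lambda(t) = m'(t) = 2ab^{2}t\,e^{-(bt)^{2}}$ obtained by differentiating Eq. (1). Setting the derivative w.r.t. 'a' to zero gives $a = n / (1 - e^{-(bT)^{2}})$, and substituting this into the derivative w.r.t. 'b' gives $g(b) = n/b - b\sum t_i^{2} - n b T^{2} / (e^{(bT)^{2}} - 1)$. The function names, the starting value 1/T and the sample data are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def g(b: float, times: Sequence[float]) -> float:
    """Score equation for b after substituting the estimate of a (assumed NHPP form)."""
    n, T = len(times), times[-1]
    s = sum(t * t for t in times)
    e1 = math.expm1((b * T) ** 2)          # e^{(bT)^2} - 1
    return n / b - b * s - n * b * T * T / e1


def g_prime(b: float, times: Sequence[float]) -> float:
    """Analytical derivative of g with respect to b."""
    n, T = len(times), times[-1]
    s = sum(t * t for t in times)
    x = (b * T) ** 2
    e1 = math.expm1(x)
    e = e1 + 1.0
    return -n / (b * b) - s - n * T * T * (e1 - 2.0 * x * e) / (e1 * e1)


def estimate_rayleigh(times: Sequence[float], b0: Optional[float] = None,
                      tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, float]:
    """Newton-Raphson on g(b) = 0, then a = n / (1 - exp(-(b T)^2))."""
    T = times[-1]
    b = b0 if b0 is not None else 1.0 / T   # hypothetical seed value
    for _ in range(max_iter):
        b_new = b - g(b, times) / g_prime(b, times)
        if b_new <= 0:                      # keep the iterate positive
            b_new = b / 2.0
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    else:
        raise RuntimeError("Newton-Raphson did not converge")
    a = len(times) / (1.0 - math.exp(-(b * T) ** 2))
    return a, b


# Hypothetical cumulative failure times (days), for illustration only.
failure_times = [float(d) for d in range(1, 11)]
a_hat, b_hat = estimate_rayleigh(failure_times)
```

Plain Newton-Raphson depends on the seed value, which agrees with the paper's remark that suitable estimates are obtained for an apt seed; a bracketing fallback would make the sketch more robust.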

5. APPLYING ANOM ON SELECTED DATASET
ANOM is one of the best graphical statistical techniques for comparing a group of treatment
means. Groups that diverge significantly from the overall set of groups are identified by the
ANOM methodology by contrasting the mean of each group with the overall mean. Apart from
comparing group means, it can be used to compare rates, proportions and variances. Here the
analysis is extended to the median, extreme values, mid-range and range of subgroups using
ANOM. In the remainder of this section, following the convention of earlier researchers in
ANOM, the words group and subgroup are used in the same sense without any technical
difference between them.
The main focus is the analysis of the data using ANOM sub-grouping, with the data to be
visualized distributed over the HLD model. The HLD model is chosen and the maximum likelihood
estimation mechanism is applied to estimate the unknown parameters of HLD. The data to be
analyzed are fitted to the model. The distribution generated from the model is divided into
groups of equal size, say 'gs'; 'gs' is optimal if it ranges between 2 and 10, that is,
2≤gs≤10.
Say n groups, each of size 'gs', are obtained. The groups are numbered, and this number is
treated as the group number. For each group all the different measures are calculated: the
mean, median, extreme values (maximum and minimum), range and mid-range.
In short, the data to be analyzed are first applied to the best-suited model, which generates
a distribution; the distribution is divided into numbered groups of equal size; and every
group is processed to obtain its mean, median, range, mid-range and extreme values.
Each subgroup is processed to obtain the subgroup mean, which is the mean of all the
distribution data in the subgroup. Each subgroup is sorted and its median is obtained. For
every subgroup the extreme values (maximum and minimum) are noted, the range
(maximum - minimum) is obtained, and the mid-range is obtained as the average of the maximum
and minimum.
The mean, median, extreme values (minimum and maximum), range and mid-range of all subgroups
are tabulated along with their group numbers.
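The per-subgroup measures listed above can be computed as in the following Python sketch. The function name, the dictionary keys and the sample data are hypothetical, and dropping an incomplete trailing group is an assumption.

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def subgroup_measures(data: Sequence[float], gs: int) -> List[Dict[str, float]]:
    """Split data into equal groups of size gs and compute the measures per group."""
    if not 2 <= gs <= 10:
        raise ValueError("group size gs should satisfy 2 <= gs <= 10")
    usable = len(data) - len(data) % gs    # assumption: drop an incomplete last group
    rows = []
    for number, start in enumerate(range(0, usable, gs), start=1):
        group = data[start:start + gs]
        lo, hi = min(group), max(group)
        rows.append({
            "group": number,
            "mean": mean(group),
            "median": median(group),       # median() sorts internally
            "min": lo,
            "max": hi,
            "range": hi - lo,
            "midrange": (hi + lo) / 2,
        })
    return rows


# Hypothetical values, for illustration only.
table = subgroup_measures([4, 1, 3, 2, 10, 6, 8, 7], gs=4)
```

Each returned row corresponds to one group number, so any single measure can be pulled out as a column for the limit calculation.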
Each tabulated measure is considered separately and sorted. The limits are obtained using the
ANOM measure, in other words the three-sigma standard limits. This gives the limit
probability. The probability is applied to the sorted measure to obtain the lower limit and
upper limit lines. The average of the limits is taken as the central line.
The unsorted data of the measure for which the limits were calculated are then plotted as a
scatter plot together with the lower, upper and central limit lines. The number of scatter
points equals the number of groups, and each point indicates a subgroup. The subgroups that
fall below the lower limit or above the upper limit are the most vulnerable to failure.
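A simplified Python sketch of the limit step follows. The text does not fully specify how the limit probability is applied, so this sketch assumes plain three-sigma limits around the mean of one subgroup measure, using the population standard deviation; it is not the paper's exact procedure. The function names and data are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def three_sigma_limits(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (lower, central, upper) lines; assumes plain three-sigma limits."""
    center = mean(values)
    sigma = pstdev(values)                 # population standard deviation (assumption)
    return center - 3 * sigma, center, center + 3 * sigma


def flag_groups(values: Sequence[float]) -> List[int]:
    """Group numbers (1-based) whose measure falls outside the limits."""
    lower, _, upper = three_sigma_limits(values)
    return [i for i, v in enumerate(values, start=1) if v < lower or v > upper]


# Hypothetical subgroup means: 19 ordinary groups and one extreme group.
group_means = [10.0] * 19 + [50.0]
lower, central, upper = three_sigma_limits(group_means)
outliers = flag_groups(group_means)        # group 20 lies above the upper limit
```

Under this assumption the central line equals the average of the lower and upper limits, which agrees with the description in the text.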
6. DATASET DESCRIPTION
The dataset is synthetic data consisting of software defects obtained by random generation.
The three projects are named Project-1, Project-2 and Project-3. These datasets contain 4
attributes and 10k default data, with parameters such as defect analysis, quality, and the
time taken for processing the data. All the datasets are synthetic datasets collected from
various sources.
Table 1 Data belonging to Project-1, Project-2 and Project-3
| Dataset Name | Lines of Code (LOC) | Total no of days taken for completion | Total no of Failures | Project Domain |
|---|---|---|---|---|
| Project-1 | 1228 | 97 | 76 | Account Package |
| Project-2 | 1765 | 110 | 81 | ERP-1 |
| Project-3 | 2212 | 132 | 131 | ERP-2 |

7. EXPERIMENTAL RESULTS
The experiments are conducted using the Java programming language with high software and
hardware requirements. With the stated process we can get appropriate values of the
parameters 'a' and 'b' for a suitable seed value. Using these values, we obtain the m(t)
values, and from them their successive differences. Here three real time datasets (Table-1,
Table-2 and Table-3), which contain failure data of three distinct projects, are considered
and demonstrated. Table 1, Table 2 and Table 3 show the time between failures of a software
product. By applying ANOM with order statistics, the following results are obtained.
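The computation of m(t) and its successive differences can be sketched in Python as follows, using the mean value function of Eq. (1). The parameter values and time points below are hypothetical placeholders, not the estimates obtained in the experiments, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def rayleigh_mean_value(t: float, a: float, b: float) -> float:
    """Mean value function of Eq. (1): m(t) = a * (1 - exp(-(b t)^2))."""
    return a * (1.0 - math.exp(-(b * t) ** 2))


def successive_differences(values: Sequence[float]) -> List[float]:
    """Differences between consecutive values: v[1]-v[0], v[2]-v[1], ..."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical parameters and interval end points (days), for illustration only.
a, b = 35.0, 0.012
time_points = [10, 20, 30, 40, 50, 60, 70, 75, 80, 90, 97]
m_values = [rayleigh_mean_value(t, a, b) for t in time_points]
diffs = successive_differences(m_values)
```

Because m(t) is non-decreasing in t for positive a and b, every successive difference computed this way is non-negative.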
Table 2 Successive differences of mean values of Dataset 1
| TT (day) | m(t) | Successive differences |
|---|---|---|
| 1-10 | 4.549 | 4.341879 |
| 11-20 | 5.676 | 5.765443 |
| 21-30 | 8.234 | 2.063118 |
| 31-40 | 11.341 | 5.898778 |
| 41-50 | 15.234 | 6.676557 |
| 51-60 | 18.766886 | 7.342545 |
| 61-70 | 20.8484211 | 1.89877808 |
| 71-75 | 23.4997063 | 10.909898 |
| 76-80 | 25.9914322 | 2.44186473 |
| 81-90 | 28.3184566 | 1.78896768 |
| 91-97 | 32.47888 | 2.78687678 |
Figure 1 Performance of m (t) Successive Differences for Dataset-2

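Successive differences of mean values of Dataset 2
| TT (day) | m(t) | Successive differences |
|---|---|---|
| 1-10 | 5.8797 | 5.89769 |
| 11-20 | 6.34435 | 5.0987897 |
| 21-30 | 7.24343 | 3.768676 |
| 31-40 | 9.32235 | 4.897687 |
| 41-50 | 14.47646 | 5.7977866 |
| 51-60 | 19.766886 | 6.342545 |
| 61-70 | 21.8484211 | 2.6576577 |
| 71-80 | 22.4997063 | 9.767786 |
| 81-90 | 24.9914322 | 3.7786877 |
| 91-100 | 26.3184566 | 2.9767868 |
| 101-110 | 31.78987 | 3.87687678 |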
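Successive differences of mean values of Dataset 3
| TT (day) | m(t) | Successive differences |
|---|---|---|
| 1-15 | 6.877 | 6.32424 |
| 16-30 | 7.78698 | 4.8887678 |
| 31-40 | 8.65675 | 3.877787 |
| 41-50 | 10.97678 | 5.898778 |
| 51-60 | 14.89878 | 6.676557 |
| 61-70 | 17.866768 | 7.342545 |
| 71-80 | 19.8484211 | 1.89877808 |
| 81-90 | 22.897878 | 10.909898 |
| 91-105 | 24.8797 | 2.44186473 |
| 106-120 | 27.879879 | 1.78896768 |
| 121-132 | 30.979879 | 2.78687678 |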

8. CONCLUSION
In this paper, improved software defect prediction models (ISDPM) are developed to predict
the software defects in various software development projects across different software
companies. To assess the software quality, methodologies such as ANOM and order statistics
are integrated. The defects are identified phase-wise. In every phase defects occur, and
these are resolved in the other phases. The 1st order and 2nd order statistics are applied to
all the phases of development. Thus the proposed system predicts defects efficiently.
REFERENCES
[1] Ullah, Najeeb & Morisio, Maurizio & Vetro, Antonio. (2013). A Comparative Analysis of
Software Reliability Growth Models using Defects Data of Closed and Open Source Software.
Proceedings of the 2012 IEEE 35th Software Engineering Workshop, SEW 2012. 187-192.
10.1109/SEW.2012.26.
[2] Kapil Sharma, et al, "Selection of Optimal Software Reliability Growth Models Using a
Distance Based Approach", IEEE Transactions On Reliability, VOL. 59, NO. 2, JUNE 2010.
[3] Syed-Mohamad et al, "Reliability Growth of Open Source Software Using Defect Analysis",
International Conference on Computer Science and Software Engineering, 2008.
[4] Bruno Rossi et al, "Modelling Failures Occurrences of Open Source Software with Reliability
Growth", journal of Open Source Software: New Horizons, page 268-280, 2010.
[5] Cobra Rahmani et al, "A Comparative Analysis of Open Source Software Reliability", Journal
of Software, page 1384-1394, 2010.
[6] K. S. Kumari, B. Amulya and R. S. Prasad, "Comparative study of Pareto Type II with HLD in
assessing the software reliability with order statistics approach using SPC," 2014 International
Conference on Circuits, Power and Computing Technologies [ICCPCT-2014], 2014, pp. 1630-
1636, doi: 10.1109/ICCPCT.2014.7054824.

[7] Y. Wu, Y. Zhang and M. Lu, "Software reliability accelerated testing method based on mixed
testing," 2010 Proceedings - Annual Reliability and Maintainability Symposium (RAMS),
2010, pp. 1-6, doi: 10.1109/RAMS.2010.5448017.
[8] M. Nafreen, M. Luperon, L. Fiondella, V. Nagaraju, Y. Shi and T. Wandji, "Connecting
Software Reliability Growth Models to Software Defect Tracking," 2020 IEEE 31st
International Symposium on Software Reliability Engineering (ISSRE), 2020, pp. 138-147, doi:
10.1109/ISSRE5003.2020.00022.
[9] K. K. San, H. Washizaki, Y. Fukazawa, K. Honda, M. Taga and A. Matsuzaki, "DC-SRGM:
Deep Cross-Project Software Reliability Growth Model," 2019 IEEE International Symposium
on Software Reliability Engineering Workshops (ISSREW), 2019, pp. 61-66, doi:
10.1109/ISSREW.2019.00044.
[10] C. Guo, S. Zhou, J. Li, F. Chen, D. Li and X. Huang, "A Novel Software Reliability Growth
Model of Safety-critical Software Considering Fault Severity Classification," 2019 4th
International Conference on System Reliability and Safety (ICSRS), 2019, pp. 25-29, doi:
10.1109/ICSRS48664.2019.8987594.
[11] Y Liu, M. Xie, J. Yang and M. Zhao, "A new framework and application of software reliability
estimation based on fault detection and correction processes", Proc. IEEE International
Conference on Software Quality Reliability and Security, pp. 65-74, 2015.
[12] J. Yang, Y Liu, M. Xie and M. Zhao, "Modeling and analysis of reliability of multi-release open
source software incorporating both fault detection and correction processes", Journal of Systems
and Software, vol. 115, pp. 102-110, 2016.
[13] M. Cinque, D. Cotroneo, A. Pecchia, R. Pietrantuono and S. Russo, "Debugging-workflow-
aware software reliability growth analysis", Software Testing Verification and Reliability, vol.
27, no. 7, pp. e1638, 2017.
[14] H. Okamura and T. Dohi, "A generalized bivariate modeling framework of fault detection and
correction processes", Proc. IEEE International Symposium on Software Reliability
Engineering, pp. 35-45, 2017.
[15] H. Sukhwani, J. Alonso, K. Trivedi and I. Mcginnis, "Software reliability analysis of NASA
space flight software: A practical experience", Proc. IEEE International Conference on Software
Quality Reliability and Security, pp. 386-397, 2016.
