3. Review
systematic
May, 2007
316 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 33, NO. 5, MAY 2007
Cross versus Within-Company Cost
Estimation Studies: A Systematic Review
Barbara A. Kitchenham, Member, IEEE Computer Society, Emilia Mendes, and
Guilherme H. Travassos
Abstract—The objective of this paper is to determine under what circumstances individual organizations would be able to rely on
cross-company-based estimation models. We performed a systematic review of studies that compared predictions from cross-
company models with predictions from within-company models based on analysis of project data. Ten papers compared cross-
company and within-company estimation models; however, only seven presented independent results. Of those seven, three found
that cross-company models were not significantly different from within-company models, and four found that cross-company models
were significantly worse than within-company models. Experimental procedures used by the studies differed making it impossible to
undertake formal meta-analysis of the results. The main trend distinguishing study results was that studies with small within-company
data sets (i.e., < 20 projects) that used leave-one-out cross validation all found that the within-company model was significantly
different (better) from the cross-company model. The results of this review are inconclusive. It is clear that some organizations would
be ill-served by cross-company models whereas others would benefit. Further studies are needed, but they must be independent (i.e.,
based on different data bases or at least different single company data sets) and should address specific hypotheses concerning the
conditions that would favor cross-company or within-company models. In addition, experimenters need to standardize their
experimental procedures to enable formal meta-analysis, and recommendations are made in Section 3.
Index Terms—Cost estimation, management, systematic review, software engineering.
Ç
1 INTRODUCTION
E studies of cost estimation models (e.g., [12], These problems motivated the use of cross-company
ARLY
[8]) suggested that general-purpose models such as models (models built using cross-company data sets, which
COCOMO [1] and SLIM [24] needed to be calibrated to are data sets containing data from several companies) for
specific companies before they could be used effectively. effort estimation and productivity benchmarking, and,
Taking this result further and following the proposals made subsequently, several studies compared the prediction
by DeMarco [4], Kok et al. [14] suggested that cost accuracy between cross-company and within-company
estimation models should be developed only from single- models. In 1999, Maxwell et al. [18] analyzed a cross-
company data. However, three main problems can occur company benchmarking database by comparing the accu-
when relying on within-company data sets [3], [2]: racy of a within-company cost model with the accuracy of a
cross-company cost model. They claimed that the within-
1. The time required to accumulate enough data on
company model was more accurate than the cross-company
past projects from a single company may be
model, based on the same holdout sample. In the same year,
prohibitive.
Briand et al. [2] found that cross-company models could be
2. By the time the data set is large enough to be of use,
as accurate as within-company models. The following year,
technologies used by the company may have
Briand et al. [3] reanalyzed the data set employed by
changed, and older projects may no longer be
Maxwell et al. [18] and concluded that cross-company
representative of current practices.
models were as good as within-company models. Two
3. Care is necessary as data needs to be collected in a
years later, Wieczorek and Ruhe [26] confirmed this same
consistent manner.
trend using the same data set employed by [2]. Three years
later, Mendes et al. [20] also confirmed the same trend using
. B.A. Kitchenham is with the School of Computing and Mathematics, yet another data set.
University of Keele, Keele Village, Staffordshire, ST5 5BG, UK. These results seemed to contradict the results of the
E-mail: b.a.kitchenham@cs.keele.ac.uk.
earlier studies and pave the way for improved estimation
. E. Mendes is with the Computer Science Department, University of
methods for companies that did not have their own project
Auckland, Private Bag 92019, Auckland, New Zealand.
E-mail: emilia@cs.auckland.ac.nz. data. However, other researchers found less encouraging
. G.H. Travassos is with UFRJ/COPPE, Systems Engineering and
results. Jeffery et al. undertook two studies, both of which
Computer Science Program, PO Box 68511, 21941-972 Rio de Janeiro—
suggested that within-company models were superior to
RJ, Brazil. E-mail: ght@cos.ufrj.br.
cross-company models [6], [7]. Two years later, Lefley and
Manuscript received 6 June 2006; revised 27 Nov. 2006; accepted 2 Jan. 2007;
Shepperd claimed that the within-company model was
published online 20 Feb. 2007.
Recommended for acceptance by A. Mockus. more accurate than the cross-company model, using the
For information on obtaining reprints of this article, please send e-mail to:
same data set employed by Wieczorek and Ruhe [26] and
tse@computer.org, and reference IEEECS Log Number TSE-0129-0606.
Briand et al. [2]. Finally, a year later Kitchenham and
Digital Object Identifier no. 10.1109/TSE.2007.1001.
0098-5589/07/$25.00 ß 2007 IEEE Published by the IEEE Computer Society
4. Review
systematic
May, 2007
Barbara Kitchenham
316 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 33, NO. 5, MAY 2007
Cross versus Within-Company Cost
Estimation Studies: A Systematic Review
Barbara A. Kitchenham, Member, IEEE Computer Society, Emilia Mendes, and
Guilherme H. Travassos
Abstract—The objective of this paper is to determine under what circumstances individual organizations would be able to rely on
cross-company-based estimation models. We performed a systematic review of studies that compared predictions from cross-
company models with predictions from within-company models based on analysis of project data. Ten papers compared cross-
company and within-company estimation models; however, only seven presented independent results. Of those seven, three found
that cross-company models were not significantly different from within-company models, and four found that cross-company models
were significantly worse than within-company models. Experimental procedures used by the studies differed making it impossible to
undertake formal meta-analysis of the results. The main trend distinguishing study results was that studies with small within-company
data sets (i.e., < 20 projects) that used leave-one-out cross validation all found that the within-company model was significantly
different (better) from the cross-company model. The results of this review are inconclusive. It is clear that some organizations would
be ill-served by cross-company models whereas others would benefit. Further studies are needed, but they must be independent (i.e.,
based on different data bases or at least different single company data sets) and should address specific hypotheses concerning the
Emilia Mendes
conditions that would favor cross-company or within-company models. In addition, experimenters need to standardize their
experimental procedures to enable formal meta-analysis, and recommendations are made in Section 3.
Index Terms—Cost estimation, management, systematic review, software engineering.
Ç
1 INTRODUCTION
E studies of cost estimation models (e.g., [12], These problems motivated the use of cross-company
ARLY
[8]) suggested that general-purpose models such as models (models built using cross-company data sets, which
COCOMO [1] and SLIM [24] needed to be calibrated to are data sets containing data from several companies) for
specific companies before they could be used effectively. effort estimation and productivity benchmarking, and,
Taking this result further and following the proposals made subsequently, several studies compared the prediction
by DeMarco [4], Kok et al. [14] suggested that cost accuracy between cross-company and within-company
estimation models should be developed only from single- models. In 1999, Maxwell et al. [18] analyzed a cross-
company data. However, three main problems can occur company benchmarking database by comparing the accu-
when relying on within-company data sets [3], [2]: racy of a within-company cost model with the accuracy of a
cross-company cost model. They claimed that the within-
1. The time required to accumulate enough data on
company model was more accurate than the cross-company
past projects from a single company may be
model, based on the same holdout sample. In the same year,
prohibitive.
Briand et al. [2] found that cross-company models could be
2. By the time the data set is large enough to be of use,
as accurate as within-company models. The following year,
technologies used by the company may have
Guilherme Travassos
Briand et al. [3] reanalyzed the data set employed by
changed, and older projects may no longer be
Maxwell et al. [18] and concluded that cross-company
representative of current practices.
models were as good as within-company models. Two
3. Care is necessary as data needs to be collected in a
years later, Wieczorek and Ruhe [26] confirmed this same
consistent manner.
trend using the same data set employed by [2]. Three years
later, Mendes et al. [20] also confirmed the same trend using
. B.A. Kitchenham is with the School of Computing and Mathematics, yet another data set.
University of Keele, Keele Village, Staffordshire, ST5 5BG, UK. These results seemed to contradict the results of the
E-mail: b.a.kitchenham@cs.keele.ac.uk.
earlier studies and pave the way for improved estimation
. E. Mendes is with the Computer Science Department, University of
methods for companies that did not have their own project
Auckland, Private Bag 92019, Auckland, New Zealand.
E-mail: emilia@cs.auckland.ac.nz. data. However, other researchers found less encouraging
. G.H. Travassos is with UFRJ/COPPE, Systems Engineering and
results. Jeffery et al. undertook two studies, both of which
Computer Science Program, PO Box 68511, 21941-972 Rio de Janeiro—
suggested that within-company models were superior to
RJ, Brazil. E-mail: ght@cos.ufrj.br.
cross-company models [6], [7]. Two years later, Lefley and
Manuscript received 6 June 2006; revised 27 Nov. 2006; accepted 2 Jan. 2007;
Shepperd claimed that the within-company model was
published online 20 Feb. 2007.
Recommended for acceptance by A. Mockus. more accurate than the cross-company model, using the
For information on obtaining reprints of this article, please send e-mail to:
same data set employed by Wieczorek and Ruhe [26] and
tse@computer.org, and reference IEEECS Log Number TSE-0129-0606.
Briand et al. [2]. Finally, a year later Kitchenham and
Digital Object Identifier no. 10.1109/TSE.2007.1001.
0098-5589/07/$25.00 ß 2007 IEEE Published by the IEEE Computer Society
5. Company Cross No Trend
Specific Models Company Models
6. Company Cross No Trend
Specific Models Company Models
(four studies) (four studies) (two studies)
7. Company Cross No Trend
Specific Models Company Models
(four studies) (four studies) (two studies)
Barbara Kitchenham Emilia Mendes Katrina Maxwell
Martin Shepperd
Isabella Wieczorek Lionel Briand
8. Company Cross No Trend
Specific Models Company Models
(four studies) (four studies) (two studies)
Barbara Kitchenham Emilia Mendes Katrina Maxwell
2 2
Martin Shepperd
Isabella Wieczorek Lionel Briand
2
9. Company Cross No Trend
Specific Models Company Models
(four studies) (four studies) (two studies)
Barbara Kitchenham Emilia Mendes Katrina Maxwell
1
2 2 1
Martin Shepperd
Isabella Wieczorek Lionel Briand
3
2
2
10. Company Cross No Trend
Specific Models Company Models
(four studies) (four studies) (two studies)
Barbara Kitchenham Emilia Mendes Katrina Maxwell
1 1
2 2 1
Martin Shepperd
Isabella Wieczorek Lionel Briand
3
2
2 1
21. METRICS, 2005
An Empirical Analysis of Software Productivity Over Time
Rahul Premraj Martin Shepperd
Bournemouth University, UK Brunel University, UK
rpremraj@bmth.ac.uk martin.shepperd@brunel.ac.uk
Barbara Kitchenham∗ Pekka Forselius
National ICT, Australia STTF Oy, Finland
Barbara.Kitchenham@nicta.com.au pekka.forselius@kolumbus.fi
Abstract commodities. Unfortunately, for software the notion of out-
put is not straightforward. Lines of code are problematic
due to issues of layout, differing language and the fact that
OBJECTIVE - the aim is to investigate how software
most software engineering activity does not directly involve
project productivity has changed over time. Within this
code. An alternative is Function Points (FPs), in its various
overall goal we also compare productivity between differ-
flavours, which although subject to some criticism [?] are
ent business sectors and seek to identify major drivers.
in quite widespread use and so in a sense represent the least
METHOD - we analysed a data set of more than 600
bad alternative. In our analysis the output (or size) measure
projects that have been collected from a number of Finnish
collected is Experience Points 2.0 [?], a variant of FPs.
companies since 1978.
RESULTS - overall, we observed a quite pronounced im-
Second, productivity is impacted by a very large num-
provement in productivity over the entire time period,
ber of factors, many of which are inherently difficult to as-
though, this improvement is less marked since the 1990s.
sess, e.g. task difficulty, skill of the project team, ease of
However, the trend is not smooth. We also observed pro-
interaction with the customer/client and the level of non-
ductivity variability between company and business sec-
functional requirements imposed such as dependability and
tor.
performance.
CONCLUSIONS - whilst this data set is not a ran-
dom sample so generalisation is somewhat problematic,
Third, there are clear interactions between many of these
we hope that it contributes to an overall body of knowl-
factors so for instance, it is easier to be productive if quality
edge about software productivity and thereby facilitates the
can be disregarded.
construction of a bigger picture.
Keywords: project management, projects, software produc-
Despite these caveats, this paper seeks to analyse soft-
tivity, trend analysis, empirical analysis.
ware project productivity trends from 1978-2003 from
a data set of more than 600 projects from Finland. The
projects are varied in size (6 - 5000+ FPs), business sec-
1. Introduction tor (e.g. Retail) and type (New Development or Mainte-
nance). However, we believe there are sufficient data to
Given the importance and size of the software industry it is draw some preliminary conclusions.
no surprise that there is a great deal of interest in productiv-
ity trends and in particular whether the industry, as a whole, The remainder of the paper is organised as follows. The
is improving over time. Obviously this is a complex ques- next section very briefly reviews some related work includ-
tion for at least three reasons. ing a similar, earlier study by Maxwell and Forselius [?].
First, productivity is difficult to measure because the tra- Next we describe the data set used for our analysis. We then
ditional definition, i.e. the ratio of outputs to inputs re- give the results of our analysis, first overall and then af-
quires that we have objective methods of measuring both ter splitting the data set into groups of more closely related
projects. We conclude with a discussion of the significance
of the results and some comments on the actual process of
Barbara Kitchenham is also with Keele University, UK Bar-
∗
bara@cs.keele.ac.uk analysing the data.
29. Research Objectives
1. To develop company-specific cost models for
comparisons against other models.
Training Data Testing Data
30. Research Objectives
I1. To develop cross-company cost models to
compare against company-specific cost models.
Training Data Testing Data
31. Research Objectives
I1I. To develop business-specific models to compare
their accuracy against company-specific and
cross-company cost models.
Training Data Testing Data
32. Research Objectives
IV. To develop business-specific cost models to
determine if they can be used by companies
from other business sectors.
Training Data Testing Data
33.
34. Pred (50) Pred (25)
Pred (50)
Pred (25) better
1.00 25.75 50.50 75.25 100.00
35. Pred (50) Pred (25)
Pred (50)
Pred (25) better
1.00 25.75 50.50 75.25 100.00
MdMRE MMRE
for comparability
MdMRE
MMRE better
1.00 25.75 50.50 75.25 100.00
36. Training Testing
Company-Specific
Cost Models
better better
Pred50 Pred25 MdMRE MMRE
A A
B B
C C
D D
E E
0 25 50 75 100 0 25 50 75 100
37. Training Testing
Cross-Company
Cost Models
better better
Pred50 Pred25 MdMRE MMRE
A A
B B
C C
D D
E E
0 25 50 75 100 0 25 50 75 100
38. Training Testing
Business-Specific
Cost Models
better better
Pred50 Pred25 MdMRE MMRE
A A
B B
C C
D D
E E
0 25 50 75 100 0 25 50 75 100
39. Training Testing
Cross-Business
Cost Models
• Projects from some sectors could be used
to predict for projects from other sectors.
• For example, Retail sector projects could
predict with high accuracy (Pred50 > 50%).
• But projects from sectors are best used to
predict for themselves.
43. Threats to Validity
• Projects originated from Finland only.
external
• Data cleaning removed nearly half the
internal
projects.
• Only used Size as independent variable.
45. Conclusions
Company Cross No Trend
Specific Models Company Models
(four studies) (four studies) (two studies)
Barbara Kitchenham Emilia Mendes Katrina Maxwell
1 1
2 2 1
Martin Shepperd
Isabella Wieczorek Lionel Briand
3
2
2 1
46. Conclusions
Company Cross No Trend
what are my
Specific Models Company Models
(four studies) (four studies) (two studies)
options?
Barbara Kitchenham Emilia Mendes Katrina Maxwell
1 1
2 2 1
Martin Shepperd
Isabella Wieczorek Lionel Briand
3
2
2 1
47. Conclusions
Company Cross No Trend
what are my
Specific Models Company Models
(four studies) (four studies) (two studies)
options?
Barbara Kitchenham Emilia Mendes Katrina Maxwell
1 1
2 2 1
Martin Shepperd
Isabella Wieczorek Lionel Briand
3
2
2 1
Company Cross
Business
Specific Models Company Models
Specific Models
50. Conclusions
• No model performed consistently well
across all experiments.
• Business-specific models performed
comparably to company-specific models.
51. Conclusions
• No model performed consistently well
across all experiments.
• Business-specific models performed
comparably to company-specific models.
• Business-specific models performed better
than cross-company models.
52. Conclusions
• No model performed consistently well
across all experiments.
• Business-specific models performed
comparably to company-specific models.
• Business-specific models performed better
than cross-company models.
• Reducing heterogeneity in data may
increase their applicability to problems.
53. Conclusions
• No model performed consistently well
across all experiments.
• Business-specific models performed
comparably to company-specific models.
• Business-specific models performed better
than cross-company models.
• Reducing heterogeneity in data may
increase their applicability to problems.
• ... and lead to better prediction models.
55. Open Questions
• Can we use other algorithms such as
decision trees and statistical clustering?
56. Open Questions
• Can we use other algorithms such as
decision trees and statistical clustering?
• What are the commonalities amongst
projects?
57. Open Questions
• Can we use other algorithms such as
decision trees and statistical clustering?
• What are the commonalities amongst
projects?
• Does heterogeneity in data sets impact
other software engineering areas?
58. Open Questions
• Can we use other algorithms such as
decision trees and statistical clustering?
Thank you!
• What are the commonalities amongst
projects?
• Does heterogeneity in data sets impact
other software engineering areas?