1. Cost Estimation of Ontologies Using ONTOCOM
Elena Simperl, Tobias Bürger, Igor Popov, UIBK
2. Motivation: A typical business scenario
Ontologies?
- How do I identify relevant expenditures?
- How much does it cost?
- What do we need to build them?
- What do I gain from the introduction of the system?
- How do the gains materialize?
Project ACTIVE
Date: 18.06.2008,
Dubrovnik
3. Methods and approaches to cost estimation
Expert Judgment
- Bottom-up estimation: Experts estimate the costs of low-level components or activities
- Top-down estimation: Experts estimate the total costs of a product or a project
Analogy Method
- Bottom-up estimation: Costs are calculated using analogies between low-level activities
- Top-down estimation: Costs are estimated using a global similarity function for products or projects
Decomposition Method
- Costs are calculated as an average sum of the costs of lower-level units whose development costs are known in advance
Parametric Method
- Bottom-up estimation: Costs are calculated using a statistical model which predicts the costs of lower-level units on the basis of historical data about the costs of developing such units
- Top-down estimation: Costs are calculated using a statistical model which is calibrated using historical data and predicts the current value of the total development costs
4. ONTOCOM – Overview
ONTOCOM – A cost estimation model for building ontologies
ONTOCOM combines top-down, parametric and expert-based methods as the basis for estimating the costs of building ontologies
ONTOCOM is realized using a combination of methods:
- Top-down breakdown of ontology engineering processes to reduce complexity (Decomposition method)
- Parametric method to create an a-priori statistical prediction model
- Validation and calibration of the model against existing project data and expert estimations, leading to an a-posteriori model (Expert judgment)
5. ONTOCOM
How ONTOCOM works:
1. Define lifecycle phases (top-down methodology)
•Ontology building
•Ontology reuse
•Ontology maintenance
2. Specify cost drivers (parametric methodology)
•Ontology building
•Ontology reuse
•Ontology maintenance
3. Refine the model (parametric and expert-based methodology)
•Evaluate cost drivers
•Specify start values
•Calibrate the model
7. The parametric equation
$$PM = A \cdot \mathrm{Size}^{\alpha} \cdot \prod_{i} EM_i$$
PM: effort in person months
A: baseline multiplicative constant (in person months)
Size: expected size of the ontology (distinguishing between entity types, e.g. classes, properties and axioms, and between the size of ontology building, reuse and maintenance)
α: accounts for non-linear behavior with respect to size
EM: effort multipliers (one per cost driver)
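Combining these quantities multiplicatively, as in the parametric equation PM = A · Size^α · ∏ EM_i, gives a one-line estimator. The following is a minimal Python sketch; the function name `estimate_effort` and the convention that an empty list of multipliers means "all cost drivers nominal" (EM = 1.0) are illustrative assumptions, not part of any published ONTOCOM tool.

```python
import math
from typing import Iterable


def estimate_effort(a: float, size: float, alpha: float,
                    multipliers: Iterable[float] = ()) -> float:
    """Return the estimated effort in person months: A * Size**alpha * prod(EM_i).

    `multipliers` holds one effort multiplier per cost driver that is not
    rated nominal; omitted drivers contribute a factor of 1.0.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # math.prod returns the start value (1.0) for an empty iterable,
    # so "no multipliers" means "all cost drivers nominal".
    return a * size ** alpha * math.prod(multipliers, start=1.0)
```

For example, `estimate_effort(2.58, 1.0, 0.15)` returns 2.58: with a size of 1 and all drivers nominal, the estimate reduces to the baseline constant A.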
8. Effort multipliers
Each process stage is characterized by a specific set of cost drivers
Each cost driver is associated with rating levels
The rating level (from very low to very high) expresses the impact of the cost driver on the development effort
Each rating level of each cost driver is associated with a weight, the effort multiplier (EM), derived through quantitative analysis
The effort multiplier values are subject to further calibration on the basis of statistical analysis of real-world project data
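The mapping from rating levels to effort multipliers can be sketched as a lookup table. In the sketch below, the driver acronyms DCPLX, OE and ICPLX come from the ONTOCOM example in this deck, but every numeric weight is a hypothetical placeholder, not a calibrated ONTOCOM value. It also assumes that the nominal level has weight 1.0, which matches treating "nominal" drivers as having no effect on effort.

```python
from enum import Enum


class Rating(Enum):
    VERY_LOW = "very low"
    LOW = "low"
    NOMINAL = "nominal"
    HIGH = "high"
    VERY_HIGH = "very high"


def _levels(very_low: float, low: float, high: float,
            very_high: float) -> dict:
    # The nominal level is fixed to 1.0 (assumption), i.e. no impact on effort.
    return {Rating.VERY_LOW: very_low, Rating.LOW: low, Rating.NOMINAL: 1.0,
            Rating.HIGH: high, Rating.VERY_HIGH: very_high}


# HYPOTHETICAL weights for illustration only; ONTOCOM's calibrated values differ.
EFFORT_MULTIPLIERS = {
    "DCPLX": _levels(0.75, 0.9, 1.2, 1.45),   # domain analysis complexity
    "OE": _levels(0.8, 0.9, 1.15, 1.3),       # evaluation of the results
    "ICPLX": _levels(0.85, 0.95, 1.1, 1.25),  # instantiation complexity
}


def combined_multiplier(ratings: dict) -> float:
    """Product of the effort multipliers selected by `ratings`.

    Drivers missing from `ratings` are treated as nominal (weight 1.0).
    """
    unknown = set(ratings) - set(EFFORT_MULTIPLIERS)
    if unknown:
        raise KeyError(f"unknown cost drivers: {sorted(unknown)}")
    product = 1.0
    for driver, levels in EFFORT_MULTIPLIERS.items():
        product *= levels[ratings.get(driver, Rating.NOMINAL)]
    return product
```

Keeping the table separate from the equation makes recalibration a matter of replacing the numbers, without touching the estimation code.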
9. Cost drivers
Product drivers account for the influence of ontology characteristics on costs
- e.g. Complexity of the Domain Analysis, Required Reusability, Documentation Needs
Project drivers account for the influence of project setting characteristics on the overall development effort
- e.g. Support Tools, Multi-site Development
Personnel drivers emphasize the role of team experience, ability and continuity w.r.t. the effort invested in the process
- e.g. Ontologist/Domain Expert Experience, Language/Tool Experience
Total number of cost drivers: 20
Cost drivers identified through a literature survey, analysis of empirical data and expert interviews
Overview of the cost drivers: http://ontocom.sti-innsbruck.at/ontocom.htm
10. ONTOCOM
ONTOCOM Model Calibration
- A-priori model: based on input from experts
- Calibration: linear regression, correlation analysis and Bayesian analysis, using input from gathered data
- A-posteriori model: the result of calibrating the a-priori model
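A common way to calibrate a model of the form PM = A · Size^α · ∏ EM_i with linear regression is to take logarithms, which makes it linear in ln A and α: ln(PM / ∏ EM_i) = ln A + α · ln Size. The sketch below assumes the combined effort multiplier of each historical project is already known, divides it out, and fits ln A and α by ordinary least squares. The data layout and the function name `calibrate` are hypothetical, and ONTOCOM's actual calibration may differ in detail (for instance, it also re-estimates the multipliers themselves).

```python
import math


def calibrate(projects: list) -> tuple:
    """Fit (A, alpha) from (size, effort_pm, combined_multiplier) triples.

    Uses the log-linear form ln(PM / EM) = ln(A) + alpha * ln(Size)
    and ordinary least squares with a single predictor.
    """
    if len(projects) < 2:
        raise ValueError("need at least two projects")
    xs = [math.log(size) for size, _, _ in projects]
    ys = [math.log(pm / em) for _, pm, em in projects]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("project sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                      # slope of the regression line
    a = math.exp(mean_y - alpha * mean_x)  # intercept, transformed back from ln(A)
    return a, alpha
```

Fed with synthetic data generated exactly from A = 2.58 and α = 0.15, the function recovers both values up to floating-point error; with real project data the fit is only approximate, which is where the expert-based a-priori values become useful.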
11. Using ONTOCOM: An example
Example ontology with 600 concepts, 100 relations and 50 axioms.
Cost drivers:
- Domain analysis complexity (DCPLX): high
- Evaluation of the results (OE): high influence on the effort
- Instantiation complexity (ICPLX): low impact on the effort
- Remaining cost drivers: nominal
Constants A and α: 2.58 and 0.15, respectively, as obtained from the calibration
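The example can be recomputed with the parametric equation PM = A · Size^α · ∏ EM_i. The sketch below makes two loudly labeled assumptions: Size is expressed in thousands of entities (so 600 + 100 + 50 = 750 entities becomes 0.75), and the multipliers for DCPLX (high), OE (high) and ICPLX (low) are hypothetical placeholders, because the calibrated values are not given on this slide. Substitute the calibrated ONTOCOM values to reproduce the actual estimate.

```python
import math

A = 2.58      # baseline constant from the calibration (slide value)
ALPHA = 0.15  # size exponent from the calibration (slide value)

# ASSUMPTION: size measured in thousands of entities.
concepts, relations, axioms = 600, 100, 50
size = (concepts + relations + axioms) / 1000

# HYPOTHETICAL placeholder multipliers, not ONTOCOM's calibrated values.
multipliers = {
    "DCPLX": 1.2,  # domain analysis complexity rated high
    "OE": 1.2,     # evaluation of the results rated high
    "ICPLX": 0.9,  # instantiation complexity rated low
}
# All remaining cost drivers are nominal (1.0) and can be omitted.

effort_pm = A * size ** ALPHA * math.prod(multipliers.values())
print(f"Estimated effort: {effort_pm:.2f} PM")
```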
12. Data collection using an online survey
We need your data – please visit the survey here:
http://survey.sti2.at/public/survey.php?name=OntocomSurveyJune13
13. Data collection and model calibration in SALERO
55 identified multimedia ontologies, 15 replies (30%)
Survey results
- Main application of multimedia ontologies: annotation (47%)
- Total size between 35 and 10000
- Development effort between 0.5 and 130 PM
- Many ontologies were built from scratch (45%)
- Most ontologies in OWL-DL (53%)
Calibration using linear regression and Bayesian analysis resulted in new effort multipliers
Prediction quality improved!
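Bayesian calibration, as used for instance in COCOMO II, combines the a-priori value of a parameter (from expert judgment) with the value estimated from project data, weighting each by its precision (inverse variance). The sketch below shows this normal-normal update for a single effort multiplier; the function name and all numbers are hypothetical, and the exact procedure applied in the SALERO calibration may differ.

```python
def bayesian_update(prior_mean: float, prior_var: float,
                    data_mean: float, data_var: float) -> tuple:
    """Combine an expert (a-priori) estimate with a data-based estimate.

    Both are modelled as normal distributions. The posterior mean is the
    precision-weighted average of the two means, and the posterior variance
    is smaller than either input variance.
    """
    if prior_var <= 0 or data_var <= 0:
        raise ValueError("variances must be positive")
    prior_precision = 1.0 / prior_var
    data_precision = 1.0 / data_var
    posterior_var = 1.0 / (prior_precision + data_precision)
    posterior_mean = posterior_var * (prior_precision * prior_mean
                                      + data_precision * data_mean)
    return posterior_mean, posterior_var
```

For example, if experts rate a multiplier at 1.3 (variance 0.04) and the regression yields 1.1 (variance 0.01), `bayesian_update(1.3, 0.04, 1.1, 0.01)` gives a posterior mean of about 1.14, pulled towards the more precise data estimate.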
14. New web site
http://ontocom.sti-innsbruck.at
15. Outlook and future plans
Development of a family of ONTOCOM models
- ONTOCOM-Ultra Lite for the estimation of folksonomies
- ONTOCOM-Lite for the estimation of lightweight ontologies
- ONTOCOM (Standard) for the estimation of heavyweight ontologies
Tool support for ONTOCOM
- Automatic calibration and addition / removal of data points
- Form based use of ONTOCOM for cost prediction
Benefit estimation of ontologies
16. Goal: Web 2.0 and semantic technologies’ economic measurements – cost estimation
Produce methods to assess the costs of core Web 2.0 and semantic technology solutions
Demonstrate their tangible and measurable benefits within an enterprise to support their adoption
Cost prediction for the development, maintenance and usage of Web 2.0 and semantic technology components
How to reach this goal:
- Develop a general model of Semantic Web based applications
- Develop a catalogue of cost drivers for distributed, collaborative applications based on Web 2.0 and semantic technologies, using literature analysis, expert interviews and knowledge elicitation (with use case partners)
- Collect cost-benefit related data to calibrate the model and improve prediction quality
Expected outcome:
- Tool suite for effort estimation, planning and controlling
- Prototypical methods to integrate cost/benefit rationales into collaborative knowledge creation / elicitation tasks
17. Subgoal: Benefit estimation methods for ontologies
Central question: What are the benefits gained from the introduction of an ontology-based application?
Typical distinction: tangible / intangible benefits
Methods differ in whether their output is quantitative, qualitative or financial
Requirements – the nature of benefits of ontologies
1. Most expected benefits from typical uses are intangible
- For communication: to ensure interoperability, for disambiguation (unique identification), or for knowledge transfer (by excluding unwanted interpretations through informal semantics).
- For computational inference: for browsing / searching (automatic inference of implicit facts), for automation / code generation, or to spot logical inconsistencies.
- For reuse and organisation of knowledge: for knowledge reuse or for structuring information and knowledge.
2. As the main impact of using ontologies is improved information communication, the method should not have a financial output
3. Ontologies and the applications using them should be assessed together, as an ontology typically only acquires value when used in combination with an application (analogously to information systems)
18. First proposal: A multiple gap model for user information satisfaction analysis
User Information Satisfaction (UIS) is a method to measure intangible benefits
UIS can be measured through a comparison of user expectations with perceived performance on a number of different facets
Multiple gap models are useful for assessing how systems are viewed at various stages of their design, implementation, and use
$$UIS = f(gap_1, \ldots, gap_n, \text{influencing factors})$$
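In a multiple gap model, each gap compares perceived performance with user expectation on one facet. The sketch below computes the per-facet gaps from survey ratings. Since the function f is left unspecified, the aggregation used here (an optionally weighted mean of the gaps) is purely a hypothetical choice, influencing factors are not modelled, and the facet names are illustrative.

```python
from typing import Mapping, Optional


def gaps(expectations: Mapping[str, float],
         perceptions: Mapping[str, float]) -> dict:
    """Per-facet gap = perceived performance - expectation (negative = shortfall)."""
    if expectations.keys() != perceptions.keys():
        raise ValueError("expectations and perceptions must cover the same facets")
    return {facet: perceptions[facet] - expectations[facet] for facet in expectations}


def uis_score(gap_by_facet: Mapping[str, float],
              weights: Optional[Mapping[str, float]] = None) -> float:
    """HYPOTHETICAL aggregation f: weighted mean of the per-facet gaps."""
    if not gap_by_facet:
        raise ValueError("no facets given")
    if weights is None:
        weights = {facet: 1.0 for facet in gap_by_facet}
    total_weight = sum(weights[facet] for facet in gap_by_facet)
    return sum(weights[f] * g for f, g in gap_by_facet.items()) / total_weight
```

For example, with expectations `{"search": 4.0, "interoperability": 3.5}` and perceptions `{"search": 3.0, "interoperability": 4.0}`, the gaps are -1.0 and 0.5, and the unweighted score is -0.25, i.e. a net shortfall against expectations.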
19. Sources
Elena Paslaru Bontas Simperl, Christoph Tempich, Malgorzata Mochol: "Cost estimation for ontology development: applying the ONTOCOM model". In: W. Abramowicz and H.C. Mayr (eds.), Technologies for Business Information Systems. Springer-Verlag, Berlin Heidelberg, 2006.
Elena Paslaru Bontas Simperl, Christoph Tempich, York Sure: "ONTOCOM: A Cost Estimation Model for Ontology Engineering". In: Proceedings of the International Semantic Web Conference (ISWC), 2006.
Tobias Bürger: "A Benefit Estimation Model for Ontologies". In: Poster Proceedings of the 5th European Semantic Web Conference (ESWC), 2008.
Further information: see http://ontocom.sti-innsbruck.at/info.htm
20. Thank you for your attention