SlideShare una empresa de Scribd logo
1 de 57
Towards Understanding
the Replication of
SE Experiments
Natalia Juristo
Universidad Politecnica de Madrid (Spain)
&
University of Oulu (Finland)
ESEM Conference Baltimore (USA) October 11th, 2013
Scope & Terminology
 This talk focuses on the replication of
experiments
 I will refer to the study whose results we
want to check as the baseline experiment
Replication Intuitive Definition
 Deliberate repetition of research
procedures in a second investigation for
the purpose of determining if earlier
results can be reproduced
Content
 Replication & the experimental paradigm
 State of replication in ESE: practice & theory
 Shedding a bit of light
 Purposes for replicating
 Replication functions
 Some answers
 Replication limits
 Baseline and replication minimum degree of similarity
 Admissible changes
 Reproduction of results
 Threats to reuse materials
 Summary
Role of Replication in
Experimental Paradigm
Searching for Regularities
 Science does not settle for anecdotes. A
scientific law or theory describes a regular
occurrence in the world
 Regularities existing in reality are identified by
reproducing the same event in different
replications
 The result of one experiment is an isolated
event
One Result, Three Meanings
 Without reproduction of results it is impossible
to distinguish whether they
 occurred by chance
 are artifactual
 the event occurs only in the experiment, not in reality
 really correspond to a regularity
State of Replication in ESE
Practice
Not Enough Replications
 Most SE experiments have not yet been
replicated
 Two reviews provide empirical data to
support this point
 Let us look at their results
Experiments in Leading Journals
& Conferences (1993-2002)
 5,453 articles published from 1993 to 2002
in major SE journals and conference
proceedings
 113 experiments
 20 (17.7%) described as replications
Sjøberg et al. “A survey of
controlled experiments in SE” TSE, 2005
All Publications
 96 papers reporting replications
 133 replications of 72 baseline studies
 Any type of empirical study
 Quasi-Experiments 35 49%
 Controlled Experiments 21 29%
 Case Study 15 21%
 Survey 1 1%
da Silva et al. “Replication of Empirical Studies in
Software Engineering Research: A Systematic
Mapping Study” EMSE 2013
Mostly Internal & Not Stand-Alone
Publications [Da Silva et al. 2013]
Internal – Original-Included reports
Internal – Replication-only reports
1994 1995 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
0
0 0 2 1 1 1 3 1 3 6 1 3 5 2 4 1
3 0 0 3 1 2 2 1 3 11 7 5 2 6 14
1996
0
0
NumberofReplications
2
4
6
8
10
12
14
16
Total 1 5 6 4 5 6 6 4 4 9 15 11 13 9 13 220
18
20
22
External 1 2 4 3 1 4 1 1 0 0 3 1 3 5 3 70
First Paper Published on a
Replication [Da Silva et al. 2012]
 In SE, the first article that explicitly reported
a replication of an empirical study was
published in 1994
 Daly, Brooks, Miller, Roper & Wood
Verification of Results in Sw Maintenance Through
External Replication
Intl Conf on Software Maintenance
EMSE Special Issue on
Replication
 The large number of submissions was
admittedly more than we expected
 We received a total of 16 submissions
 Encouraging the publication of replications
will foster researchers to replicate more
studies
State of Replication in ESE
Theory
First Theoretical Publication on
Replication
 In 1999, a paper discussed a framework to
organize sets of related experiments
(families) and the generation of knowledge
from such sets
 Basili, Shull & Lanubile. Building knowledge through
families of experiments. TSE
 Is a family exactly a set of replications?
 …experiments can be viewed as part of common
families of studies, rather than being isolated
events…
More Activity in the Last 10 Years
 Shull, Basili, Carver, Maldonado, Travassos, Mendonça & Fabbri Replicating
software engineering experiments: Addressing the tacit knowledge problem.
ISESE 2002
 Vegas, Juristo, Moreno, Solari & Letelier Analysis of the influence of
communication between researchers on experiment replication. ISESE 2006
 Brooks, Roper, Wood, Daly & Miller Replication’s role in software engineering.
Guide to Advanced Empirical SE. Springer 2008
 Juristo & Vegas Using differences among replications of software engineering
experiments to gain knowledge. ESEM 2009 [Juristo & Vegas The Role of Non-
Exact Replications in SE EMSE Journal 2011]
 Krein & Knutson A Case for replication: Synthesizing research methodologies in
SE. RESER 2010
 Gómez, Juristo & Vegas Replications types in experimental disciplines. ESEM
2010
 Juristo, Vegas, Solari, Abrahao & Ramos A Process for Managing Interaction
between Experimenters to Get Useful Similar Replications IST 2013
State of the Theory
 There is no agreement yet on terminology,
typology, purposes, operation and other
replication issues
 There is not even agreement on what a
replication is!!
 Different authors consider different types of
changes to the baseline experiment as
admissible
Example of Divergent Views
 Some researchers advise the use of different protocols
and materials to preserve independence and prevent
error propagation in replications by using the same
configuration
 Kitchenham
The role of replications in ESE - a word of warning EMSE 2008
 Other researchers recommend the reuse of materials to
assure that replications are similar enough for results to
be comparable
 Shull, Carver, Vegas & Juristo
The role of replications in ESE EMSE 2008
Shedding some Light on
Replication
Role of Replication
The Two Roles of Replication
 Validation
 Learning
Learning Relevant Conditions
 As more replications of Thompson and
McConnell’s baseline experiment were run
 different conditions influencing the results of this
experiment were identified
 After several hundred experiments had been run
 experimenters managed to identify around 70
conditions influencing the behavior of this type of
invertebrate
Which are the Important
Variables?
 “…In fact, the principle of Transversely
Excited Atmospheric (TEA) lasers,
scientists did not know that the inductance
of the top was important”
A physicist quoted in
Changing Order: Replication and Induction in Scientific Practice
Harry Collins 1992
First Learn, Then Validate
 “In the early stages, failure to get the expected
results is not falsification but a step in the
discovery of some interfering factor.
For immature experimental knowledge, the first
step is … to find out which experimental
conditions should be controlled”
Validity and the Research Process
Brinberg and McGrath 1985
SE Problems with Identical
Replications
 SE has tried to repeat experiments identically,
but no exact replications have yet been
achieved
 The complexity of the software development
setting prevents the many experimental
conditions from being reproduced identically
 Yet this is a regular rather than an exceptional
situation
In the Beginning Most is Unkown
 “Most aspects are unknown when we start
to study a phenomenon experimentally.
Even the tiniest change in a replication
can lead to inexplicable differences in the
results”
Validity and the Research Process
Brinberg and McGrath 1985
Start with Similar Replications
 “The less that is known about an area the more
power a very similar experiment has ... This is
because, in the absence of a well worked out set
of crucial variables, any change in the
experiment configuration, however trivial in
appearance, may well entail invisible but
significant changes in conditions”
Changing Order:
Replication and Induction in Scientific Practice
Harry Collins 1992
Learning & Validation Process
1. Start with identical replications
 At the beginning of experimental research, equality, even if
targeted, will not happen
 There will be either invisible but significant changes in conditions or
induced changes due to context adaptation or both
 Failure to get the expected results should not be construed as
falsification, but as a step towards the discovery of some new
factor
2. Later on, both knowledge discovery and testing
can be more systematic
 Changes in the configuration will be made purposely to learn
more variables and rule out artifactual results
Learning is Even More Important
 Replication is needed not merely to
validate one’s findings, but more
importantly, to establish the increasing
range of radically different conditions
under which the findings hold, and the
predictable exceptions
The design of replicated studies. American Statistician
Lindsay and Ehrenberg 1993
Shedding some Light on
Replication
Functions of Replication
Reminder of Experimental Setting
 Operationalization
 Treatments
 Response variable
 Protocol
 Experimental design
 Experimental objects
 Guides
 Measuring instruments
 Data analysis techniques
 Population
 Objects
 Subjects
Verification Functions
 Control experimental errors
 Control protocol independence
 Understand operationalization limits
 Understand population limits
Control Experimental Errors
 Verify that the results of the baseline experiment
are not a chance product of an error
 All elements of the experiment must resemble
the baseline experiment as closely as possible
 Collateral benefit
 Provide an understanding of the natural (random)
variation of the observed results
 critical for being able to decide whether or not results hold in
dissimilar replications
Control Protocol Independence
 Verify that the results of the baseline experiment
are not artifactual
 An artifactual result is due to the experimental
configuration and cannot be guaranteed to exist in
reality
 The experimental protocol needs to be changed
for this purpose
 If an experiment is replicated several times using the
same materials, the observed results may occur due
to the materials
 The same applies for all protocol elements
Understanding Operationalization
Limits
 Learn how sensitive results are to different
operationalizations
 Treatment operationalizations
 treatment application procedures, treatment
instructions, resources, treatment
transmission …
 Effect operationalizations
 Metrics, measurement procedures …
Understand Population Limits
 Learn the extent to which results hold for
other subject types or other types of
experimental objects
 Learn to which specific population the
experimental sample belongs and what
the characteristics of such population are
Changes & Replication Functions
Experimental
Configuration
Control
Experimental
Error
Control Protocol
Independence
Understand
Operationalization
Limits
Understand
Population Limits
Operationalization = = ≠ =
Population = = = ≠
Protocol = ≠ = =
Function of Replication
LEGEND: = the element is equal to, or as similar as possible to, the baseline experiment
≠ the element varies with respect to the baseline experiment
Some Answers
Question 1
What exactly is a replication
study?
Establishing Limits
Amount of Changes
Run
Same data
different
models or
statistical
methods
Identical
same site &
researchers
+
Protocol
changes
Operationa-
lization
changes
Population
changes
Just the
hypothesis
kept
RE-ANALYSIS REPETITION REPLICATION REPRODUCTION
Not Run
Question 2
What level of similarity should an
experiment have to be
considered a replication rather
than a new experiment?
Unchanged elements of a
replication
 A replication must share the hypothesis
with the baseline experiment
 Same response variable
 although not same metric
 Same treatments
 although not same operationalization
 at least two treatments in common
Partials Replications
Exp. A (Baseline OR Replication)
Exp. B (Replication OR Baseline)
Exp. D
Replication C
RV3
RV2
RV1
T5
T4
T3
T2
T6
T1
Question 3
What changes are acceptable?
Levels of Verification
 Similarity between the baseline
experiment and a replication serve
different verification purposes depending
on the changes made
 Replicating an experiment as closely as possible
 Verifies results are not accidental
 Varying the experimental protocol
 Verifies results are not artifactual
 Varying the population properties
 Verifies types of populations for which the results hold
 Varying the operationalization
 Verifies range of operationalizations for which the results hold
Can I Change Everything?
 It is better not to change everything at the
same time
 We can understand the source of differences
in results better if only one change is made at
a time
 But a replication with a lot of changes is
not rendered useless or doomed to failure
 We will just need to wait until other
replications are run
Everything Different
Changes
Different
Identical
Experiment
Elements
One Change at a Time
Changes
Different
Iqual
Experiment
Elements
Increasing Changes
Changes
Different
Iqual
Experiment
Elements
Question 4
What is the level of similarity that
results must have to be
considered as reproduced?
Understanding the Natural
Variation
 Identical replications are useful for
understanding the range of variability of
results
 This provides an estimate for other
experimenters to use as a baseline when
they replicate the experiment
Question 5
Should replications reuse
materials?
Accomodating Opposing Views
 The possible threat of errors being propagated
by experimenters exchanging materials does not
mean discarding replications sharing materials
 Replications with identical materials and protocols
(and possibly the same errors) are a necessary step
for verifying that an exact replication run by others
reproduces the same results
 Other replications that alter the design and other
protocol details should be performed in order to
assure that the results are not induced by the protocol
Accomodating Opposing Views
 Replication functions accommodate
opposing views within a broader
framework
 Contrary stances are really tantamount to
different types of replication conducted for
different purposes
 Different ways of running replications are
useful for gradually advancing towards
verified experimental results
Summarizing
Main Ideas
 Replication in ESE
 Replication plays an essential role in the experimental paradigm
 Replication is not a regular practice in ESE today
 More methodological research on the adoption and tailoring of
replication in ESE is still necessary
 Clarifying conceptions
 Replication is necessary not merely to validate findings, but,
more importantly, to discover the range of conditions under
which the findings hold
 Replications provide different knowledge depending on the
changes to the baseline experiment
 Knowledge gained from a replication needs to relate changes
and findings
Towards Understanding
the Replication of
SE Experiments
Natalia Juristo
Universidad Politecnica de Madrid (Spain)
&
University of Oulu (Finland)
ESEM Conference Baltimore (USA) October 11th, 2013

Más contenido relacionado

Similar a Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

Reproducibility by Other Means: Transparent Research Objects
Reproducibility by Other Means: Transparent Research ObjectsReproducibility by Other Means: Transparent Research Objects
Reproducibility by Other Means: Transparent Research ObjectsTimothy McPhillips
 
Gemechu keneni(PhD) document
Gemechu keneni(PhD) documentGemechu keneni(PhD) document
Gemechu keneni(PhD) documentgetahun bekana
 
Research design and experimentation
Research design and experimentationResearch design and experimentation
Research design and experimentationDr NEETHU ASOKAN
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesYoga Pratama Putra
 
Ruminations on replication
Ruminations on replicationRuminations on replication
Ruminations on replicationUlrich Dirnagl
 
Parameter Optimization of Shot Peening Process of PMG AL2024 Alloy Cover
Parameter Optimization of Shot Peening Process of PMG AL2024 Alloy CoverParameter Optimization of Shot Peening Process of PMG AL2024 Alloy Cover
Parameter Optimization of Shot Peening Process of PMG AL2024 Alloy CoverIOSRJMCE
 
Scientific method powerpoint.php
Scientific method powerpoint.phpScientific method powerpoint.php
Scientific method powerpoint.phpaimorales
 
G1.suntasig.guallichico.maicol.alexander.english.project.design
G1.suntasig.guallichico.maicol.alexander.english.project.designG1.suntasig.guallichico.maicol.alexander.english.project.design
G1.suntasig.guallichico.maicol.alexander.english.project.designMaicol Suntasig
 
I/O chapter 2 by Jason Manaois
I/O chapter 2 by Jason ManaoisI/O chapter 2 by Jason Manaois
I/O chapter 2 by Jason ManaoisRoi Xcel
 
INVITED EDITORIAL Lets Do It Again A Call for Replications
INVITED EDITORIAL Lets Do It Again A Call for Replications INVITED EDITORIAL Lets Do It Again A Call for Replications
INVITED EDITORIAL Lets Do It Again A Call for Replications TatianaMajor22
 
Experimental Design
Experimental DesignExperimental Design
Experimental DesignDavid Genis
 
There’s More Than One Way to Conduct a Replication StudyBey.docx
There’s More Than One Way to Conduct a Replication StudyBey.docxThere’s More Than One Way to Conduct a Replication StudyBey.docx
There’s More Than One Way to Conduct a Replication StudyBey.docxrhetttrevannion
 
Malec, T. & Newman, M. (2013). Research methods Building a kn.docx
Malec, T. & Newman, M. (2013). Research methods Building a kn.docxMalec, T. & Newman, M. (2013). Research methods Building a kn.docx
Malec, T. & Newman, M. (2013). Research methods Building a kn.docxcroysierkathey
 
Power and sample size calculations for survival analysis webinar Slides
Power and sample size calculations for survival analysis webinar SlidesPower and sample size calculations for survival analysis webinar Slides
Power and sample size calculations for survival analysis webinar SlidesnQuery
 

Similar a Towards Understanding SE Experiments Replication (ESEM'13 Keynote) (20)

Experimental Research
Experimental ResearchExperimental Research
Experimental Research
 
Reproducibility by Other Means: Transparent Research Objects
Reproducibility by Other Means: Transparent Research ObjectsReproducibility by Other Means: Transparent Research Objects
Reproducibility by Other Means: Transparent Research Objects
 
CSMR11a.ppt
CSMR11a.pptCSMR11a.ppt
CSMR11a.ppt
 
Research methodology
Research methodologyResearch methodology
Research methodology
 
Gemechu keneni(PhD) document
Gemechu keneni(PhD) documentGemechu keneni(PhD) document
Gemechu keneni(PhD) document
 
Research design and experimentation
Research design and experimentationResearch design and experimentation
Research design and experimentation
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
Ruminations on replication
Ruminations on replicationRuminations on replication
Ruminations on replication
 
Experimental Research
Experimental ResearchExperimental Research
Experimental Research
 
Russo Eipe Sept09
Russo Eipe Sept09Russo Eipe Sept09
Russo Eipe Sept09
 
Parameter Optimization of Shot Peening Process of PMG AL2024 Alloy Cover
Parameter Optimization of Shot Peening Process of PMG AL2024 Alloy CoverParameter Optimization of Shot Peening Process of PMG AL2024 Alloy Cover
Parameter Optimization of Shot Peening Process of PMG AL2024 Alloy Cover
 
Scientific method powerpoint.php
Scientific method powerpoint.phpScientific method powerpoint.php
Scientific method powerpoint.php
 
G1.suntasig.guallichico.maicol.alexander.english.project.design
G1.suntasig.guallichico.maicol.alexander.english.project.designG1.suntasig.guallichico.maicol.alexander.english.project.design
G1.suntasig.guallichico.maicol.alexander.english.project.design
 
I/O chapter 2 by Jason Manaois
I/O chapter 2 by Jason ManaoisI/O chapter 2 by Jason Manaois
I/O chapter 2 by Jason Manaois
 
INVITED EDITORIAL Lets Do It Again A Call for Replications
INVITED EDITORIAL Lets Do It Again A Call for Replications INVITED EDITORIAL Lets Do It Again A Call for Replications
INVITED EDITORIAL Lets Do It Again A Call for Replications
 
Experimental Design
Experimental DesignExperimental Design
Experimental Design
 
There’s More Than One Way to Conduct a Replication StudyBey.docx
There’s More Than One Way to Conduct a Replication StudyBey.docxThere’s More Than One Way to Conduct a Replication StudyBey.docx
There’s More Than One Way to Conduct a Replication StudyBey.docx
 
Malec, T. & Newman, M. (2013). Research methods Building a kn.docx
Malec, T. & Newman, M. (2013). Research methods Building a kn.docxMalec, T. & Newman, M. (2013). Research methods Building a kn.docx
Malec, T. & Newman, M. (2013). Research methods Building a kn.docx
 
Power and sample size calculations for survival analysis webinar Slides
Power and sample size calculations for survival analysis webinar SlidesPower and sample size calculations for survival analysis webinar Slides
Power and sample size calculations for survival analysis webinar Slides
 
Csmr11a.ppt
Csmr11a.pptCsmr11a.ppt
Csmr11a.ppt
 

Más de Natalia Juristo

Common Shortcomings in SE Experiments (ICSE'14 Doctoral Symposium Keynote)
Common Shortcomings in SE Experiments (ICSE'14 Doctoral Symposium Keynote)Common Shortcomings in SE Experiments (ICSE'14 Doctoral Symposium Keynote)
Common Shortcomings in SE Experiments (ICSE'14 Doctoral Symposium Keynote)Natalia Juristo
 
The Role of Scientific Method in Software Development
The Role of Scientific Method in Software Development The Role of Scientific Method in Software Development
The Role of Scientific Method in Software Development Natalia Juristo
 
Tester contribution to Testing Effectiveness. An Empirical Research
Tester contribution to Testing Effectiveness. An Empirical ResearchTester contribution to Testing Effectiveness. An Empirical Research
Tester contribution to Testing Effectiveness. An Empirical ResearchNatalia Juristo
 
Software Usability Implications in Requirements and Design
Software Usability Implications in Requirements and DesignSoftware Usability Implications in Requirements and Design
Software Usability Implications in Requirements and DesignNatalia Juristo
 

Más de Natalia Juristo (6)

CESI Keynote English
CESI Keynote EnglishCESI Keynote English
CESI Keynote English
 
PROMISE keynote Juristo
PROMISE keynote JuristoPROMISE keynote Juristo
PROMISE keynote Juristo
 
Common Shortcomings in SE Experiments (ICSE'14 Doctoral Symposium Keynote)
Common Shortcomings in SE Experiments (ICSE'14 Doctoral Symposium Keynote)Common Shortcomings in SE Experiments (ICSE'14 Doctoral Symposium Keynote)
Common Shortcomings in SE Experiments (ICSE'14 Doctoral Symposium Keynote)
 
The Role of Scientific Method in Software Development
The Role of Scientific Method in Software Development The Role of Scientific Method in Software Development
The Role of Scientific Method in Software Development
 
Tester contribution to Testing Effectiveness. An Empirical Research
Tester contribution to Testing Effectiveness. An Empirical ResearchTester contribution to Testing Effectiveness. An Empirical Research
Tester contribution to Testing Effectiveness. An Empirical Research
 
Software Usability Implications in Requirements and Design
Software Usability Implications in Requirements and DesignSoftware Usability Implications in Requirements and Design
Software Usability Implications in Requirements and Design
 

Último

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 

Último (20)

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 

Towards Understanding SE Experiments Replication (ESEM'13 Keynote)

  • 1. Towards Understanding the Replication of SE Experiments Natalia Juristo Universidad Politecnica de Madrid (Spain) & University of Oulu (Finland) ESEM Conference Baltimore (USA) October 11th, 2013
  • 2. Scope & Terminology  This talk focuses on the replication of experiments  I will refer to the study whose results we want to check as the baseline experiment
  • 3. Replication Intuitive Definition  Deliberate repetition of research procedures in a second investigation for the purpose of determining if earlier results can be reproduced
  • 4. Content  Replication & the experimental paradigm  State of replication in ESE: practice & theory  Shedding a bit of light  Purposes for replicating  Replication functions  Some answers  Replication limits  Baseline and replication minimum degree of similarity  Admissible changes  Reproduction of results  Threats to reuse materials  Summary
  • 5. Role of Replication in Experimental Paradigm
  • 6. Searching for Regularities  Science does not settle for anecdotes. A scientific law or theory describes a regular occurrence in the world  Regularities existing in reality are identified by reproducing the same event in different replications  The result of one experiment is an isolated event
  • 7. One Result, Three Meanings  Without reproduction of results it is impossible to distinguish whether they  occurred by chance  are artifactual  the event occurs only in the experiment, not in reality  really correspond to a regularity
  • 8. State of Replication in ESE Practice
  • 9. Not Enough Replications  Most SE experiments have not yet been replicated  Two reviews provide empirical data to support this point  Let us look at their results
  • 10. Experiments in Leading Journals & Conferences (1993-2002)  5,453 articles published from 1993 to 2002 in major SE journals and conference proceedings  113 experiments  20 (17.7%) described as replications Sjøberg et al. “A survey of controlled experiments in SE” TSE, 2005
  • 11. All Publications  96 papers reporting replications  133 replications of 72 baseline studies  Any type of empirical study  Quasi-Experiments 35 49%  Controlled Experiments 21 29%  Case Study 15 21%  Survey 1 1% da Silva et al. “Replication of Empirical Studies in Software Engineering Research: A Systematic Mapping Study” EMSE 2013
  • 12. Mostly Internal & Not Stand-Alone Publications [Da Silva et al. 2013] Internal – Original-Included reports Internal – Replication-only reports 1994 1995 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 0 0 0 2 1 1 1 3 1 3 6 1 3 5 2 4 1 3 0 0 3 1 2 2 1 3 11 7 5 2 6 14 1996 0 0 NumberofReplications 2 4 6 8 10 12 14 16 Total 1 5 6 4 5 6 6 4 4 9 15 11 13 9 13 220 18 20 22 External 1 2 4 3 1 4 1 1 0 0 3 1 3 5 3 70
  • 13. First Paper Published on a Replication [Da Silva et al. 2012]  In SE, the first article that explicitly reported a replication of an empirical study was published in 1994  Daly, Brooks, Miller, Roper & Wood Verification of Results in Sw Maintenance Through External Replication Intl Conf on Software Maintenance
  • 14. EMSE Special Issue on Replication  The large number of submissions was admittedly more than we expected  We received a total of 16 submissions  Encouraging the publication of replications will foster researchers to replicate more studies
  • 15. State of Replication in ESE Theory
  • 16. First Theoretical Publication on Replication  In 1999, a paper discussed a framework to organize sets of related experiments (families) and the generation of knowledge from such sets  Basili, Shull & Lanubile. Building knowledge through families of experiments. TSE  Is a family exactly a set of replications?  …experiments can be viewed as part of common families of studies, rather than being isolated events…
  • 17. More Activity in the Last 10 Years  Shull, Basili, Carver, Maldonado, Travassos, Mendonça & Fabbri Replicating software engineering experiments: Addressing the tacit knowledge problem. ISESE 2002  Vegas, Juristo, Moreno, Solari & Letelier Analysis of the influence of communication between researchers on experiment replication. ISESE 2006  Brooks, Roper, Wood, Daly & Miller Replication’s role in software engineering. Guide to Advanced Empirical SE. Springer 2008  Juristo & Vegas Using differences among replications of software engineering experiments to gain knowledge. ESEM 2009 [Juristo & Vegas The Role of Non- Exact Replications in SE EMSE Journal 2011]  Krein & Knutson A Case for replication: Synthesizing research methodologies in SE. RESER 2010  Gómez, Juristo & Vegas Replications types in experimental disciplines. ESEM 2010  Juristo, Vegas, Solari, Abrahao & Ramos A Process for Managing Interaction between Experimenters to Get Useful Similar Replications IST 2013
  • 18. State of the Theory  There is no agreement yet on terminology, typology, purposes, operation and other replication issues  There is not even agreement on what a replication is!!  Different authors consider different types of changes to the baseline experiment as admissible
  • 19. Example of Divergent Views  Some researchers advise the use of different protocols and materials to preserve independence and prevent error propagation in replications by using the same configuration  Kitchenham The role of replications in ESE - a word of warning EMSE 2008  Other researchers recommend the reuse of materials to assure that replications are similar enough for results to be comparable  Shull, Carver, Vegas & Juristo The role of replications in ESE EMSE 2008
  • 20. Shedding some Light on Replication Role of Replication
  • 21. The Two Roles of Replication  Validation  Learning
  • 22. Learning Relevant Conditions  As more replications of Thompson and McConnell’s baseline experiment were run  different conditions influencing the results of this experiment were identified  After several hundred experiments had been run  experimenters managed to identify around 70 conditions influencing the behavior of this type of invertebrate
  • 23. Which are the Important Variables?  “…In fact, the principle of Transversely Excited Atmospheric (TEA) lasers, scientists did not know that the inductance of the top was important” A physicist quoted in Changing Order: Replication and Induction in Scientific Practice Harry Collins 1992
  • 24. First Learn, Then Validate  “In the early stages, failure to get the expected results is not falsification but a step in the discovery of some interfering factor. For immature experimental knowledge, the first step is … to find out which experimental conditions should be controlled” Validity and the Research Process Brinberg and McGrath 1985
  • 25. SE Problems with Identical Replications  SE has tried to repeat experiments identically, but no exact replications have yet been achieved  The complexity of the software development setting prevents the many experimental conditions from being reproduced identically  Yet this is a regular rather than an exceptional situation
  • 26. In the Beginning Most is Unkown  “Most aspects are unknown when we start to study a phenomenon experimentally. Even the tiniest change in a replication can lead to inexplicable differences in the results” Validity and the Research Process Brinberg and McGrath 1985
  • 27. Start with Similar Replications  “The less that is known about an area the more power a very similar experiment has ... This is because, in the absence of a well worked out set of crucial variables, any change in the experiment configuration, however trivial in appearance, may well entail invisible but significant changes in conditions” Changing Order: Replication and Induction in Scientific Practice Harry Collins 1992
  • 28. Learning & Validation Process 1. Start with identical replications  At the beginning of experimental research, equality, even if targeted, will not happen  There will be either invisible but significant changes in conditions or induced changes due to context adaptation or both  Failure to get the expected results should not be construed as falsification, but as a step towards the discovery of some new factor 2. Later on, both knowledge discovery and testing can be more systematic  Changes in the configuration will be made purposely to learn more variables and rule out artifactual results
  • 29. Learning is Even More Important  Replication is needed not merely to validate one’s findings, but more importantly, to establish the increasing range of radically different conditions under which the findings hold, and the predictable exceptions The design of replicated studies. American Statistician Lindsay and Ehrenberg 1993
  • 30. Shedding some Light on Replication Functions of Replication
  • 31. Reminder of Experimental Setting  Operationalization  Treatments  Response variable  Protocol  Experimental design  Experimental objects  Guides  Measuring instruments  Data analysis techniques  Population  Objects  Subjects
  • 32. Verification Functions  Control experimental errors  Control protocol independence  Understand operationalization limits  Understand population limits
  • 33. Control Experimental Errors  Verify that the results of the baseline experiment are not a chance product of an error  All elements of the experiment must resemble the baseline experiment as closely as possible  Collateral benefit  Provide an understanding of the natural (random) variation of the observed results  critical for being able to decide whether or not results hold in dissimilar replications
  • 34. Control Protocol Independence  Verify that the results of the baseline experiment are not artifactual  An artifactual result is due to the experimental configuration and cannot be guaranteed to exist in reality  The experimental protocol needs to be changed for this purpose  If an experiment is replicated several times using the same materials, the observed results may occur due to the materials  The same applies for all protocol elements
  • 35. Understanding Operationalization Limits  Learn how sensitive results are to different operationalizations  Treatment operationalizations  treatment application procedures, treatment instructions, resources, treatment transmission …  Effect operationalizations  Metrics, measurement procedures …
  • 36. Understand Population Limits  Learn the extent to which results hold for other subject types or other types of experimental objects  Learn to which specific population the experimental sample belongs and what the characteristics of such population are
  • 37. Changes & Replication Functions Experimental Configuration Control Experimental Error Control Protocol Independence Understand Operationalization Limits Understand Population Limits Operationalization = = ≠ = Population = = = ≠ Protocol = ≠ = = Function of Replication LEGEND: = the element is equal to, or as similar as possible to, the baseline experiment ≠ the element varies with respect to the baseline experiment
  • 39. Question 1 What exactly is a replication study?
  • 40. Establishing Limits Amount of Changes Run Same data different models or statistical methods Identical same site & researchers + Protocol changes Operationa- lization changes Population changes Just the hypothesis kept RE-ANALYSIS REPETITION REPLICATION REPRODUCTION Not Run
  • 41. Question 2 What level of similarity should an experiment have to be considered a replication rather than a new experiment?
  • 42. Unchanged elements of a replication  A replication must share the hypothesis with the baseline experiment  Same response variable  although not same metric  Same treatments  although not same operationalization  at least two treatments in common
  • 43. Partials Replications Exp. A (Baseline OR Replication) Exp. B (Replication OR Baseline) Exp. D Replication C RV3 RV2 RV1 T5 T4 T3 T2 T6 T1
  • 44. Question 3 What changes are acceptable?
  • 45. Levels of Verification  Similarity between the baseline experiment and a replication serve different verification purposes depending on the changes made  Replicating an experiment as closely as possible  Verifies results are not accidental  Varying the experimental protocol  Verifies results are not artifactual  Varying the population properties  Verifies types of populations for which the results hold  Varying the operationalization  Verifies range of operationalizations for which the results hold
  • 46. Can I Change Everything?  It is better not to change everything at the same time  We can understand the source of differences in results better if only one change is made at a time  But a replication with a lot of changes is not rendered useless or doomed to failure  We will just need to wait until other replications are run
  • 48. One Change at a Time Changes Different Iqual Experiment Elements
  • 50. Question 4 What is the level of similarity that results must have to be considered as reproduced?
  • 51. Understanding the Natural Variation  Identical replications are useful for understanding the range of variability of results  This provides an estimate for other experimenters to use as a baseline when they replicate the experiment
  • 52. Question 5 Should replications reuse materials?
  • 53. Accomodating Opposing Views  The possible threat of errors being propagated by experimenters exchanging materials does not mean discarding replications sharing materials  Replications with identical materials and protocols (and possibly the same errors) are a necessary step for verifying that an exact replication run by others reproduces the same results  Other replications that alter the design and other protocol details should be performed in order to assure that the results are not induced by the protocol
  • 54. Accomodating Opposing Views  Replication functions accommodate opposing views within a broader framework  Contrary stances are really tantamount to different types of replication conducted for different purposes  Different ways of running replications are useful for gradually advancing towards verified experimental results
  • 56. Main Ideas  Replication in ESE  Replication plays an essential role in the experimental paradigm  Replication is not a regular practice in ESE today  More methodological research on the adoption and tailoring of replication in ESE is still necessary  Clarifying conceptions  Replication is necessary not merely to validate findings, but, more importantly, to discover the range of conditions under which the findings hold  Replications provide different knowledge depending on the changes to the baseline experiment  Knowledge gained from a replication needs to relate changes and findings
  • 57. Towards Understanding the Replication of SE Experiments Natalia Juristo Universidad Politecnica de Madrid (Spain) & University of Oulu (Finland) ESEM Conference Baltimore (USA) October 11th, 2013