1. Questions Unsolved
1. What is the current plan?
a) Collect and summarize the ideas in my notebook.
b) Xinming Wang's classification of code omission!!!
(This serves the same purpose as Prof. Zhou's small program; Wang's approach will not work well for every subcategory.)
c) Read the papers that are likely to be cited.
d) What is the framework of their implementation?
e) Clarify the design and list all possible experimental methods.
f) Complete the experiments one by one and provide supporting evidence.
2. What are the goals?
a) Thesis proposal: Our Approach
1. Evaluation methods and datasets
a) Siemens
b) Unix
c) Are there other datasets?
d) Which Java subject programs are commonly used?
i. NanoXML
2. Become familiar with faults and classify them
a) Coincidental correctness
b) Code omission
c) Multi-Fault
3. Become familiar with test-case runs
a) Classification of run information
i. sum{covered statement}
ii. Coverage
iii. Execution counts
iv. Trace
v. Semantics
vi. Slice
vii. State
viii. Predicates
ix. Symbolic execution
x. PDG
xi. AST
xii. CFG
b) Produce statistical charts: distribution, variance, mean.
4. Become familiar with test cases
a) How to measure the distance or similarity between test cases?
i. Input
ii. Coverage
iii. Number of covered statements
iv. Similarity of coverage counts
b) How to judge which test cases are more likely to reveal a fault?
i. Evaluation of test cases.
5. Learn to use the tools
a) gcov
b) weka
c) eclipse plugins
6. Directions to pursue
a) Fault localization for socket communication
b) Fault localization in loops (e.g. <=3 written as <3)
c) Fault localization in recursion
d) Test-suite adequacy (the paper showing that only about 20 passing test cases are needed)
e) Improve effectiveness overall by proposing a new formula
f) Label the different kinds of faults
g) Propose a formula targeted at a particular kind of fault
h) Remove some of the similar test cases
i) Build a logical model after clustering (cluster by run.covered_statements.length)
j) Logical-combination coverage of predicate clauses? The essence of program coverage information lies in the branch coverage of conditional statements.
k) The fault lies in the backward slice of the suspicious predicate clause
l) Statements not executed in a failed run are all correct statements
7. Summary of current directions
a) Run
i. Assign a weight to each run
ii. Cluster the runs
iii. Remove runs that are extremely similar to each other
iv. Remove runs that are very likely coincidentally correct (i.e. remove the passed runs closest to a failed run)
v. Apply set operations to the statements covered by runs: intersection (high weight), union (low weight), complement (negative weight).
Passed-Covered
Passed-Uncovered
Failed-Covered
Failed-Uncovered
b) Logical-combination coverage of predicates, followed by slicing (a combination of CBFL and slicing). This is because the essence of coverage lies in conditional statements (although a condition's outcome is affected by earlier assignment statements).
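The set operations and the four passed/failed x covered/uncovered categories in 7.a.v above can be sketched concretely (a hypothetical example of my own; each run is just a set of covered statement ids plus a pass/fail flag):

```python
# Sketch of the run set operations in 7.a.v: per-statement spectrum
# counts plus the intersection of the failed runs' coverage.
# Runs are hypothetical: (covered statement ids, passed?).

def spectrum(runs):
    """Return {stmt: (passed_covered, passed_uncovered,
                      failed_covered, failed_uncovered)}."""
    stmts = set().union(*(cov for cov, _ in runs))
    table = {}
    for s in stmts:
        pc = sum(1 for cov, ok in runs if ok and s in cov)
        pu = sum(1 for cov, ok in runs if ok and s not in cov)
        fc = sum(1 for cov, ok in runs if not ok and s in cov)
        fu = sum(1 for cov, ok in runs if not ok and s not in cov)
        table[s] = (pc, pu, fc, fu)
    return table

def failed_intersection(runs):
    """Statements executed by every failed run (the high-weight set)."""
    failed = [cov for cov, ok in runs if not ok]
    return set.intersection(*failed) if failed else set()
```

For example, with runs = [({1, 2, 3}, True), ({1, 3}, False)], statement 2 gets the tuple (1, 0, 0, 1) and failed_intersection returns {1, 3}.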
3. Questions Solved
1.
Questions
1. What is the background?
2. What assumptions are these approaches based on?
3. Can you tell me what the best approach is in this area? Who proposed it?
4. Can you list some motivation examples for the approach?
5. The ideas are trivial. What is the biggest challenge in these approaches?
6. What is the approach’s IPO (Input-Process-Output)? Can you give me an example?
7. What are the paper’s contributions?
8. The results are better. Can you explain why? What is different from related work?
9. How to evaluate in this area, including methods, benchmarks and convincing reasons?
10. Can you find the design space of this area?
11. What can we learn from the author’s survey?
12. Can we make some breakthroughs? What’s our future work?
Test
[]
Annotation
Test Case Generation
[McM04] Search-based software test data generation: a
survey
McMinn, P. (2004), Search-based software test data generation: a survey. Software Testing,
Verification and Reliability, 14: 105–156.
Annotation
The paper gives a fairly comprehensive overview of search-based test generation. The author
first introduces the motivation for automated testing, as well as the problems researchers have to
face. In the second chapter, several general search techniques are presented. Through
the next four chapters, the author classifies the different types of search-based test generation. The
classification is based on the type of testing: structural testing,
functional testing, grey-box testing and non-functional testing. The author further subdivides
each testing type, with a number of comprehensive examples. The classification is
impressive and helps a lot in understanding where each piece of research fits.
Keyword
search-based software engineering; automated software test data generation;
Abstract
Background
The use of metaheuristic search techniques for the automatic generation of test data has been a
burgeoning interest for many researchers in recent years.
Motivation
Previous attempts to automate the test generation process have been limited, having been
constrained by the size and complexity of software, and the basic fact that, in general, test data
generation is an undecidable problem.
Solution
Metaheuristic search techniques offer much promise in regard to these problems. Metaheuristic
search techniques are high-level frameworks, which utilize heuristics to seek solutions for
combinatorial problems at a reasonable computational cost. To date, metaheuristic search
techniques have been applied to automate test data generation for structural and functional testing;
the testing of grey-box properties, for example safety constraints; and also non-functional
properties, such as worst-case execution time.
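As a toy illustration of the metaheuristic idea (my own example, not taken from the paper): a hill climber can minimize a branch-distance fitness to find an input that drives a hypothetical function down a branch guarded by x == 100:

```python
# Toy search-based test data generation: find an input taking the
# target branch `x == 100` by hill-climbing on the branch distance
# |x - 100|. The function under test and fitness are made up.

def branch_distance(x, target=100):
    return abs(x - target)          # 0 iff the target branch is taken

def hill_climb(start, fitness, max_steps=10000):
    x = start
    for _ in range(max_steps):
        if fitness(x) == 0:
            return x                # target branch covered
        # move to the better neighbour; this simple fitness landscape
        # has no local optima, so no restarts are needed
        x = min([x - 1, x + 1], key=fitness)
    return None
```

Here the fitness guides the search directly to the goal; real search-based generators use the same scheme with genetic algorithms or simulated annealing and richer branch-distance functions.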
Contribution
This paper surveys some of the work undertaken in this field, discussing possible new future
directions of research for each of its different individual areas.
Total Pages: 52 | Value: High | Understanding: Normal | Last Read: 2010.09.24
Question: Characterization | Result: Analytic Model | Validation: Persuasion
[Edv99] A survey on automatic test data generation
Jon Edvardsson. A survey on automatic test data generation. In Proceedings of the Second
Conference on Computer Science and Engineering in Linköping (October 1999), pp. 21-28.
Annotation
A program-based test data generator is one component to automate software testing. The paper
begins by showing the architecture of a typical test data generator system and some basic
concepts, such as control flow graph, basic block, and branch predicate. In the next chapter, the
author classifies the Test Data Generators into four kinds: Static and Dynamic Test Data
Generation, Random Test Data Generation, Goal-Oriented Test Data Generation, and Path-Oriented
Test Data Generation. The author also discusses some problems of test data generation,
which involve Arrays and Pointers, Objects, Loops, Modules, Infeasible Paths, Constraint
Satisfaction, and the Oracle problem.
Keyword
Program-based Test Generation
Abstract
Outline
1. Introduction
2. Basic Concepts
3. An Automatic Test Data Generator System
a) The Test Data Generator
i. Static and Dynamic Test Data Generation
ii. Random Test Data Generation
iii. Goal-Oriented Test Data Generation
iv. Path-Oriented Test Data Generation
b) The Path Selector’s path criteria
i. Statement coverage
ii. Branch coverage
iii. Condition coverage
iv. Multiple-condition coverage
v. Path coverage
4. Problems of Test Data Generation
a) Arrays and Pointers
b) Objects
c) Loops
d) Modules
e) Infeasible Paths
f) Constraint Satisfaction
g) Oracle
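Of the four generator kinds listed in 3.a, random test data generation is the simplest; a minimal sketch (the function under test and its target branch are made up):

```python
import random

# Random test data generation: keep sampling inputs until the target
# branch of a (hypothetical) function under test is exercised.

def under_test(a, b):
    if a > b and a % 7 == 0:   # target branch to cover
        return "target"
    return "other"

def random_generate(seed=0, budget=100_000):
    rng = random.Random(seed)
    for _ in range(budget):
        a, b = rng.randint(-100, 100), rng.randint(-100, 100)
        if under_test(a, b) == "target":
            return (a, b)      # test datum covering the branch
    return None                # budget exhausted; branch not covered
```

Random generation is cheap but offers no guarantees; the goal-oriented and path-oriented generators listed above instead use the program's structure to steer the search.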
Background
In order to reduce the high cost of manual software testing and at the same time to increase the
reliability of the testing process, researchers and practitioners have tried to automate it. One of
the most important components in a testing environment is an automatic test data generator - a
system that automatically generates test data for a given program.
Motivation
The focus of this article is program-based generation, where the generation starts from the actual
programs.
Solution
In this article I present a survey on automatic test data generation techniques that can be found in
current literature.
Contribution
Basic concepts and notions of test data generation as well as how a test data generator system
works are described. Problems of automatic generation are identified and explained. Finally
important and challenging future research topics are presented.
Total Pages: 8 | Value: Normal | Understanding: Normal | Last Read: 2010.09.24
Question: Characterization | Result: Analytic Model | Validation: Persuasion
[GGJ+10] Test generation through programming in UDITA
Milos Gligoric, Tihomir Gvero, Vilas Jagannath, Sarfraz Khurshid, Viktor Kuncak, Darko
Marinov. Test generation through programming in UDITA. Proceedings of the 32nd ACM/IEEE
International Conference on Software Engineering - Volume 1, ICSE 2010, Cape Town, South
Africa, 1-8 May 2010.
Annotation
Generating test inputs for complex data structures is time-consuming and results in test suites that
have poor quality and are difficult to reuse. The authors present UDITA, a new language for
describing tests: a Java-based language with non-deterministic choice operators and an interface for
generating linked structures. The tradeoffs in this area are: how easy the
specification is to write, how fast tests are generated (efficiency), how good the tests are (effectiveness) and
how complex the tests are.
Keyword
test input generation; specification-based;
Abstract
Background
The consequences of software bugs become more severe, while widely adopted testing tools offer
little support for test generation.
Motivation
Practical application of these techniques was largely limited to testing units of code much smaller
than a hundred thousand lines, or to generating input values much simpler than representations of Java
programs. In other words, these techniques cannot generate inputs with complex data structures.
Solution
The authors present an approach for describing tests using nondeterministic test generation
programs. They introduce UDITA, a Java-based language with non-deterministic choice
operators and an interface for generating linked structures. Furthermore, they describe new
algorithms to generate tests and implement their approach on top of Java PathFinder (JPF).
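The non-deterministic choice operator at the heart of UDITA can be mimicked with a small replay-based enumerator (a sketch of the general idea only, not UDITA's actual delayed-choice implementation; the pair-generating test program is my own example):

```python
# Enumerate every execution of a test program written against a
# nondeterministic choose() operator, by replaying recorded choice
# prefixes and queueing unexplored alternatives.

def explore(program):
    results, worklist = [], [[]]
    while worklist:
        prefix, pos = list(worklist.pop()), [0]
        def choose(options):
            if pos[0] < len(prefix):
                v = prefix[pos[0]]            # replay an old decision
            else:
                for other in options[1:]:     # queue the other branches
                    worklist.append(prefix + [other])
                prefix.append(options[0])
                v = options[0]
            pos[0] += 1
            return v
        results.append(program(choose))
    return results

def pairs(choose):
    """Hypothetical generator program: all pairs over {1, 2}."""
    a = choose([1, 2])
    b = choose([1, 2])
    return (a, b)
```

explore(pairs) yields all four pairs; filtering with a predicate such as a <= b mirrors how UDITA programs assume properties of the generated structures.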
Contribution
1. New language for describing tests
2. New test generation algorithms
3. Implementation
4. Evaluation
Evaluation
The authors evaluated UDITA with four sets of experiments, three for black-box testing and one for
white-box testing. The first set, on six data structures (DAG, HeapArray,
NQueens, RBTree, SearchTree and SortedList), compares UDITA with base JPF test generation. The second set,
on testing refactoring engines, compares UDITA with ASTGen. The third set
uses UDITA to test parts of the UDITA implementation itself. For white-box testing,
the fourth set compares UDITA with symbolic execution in Pex. The experiments
show that test generation using UDITA is faster and leads to test descriptions that are easier to
write than in previous frameworks.
Total Pages: 10 | Value: Normal | Understanding: Normal | Last Read: 2010-10-06
Question: Method/Means | Result: Technique | Validation: Analysis, Experience
[GGJ+09] On test generation through programming in UDITA
M. Gligoric, T. Gvero, V. Jagannath, S. Khurshid, V. Kuncak, and D. Marinov. On test generation
through programming in UDITA. Technical Report LARA-REPORT-2009-05, EPFL, Sep. 2009.
Total Pages: 14 | Value: Normal | Understanding: Normal | Last Read: 2010-10-06
Question: Method/Means | Result: Technique | Validation: Analysis, Experience
Annotation
This is the Technical Report version of [GGJ+10], which offers more references, links and graphs
without the page limit.
[BKM02] Korat: Automated testing based on Java predicates
Boyapati, C., Khurshid, S., and Marinov, D. 2002. Korat: automated testing based on Java
predicates. In Proceedings of the 2002 ACM SIGSOFT international Symposium on Software
Testing and Analysis (Roma, Italy, July 22 - 24, 2002). ISSTA '02. ACM, New York, NY, 123-133.
Annotation
A novel framework for test generation is proposed in this paper. Korat uses method
preconditions written in JML to automatically generate nonisomorphic test cases. The key techniques
in Korat are monitoring the predicate's executions, pruning portions of the search space using structural
invariants, and generating only nonisomorphic inputs. Evaluation in this area usually involves the
generation time and the correctness and effectiveness of the generated tests.
Keyword
specification-based testing
Abstract
Background
Manual software testing and test data generation are labor-intensive processes. Korat uses
specification-based testing.
Motivation
Can we use the precondition to generate test cases and the postcondition to check the correctness of
the output?
Solution
Korat exhaustively explores the bounded input space of the predicate. In addition, Korat
monitors the predicate's executions and prunes portions of the search space. Korat uses the Java
Modeling Language (JML) for specifications.
Contribution
1. A technique for automatic test case generation: given a predicate, and a bound on the size of
its inputs, Korat generates all nonisomorphic inputs for which the predicate returns true.
2. Korat uses backtracking to systematically explore the bounded input space of the predicate.
3. Korat monitors accesses that the predicate makes to all the fields of the candidate input to
prune large portions of the search space.
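A heavily simplified sketch of the generate-and-filter core described above (my own toy: candidates are integer tuples and the repOk-style predicate demands strict sortedness; Korat's field-access pruning and real heap shapes are omitted):

```python
from itertools import product

# Korat-style bounded-exhaustive generation, minus the pruning: try
# every candidate up to the bound and keep those the predicate accepts.

def rep_ok(xs):
    """Hypothetical class invariant: strictly increasing sequence."""
    return all(a < b for a, b in zip(xs, xs[1:]))

def generate(bound, values):
    valid = []
    for size in range(bound + 1):
        for candidate in product(values, repeat=size):
            if rep_ok(candidate):
                valid.append(candidate)
    return valid
```

For instance, generate(2, [1, 2, 3]) keeps the empty tuple, the three singletons and the three ordered pairs, i.e. 7 of the 13 candidates; the strict ordering also rules out permuted (isomorphic) duplicates, echoing Korat's nonisomorphism guarantee.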
Evaluation
This paper presents Korat's performance and then compares Korat with the Alloy Analyzer for test case
generation. The benchmarks are BinaryTree, HeapArray, LinkedList, TreeMap, HashSet and
AVTree; some of them come from the standard Java libraries. The comparison with the Alloy Analyzer
covers the number of structures generated and the time to generate them.
Total Pages: 11 | Value: High | Understanding: Well | Last Read: 2010.10.06
Question: Method/Means | Result: Technique | Validation: Analysis
[KM04] TestEra: Specification-Based Testing of Java
Programs Using SAT
Sarfraz Khurshid, Darko Marinov. TestEra: Specification-Based Testing of Java Programs Using
SAT. Autom. Softw. Eng. 11(4): 403-434, 2004.
Annotation
This paper proposes a framework for automated specification-based testing of Java programs.
Instead of JML [BKM02], the authors use Alloy to express the specification of a method's pre- and post-
conditions. Since Alloy is a first-order declarative language, the authors use a SAT solver
to generate the test cases. The key idea behind TestEra is to automate the testing of
Java programs, requiring only that the structural invariants of inputs and the correctness criteria
for the methods be formally specified.
Keyword
test generation
Abstract
Background
TestEra is a framework for automated specification-based testing of Java programs.
Motivation
The search space is huge, and avoiding isomorphic inputs is hard. In addition, enumeration of structurally
complex data is not efficient.
Solution
TestEra requires as input a Java method (in source code or byte code), a formal specification of the
pre- and post-conditions of that method, and a bound that limits the size of the test cases to be
generated, expressed in Alloy, a first-order declarative language based on sets and relations. Using
the method’s pre-condition, TestEra automatically generates all nonisomorphic test inputs up to
the given bound. It executes the method on each test input, and uses the method postcondition as
an oracle to check the correctness of each output. Due to the first-order specification, the authors
use SAT solvers to help solve the problem. The key idea behind TestEra is to automate the testing of
Java programs, requiring only that the structural invariants of inputs and the correctness criteria
for the methods be formally specified.
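This workflow can be sketched generically, with brute-force enumeration standing in for Alloy and the SAT solver (the sort subject, precondition and postcondition below are my own examples, not from TestEra):

```python
from collections import Counter
from itertools import product

# TestEra-style checking: generate bounded inputs satisfying the
# precondition, run the method, and use the postcondition as the oracle.

def check(method, pre, post, values, bound):
    failures = []
    for size in range(bound + 1):
        for inp in product(values, repeat=size):
            if pre(inp):
                out = method(list(inp))
                if not post(inp, out):
                    failures.append(inp)   # postcondition violated
    return failures

# Hypothetical correctness criterion for a sort: output ordered and a
# permutation of the input.
is_sorted = lambda out: all(a <= b for a, b in zip(out, out[1:]))
post = lambda inp, out: is_sorted(out) and Counter(out) == Counter(inp)
```

check(sorted, lambda i: True, post, [0, 1], 2) finds no failures, while a broken "sort" such as lambda xs: xs is caught on the input (1, 0).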
Evaluation
The authors report, for each case study, the method under test, a representative input size, and the
phase 1 (i.e., input generation) and phase 2 (i.e., correctness checking) statistics of TestEra's
checking for that size. The case studies include singly linked lists, red-black trees, INS (Information
Network System) and the Alloy-alpha Analyzer.
Total Pages: 32 | Value: Normal | Understanding: Normal | Last Read: 2010.10.06
Question: Method/Means | Result: Technique | Validation: Experience
Symbolic Execution
[Kin76] Symbolic execution and program testing
King, J. C. 1976. Symbolic execution and program testing. Commun. ACM 19, 7 (Jul. 1976),
385-394.
Annotation
This is the most cited paper on symbolic execution. The author introduces the
basic notions of this program analysis technique. The main difficulty in symbolic execution lies in
conditional branch statements. The paper uses a simple PL/I-style programming language to
analyze this difficulty in detail. Using two typical examples, the author introduces the symbolic
execution system based on the symbolic execution tree and a strategy for handling the conditional branch
problem. Furthermore, the paper discusses program proving based on symbolic execution.
Symbolic execution accepts symbolic inputs and produces symbolic formulas as output. The
execution semantics is changed for symbolic execution, but neither the language syntax nor the
individual programs written in the language are changed.
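The branch-forking idea can be illustrated with a toy symbolic executor (symbolic values and path conditions are plain strings here and the abs-difference program is my own example; King's system reasons about the formulas themselves, and EFFIGY does far more than this sketch):

```python
# Toy symbolic execution: run the program once per path, replaying
# forced branch outcomes; each leaf of the symbolic execution tree
# records its path condition and a symbolic result formula.

def sym_exec(program, symbols):
    leaves, worklist = [], [[]]
    while worklist:
        forced, pos, path = worklist.pop(), [0], []
        def branch(cond):
            if pos[0] < len(forced):
                out = forced[pos[0]]                # replay this branch
            else:
                worklist.append(forced + [False])   # else-side explored later
                out = True
            path.append(cond if out else f"not ({cond})")
            pos[0] += 1
            return out
        leaves.append((path, program(branch, *symbols)))
    return leaves

def prog(branch, x, y):
    """abs-difference: if x > y then x - y else y - x"""
    if branch(f"{x} > {y}"):
        return f"{x} - {y}"
    return f"{y} - {x}"
```

sym_exec(prog, ("X", "Y")) produces the two leaves (["X > Y"], "X - Y") and (["not (X > Y)"], "Y - X"); a real system would hand each path condition to a theorem prover to check feasibility and derive concrete test inputs.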
Keyword
symbolic execution; program testing
Abstract
Background
Instead of supplying the normal inputs to a program (e.g. numbers) symbolic execution supplies
symbols representing arbitrary values. The execution proceeds as in a normal execution except
that values may be symbolic formulas over the input symbols.
Motivation
The difficult, interesting issues arise during the symbolic execution of conditional branch type
statements.
Solution
A particular system called EFFIGY, which provides symbolic execution for program testing and
debugging, is also described; it interpretively executes programs written in a simple PL/I-style
programming language. It includes many standard debugging features, the ability to manage and
to prove things about symbolic expressions, a simple program testing manager, and a program
verifier.
Evaluation
A brief discussion of the relationship between symbolic execution and program proving is also
included.
Total Pages: 10 | Value: High | Understanding: Normal | Last Read: 2010.10.06
Question: Method/Means | Result: Technique | Validation: Persuasion
[DJDM09] ReAssert: Suggesting Repairs for Broken Unit Tests
Brett Daniel, Vilas Jagannath, Danny Dig, Darko Marinov. "ReAssert: Suggesting Repairs for
Broken Unit Tests," pp. 433-444, 2009 IEEE/ACM International Conference on Automated
Software Engineering (ASE), 2009.
Annotation
Software changes cause tests to fail. This is the first published paper to suggest repairs to
failing tests' code. The key challenge in repairing tests is to retain as much of the original test
logic as possible. The authors propose several repair strategies: Replace Assertion Method,
Invert Relational Operator, Replace Literal in Assertion, Replace with Related Method, Trace
Declaration-Use Path, Accessor Expansion, Surround with Try-Catch and Custom Repair
Strategies. Note that the repairs only change the test code (e.g. code based on JUnit), not
the code under test.
Keyword
Software testing; Software maintenance
Abstract
Background
Developers often change software in ways that cause tests to fail. When this occurs, developers
must determine whether failures are caused by errors in the code under test or in the test code
itself. In the latter case, developers must repair failing tests or remove them from the test suite.
Motivation
Repairing tests is time consuming but beneficial, since removing tests reduces a test suite's ability
to detect regressions. Fortunately, simple program transformations can repair many failing tests
automatically.
Solution
We present ReAssert, a novel technique and tool that suggests repairs to failing tests' code which
cause the tests to pass. Examples include replacing literal values in tests, changing assertion
methods, or replacing one assertion with several. If the developer chooses to apply the repairs,
ReAssert modifies the code automatically.
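One of the listed strategies, Replace Literal in Assertion, can be sketched as a tiny source transformation (a toy version operating on a string of Java-like test code; ReAssert itself works on real JUnit sources and handles several assertion forms):

```python
import re

# Toy "Replace Literal in Assertion" repair: rewrite the expected
# integer literal of a failing assertEquals to the value actually
# observed at run time, leaving the rest of the test logic untouched.

def replace_literal(assertion_src, observed):
    # assumes the first argument is an integer literal (a simplification)
    return re.sub(r"assertEquals\(\s*-?\d+",
                  f"assertEquals({observed}", assertion_src, count=1)
```

For example, replace_literal('assertEquals(5, calc.add(2, 2));', 4) yields 'assertEquals(4, calc.add(2, 2));' (calc.add is a hypothetical method under test). Only the literal changes, which reflects the paper's key challenge of preserving the original test logic.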
Contribution
This paper makes contributions in Idea, Technique, Tool and Evaluation.
Evaluation
First, we describe two case studies in which researchers used ReAssert to repair failures in their
evolving software.
Second, we perform a controlled user study to evaluate whether ReAssert’s suggested repairs
match developers’ expectations.
Third, we assess ReAssert’s ability to suggest repairs for failures in open-source projects,
considering both manually written and automatically generated test suites.
Total Pages: 12 | Value: Normal | Understanding: Normal | Last Read: 2010.10.06
Question: Method/Means | Result: Technique | Validation: Persuasion
[PV09] A survey of new trends in symbolic execution for
software testing and analysis
Corina S. Păsăreanu, Willem Visser. A survey of new trends in symbolic execution for software
testing and analysis. STTT 11(4): 339-353, 2009.
Annotation
Symbolic execution is an analysis technique that takes a program as input and outputs its
symbolic execution tree. A comprehensive overview of symbolic execution is given. Using
some simple, classical examples, the authors first introduce the basic notions and challenges of
symbolic execution. Secondly, the trend of combining concrete and symbolic execution is discussed.
Thirdly, the authors describe how researchers have tried to solve the scalability issues that arise with large
programs, which remain the main obstacle to widespread application of symbolic execution.
Furthermore, the authors give an overview of the applications of symbolic execution,
such as test case generation, proving program properties and static detection of runtime errors. In
the “future directions” part, the authors discuss the main obstacles and possible solutions in this
area, e.g. new heuristic searches, extending program abstraction and powerful decision
procedures for combinations of theories.
Keyword
symbolic execution; survey
Abstract
Background
Symbolic execution is a well-known program analysis technique which represents program inputs
with symbolic values instead of concrete, initialized, data and executes the program by
manipulating program expressions involving the symbolic values.
Motivation
Symbolic execution has been proposed over three decades ago but recently it has found renewed
interest in the research community, due in part to the progress in decision procedures, availability
of powerful computers and new algorithmic developments.
Solution
We provide here a survey of some of the new research trends in symbolic execution, with
particular emphasis on applications to test generation and program analysis.
Contribution
We first describe an approach that handles complex programming constructs such as input
recursive data structures, arrays, as well as multithreading. Furthermore, we describe recent hybrid
techniques that combine concrete and symbolic execution to overcome some of the inherent
limitations of symbolic execution, such as handling native code or availability of decision
procedures for the application domain.
We follow with a discussion of techniques that can be used to limit the (possibly infinite) number
of symbolic configurations that need to be analyzed for the symbolic execution of looping
programs. Finally, we give a short survey of interesting new applications, such as predictive
testing, invariant inference, program repair, analysis of parallel numerical programs and
differential symbolic execution.
Evaluation
Total Pages: 15 | Value: High | Understanding: Normal | Last Read: 2010.10.06
Question: Characterization | Result: Analytic Model | Validation: Persuasion
[KPV03] Generalized symbolic execution for model checking
and testing
Khurshid, S., Păsăreanu, C. S., and Visser, W. 2003. Generalized symbolic execution for model
checking and testing. In Proceedings of the 9th international Conference on Tools and Algorithms
For the Construction and Analysis of Systems (Warsaw, Poland, April 07 - 11, 2003). H. Garavel
and J. Hatcliff, Eds. Lecture Notes In Computer Science. Springer-Verlag, Berlin, Heidelberg,
553-568.
Annotation
This paper proposes one of the early approaches focusing on symbolic execution of concurrent
programs and complex data structures. It presents a novel framework based on a two-fold
generalization of symbolic execution. First, the paper defines a source-to-source translation that
instruments a program, enabling standard model checkers to perform symbolic execution. Second, to handle
dynamically allocated structures, method preconditions, data and concurrency, the paper gives a novel
symbolic execution algorithm.
Keyword
symbolic execution
Abstract
Background
Modern software systems, which often are concurrent and manipulate complex data structures,
must be extremely reliable.
Motivation
We need to automate checking of such systems, which are concurrent and manipulate complex
data structures.
Solution
We provide a two-fold generalization of traditional symbolic execution based approaches. First,
we define a source to source translation to instrument a program, which enables standard model
checkers to perform symbolic execution of the program. Second, we give a novel symbolic
execution algorithm that handles dynamically allocated structures (e.g., lists and trees), method
preconditions (e.g., acyclicity), data (e.g., integers and strings) and concurrency.
Contribution
1. To address the state space explosion problem.
2. To achieve modularity.
3. To check strong correctness properties of concurrent programs.
4. To exploit the model checker’s built-in capabilities
Evaluation
By introducing the implementation and illustrating two applications of the framework, the authors
demonstrate the practicality of this approach.
Total Pages: 16 | Value: High | Understanding: Well | Last Read: 2010.10.09
Question: Method/Means | Result: Technique | Validation: Persuasion
[PV04] Verification of Java programs using symbolic
execution and invariant generation
C. S. Păsăreanu, W. Visser. Verification of Java Programs Using Symbolic Execution and Invariant
Generation. Lecture Notes in Computer Science, Vol. 2989, pp. 164-181, 2004.
Annotation
Software verification is recognized as an important and difficult problem; model checking in
particular suffers from the state-explosion problem and can only deal with closed systems. This paper
proposes a framework that uses method specifications and loop invariants to address the problem. The paper
also illustrates some non-trivial examples, which benefit from the more powerful
approximation techniques.
Keyword
symbolic execution; method specifications; loop invariants
Abstract
Background
Software verification is recognized as an important and difficult problem.
Motivation
Model checking typically can only deal with closed systems and it suffers from the state-explosion
problem.
Solution
In order to solve the state-explosion problem, we present a novel framework, based on symbolic
execution, for the automated verification of software. The framework uses annotations in the form
of method specifications and loop invariants. We present a novel iterative technique that uses
invariant strengthening and approximation for discovering these loop invariants automatically.
Contribution
1. A novel verification framework that combines symbolic execution and model checking.
2. A new method for iterative invariant generation.
3. A series of (small) non-trivial Java examples showing the merits of our method.
Evaluation
By showing some non-trivial Java examples, we compare our work with the invariant generation
method presented in another paper [C. Flanagan and S. Qadeer. Predicate abstraction for software
verification. In Proc. POPL, 2002.].
Total Pages: 18 | Value: Normal | Understanding: Normal | Last Read: 2010.10.09
Question: Method/Means | Result: Technique | Validation: Persuasion
Fault Localization
[WD10] Software Fault Localization
W. Eric Wong, Vidroha Debroy. "Software Fault Localization," IEEE Reliability Society 2009
Annual Technology Report, January 2010
Annotation
This article gives a fairly comprehensive overview of software fault localization. After
introducing basic notions and classical approaches to fault localization, it classifies the
advanced fault localization techniques as follows: static, dynamic, and execution slice-based
techniques; program spectrum-based techniques; statistics-based techniques; program state-based
techniques; machine learning-based techniques; etc. Furthermore, important aspects of
fault localization are given, namely effectiveness, efficiency, and robustness; the impact of test
cases; faults introduced by missing code; and, lastly, programs with multiple bugs. These could be
regarded as the design space for future work.
Keyword
Fault Localization
Abstract
Background
Regardless of the effort spent on developing a computer program, it may still contain bugs. In fact,
the larger and more complex a program, the higher the likelihood of it containing bugs.
Motivation
It is always challenging for programmers to effectively and efficiently remove bugs, while not
inadvertently introducing new ones at the same time.
Solution
Automatic fault localization techniques can guide programmers to the locations of faults with
minimal human intervention.
Total Pages: 6 | Value: High | Understanding: Well | Last Read: 2010.10.10
Question: Characterization | Result: Analytic Model | Validation: Experience
Web
[ADT+10] Practical fault localization for dynamic web
applications
Artzi, S., Dolby, J., Tip, F., and Pistoia, M. 2010. Practical fault localization for dynamic web
applications. In Proceedings of the 32nd ACM/IEEE international Conference on Software
Engineering - Volume 1 (Cape Town, South Africa, May 01 - 08, 2010). ICSE '10. ACM, New
York, NY, 265-274.
Annotation
In this Paper, an automatic fault localization technique is proposed, which first fully finds and
localizes malformed HTML errors in Web applications that execute PHP code on the server side.
This technique is based on the previous work [3, 4] of combined concrete and symbolic execution
to Web applications written in PHP. This technique needn’t an upfront test suite. Furthermore, this
paper defines the statement’s suspiciousness rating in web applications with the use of an output
mapping from statements.
However, the suspiciousness definition as followes is a bit magical, where the suspiciousness
rating and the Tarantula suspiciousness rating are 1.1 and 0.5.
Keyword
Fault Localization
Abstract
Background
Web applications are typically written in a combination of several programming languages. As
with any program, programmers make mistakes and introduce faults, resulting in Web-application
crashes and malformed dynamically generated HTML pages. Malformed HTML errors may
seem trivial, and indeed many of them are at worst minor annoyances.
Motivation
Previous fault-localization techniques need an upfront test suite, and there has been no fully automatic
tool that finds and localizes malformed HTML errors in Web applications that execute PHP code
on the server side.
Solution
We leverage combined concrete and symbolic execution and several fault-localization techniques
to create a uniquely powerful tool for localizing faults in PHP applications. The tool automatically
generates tests that expose failures, and then automatically localizes the faults responsible for
those failures.
Contribution
1. We present an approach for fault localization that uses combined concrete and symbolic
execution to generate a suite of passing and failing tests.
2. We demonstrate that automated techniques for fault localization are effective at localizing
real faults in open-source PHP applications.
3. We present 6 fault localization techniques that combine variations on the Tarantula algorithm.
4. We implemented these 6 techniques in Apollo.
Evaluation
This evaluation aims to answer two questions:
1. How effective is the Tarantula fault localization technique in the domain of PHP web
applications?
2. How effective is Tarantula when combined with the use of an output mapping and/or when
modeling the outcome of conditional expressions (Section 4)?
The benchmarks are faqforge, webchess, schoolmate and timeclock. Six techniques that
combine these variations are used in the experiment.
Note: the authors did not know the locations of the faults and had to localize them manually. Since
manually localizing and fixing faults is a very time-consuming task, they limited themselves to 20 faults
in each of the subject programs.
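For reference, the base Tarantula metric that these six variants build on computes, for each statement, the ratio of the normalized failed coverage to the total normalized coverage (the formula is from the Tarantula literature rather than restated in this note):

```python
# Tarantula suspiciousness: a statement covered mostly by failing runs
# scores near 1, one covered mostly by passing runs scores near 0.

def tarantula(passed_cov, failed_cov, total_passed, total_failed):
    """passed_cov/failed_cov: number of passed/failed runs covering
    the statement, out of total_passed/total_failed runs."""
    pf = failed_cov / total_failed if total_failed else 0.0
    pp = passed_cov / total_passed if total_passed else 0.0
    return pf / (pf + pp) if pf + pp > 0 else 0.0
```

A statement covered by every failed run and no passed run gets 1.0; the paper's variants then reweight this score using the output mapping and the conditional-outcome modeling evaluated above.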
Total Pages: 10 | Value: High | Understanding: Well | Last Read: 2010.10.09
Question: Method/Means | Result: Technique | Validation: Analysis
[AKD+10] Finding Bugs in Web Applications Using Dynamic
Test Generation and Explicit-State Model Checking
Shay Artzi, Adam Kieżun, Julian Dolby, Frank Tip, Danny Dig, Amit Paradkar, Michael D. Ernst,
"Finding Bugs in Web Applications Using Dynamic Test Generation and Explicit-State Model
Checking," IEEE Transactions on Software Engineering, vol. 99, no. RapidPosts, pp. 474-494, 2010.
Annotation
This paper enhances the tools and methods of the authors' previous work [AKD+08]. By
implementing a form of explicit-state software model checking, it tries to handle user input
options that are created dynamically by a web application, which includes keeping track of
parameters that are transferred from one script to the next.
Keyword
test generation; symbolic execution; explicit-state model checking
Abstract
Background
Web script crashes and malformed dynamically-generated web pages are common errors, and they
seriously impact the usability of web applications.
Motivation
Current tools for web-page validation cannot handle the dynamically generated pages that are
ubiquitous on today’s Internet.
In the previous work, we did not yet supply a solution for handling user input options that are
created dynamically by a web application, which includes keeping track of parameters that are
transferred from one script to the next—either by persisting them in the environment, or by
sending them as part of the call.
Solution
We present a dynamic test generation technique for the domain of dynamic web applications. The
technique utilizes both combined concrete and symbolic execution and explicit-state model
checking. The technique generates tests automatically, runs the tests capturing logical constraints
on inputs, and minimizes the conditions on the inputs to failing tests, so that the resulting bug
reports are small and useful.
Contribution
1. The technique utilizes both combined concrete and symbolic execution and explicit-state
model checking.
2. We adapt the established technique of dynamic test generation, based on combined concrete
and symbolic execution.
3. We created a tool, Apollo.
4. We evaluated our tool by applying it to 6 real web applications.
5. We present a detailed classification of the faults found by Apollo.
Evaluation
The evaluation methods are almost the same as in the previous work [AKD+08].
Total Pages: 17 | Value: Normal | Understanding: Normal | Last Read: 2010.10.10
Question: Method/Means | Result: Technique | Validation: Analysis
[AKD+08] Finding bugs in dynamic web applications
Artzi, S., Kiezun, A., Dolby, J., Tip, F., Dig, D., Paradkar, A., and Ernst, M. D. 2008. Finding bugs
in dynamic web applications. In Proceedings of the 2008 international Symposium on Software
Testing and Analysis (Seattle, WA, USA, July 20 - 24, 2008). ISSTA '08. ACM, New York, NY,
261-272.
Annotation
A framework of test generation for web applications is proposed in this paper. The technique is
based on combined concrete and symbolic execution. The authors also present the failure
detection algorithm and the path constraint minimization algorithm.
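My understanding of the minimization idea, as a sketch (the paper's actual algorithm is more involved; `still_fails` here is a hypothetical oracle that reruns the program on an input satisfying the reduced constraint):

```python
def minimize(path_constraint, still_fails):
    """Greedily drop conjuncts whose removal preserves the failure,
    so the final bug report only mentions the relevant conditions."""
    kept = list(path_constraint)
    for c in list(kept):
        candidate = [k for k in kept if k != c]
        if still_fails(candidate):  # failure reappears without c: drop it
            kept = candidate
    return kept

# Toy oracle: the crash only depends on the two 'page' conditions.
needed = {"page != ''", "page == 'admin'"}
oracle = lambda subset: needed <= set(subset)
print(minimize(["user != ''", "page != ''", "page == 'admin'"], oracle))
```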
Keyword
symbolic execution; dynamic analysis; test generation
Abstract
Background
Web script crashes and malformed dynamically-generated Web pages are common errors, and they
seriously impact usability of Web applications.
Motivation
Current tools for Web-page validation cannot handle the dynamically-generated pages that are
ubiquitous on today’s Internet.
Solution
In this work, we apply a dynamic test generation technique, based on combined concrete and
symbolic execution, to the domain of dynamic Web applications. The technique generates tests
automatically, uses the tests to detect failures, and minimizes the conditions on the inputs exposing
each failure, so that the resulting bug reports are small and useful in finding and fixing the
underlying faults. Our tool Apollo implements the technique for PHP. Apollo generates test inputs
for the Web application, monitors the application for crashes, and validates that the output
conforms to the HTML specification.
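The generate-run-negate loop behind this, as I understand it. A toy sketch only: inputs are boolean flags, so "solving" a negated branch constraint is just flipping a flag, standing in for a real constraint solver.

```python
def concolic_explore(program, initial, max_runs=20):
    """Worklist sketch of dynamic test generation: run the program on a
    concrete input, record the branch decisions consulted, then negate
    each decision in turn to obtain a new input, until no new paths."""
    worklist, seen, failures = [initial], set(), []
    while worklist and max_runs > 0:
        max_runs -= 1
        inp = worklist.pop()
        key = tuple(sorted(inp.items()))
        if key in seen:
            continue
        seen.add(key)
        path, error = program(inp)          # concrete run + recorded branches
        if error:
            failures.append((dict(inp), error))
        for var in path:                    # negate each branch decision
            flipped = dict(inp)
            flipped[var] = not flipped[var]
            worklist.append(flipped)
    return failures

def page(inp):
    """Toy script: records each branch variable it consults, and crashes
    only when the user is a logged-in admin."""
    path = ["logged_in"]
    if inp["logged_in"]:
        path.append("admin")
        if inp["admin"]:
            return path, "crash in admin panel"
    return path, None

print(concolic_explore(page, {"logged_in": False, "admin": False}))
```

Starting from an input that triggers nothing, the loop still reaches the crashing path by flipping one recorded branch decision at a time.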
Contribution
1. We adapt the established technique of dynamic test generation, based on combined concrete
and symbolic execution, to the domain of Web applications.
2. We created a tool, Apollo.
3. We evaluated our tool by applying it to real Web applications and comparing the results with
random testing.
Evaluation
The authors designed the experiments to answer the following research questions:
1. How many faults can Apollo find, and of what varieties?
2. How effective is Apollo's fault detection compared to alternative approaches such as
random testing, in terms of the number and severity of discovered faults and the line
coverage achieved?
3. How effective is our minimization in reducing the size of input parameter constraints and
failure-inducing inputs?
For the evaluation, the authors selected the following four open-source PHP programs: faqforge,
webchess, schoolmate, phpsysinfo.
Total Pages: 11 | Value: High | Understanding: Normal | Last Read: 2010.10.10
Question: Method/Means | Result: Technique | Validation: Analysis
Test Execution
Test Optimization
[DGM10] On test repair using symbolic execution
Daniel, B., Gvero, T., and Marinov, D. 2010. On test repair using symbolic execution. In
Proceedings of the 19th international Symposium on Software Testing and Analysis (Trento, Italy,
July 12 - 16, 2010). ISSTA '10. ACM, New York, NY, 207-218.
Annotation
When the program is changed, the test code may be out of date, which can cause regression tests
to fail. The paper proposes a technique based on symbolic execution to repair such tests. The
authors analyze symbolic execution of .NET code using a tool named Pex. This paper enhances
the solution of [DJDM09]: it fixes several failures that ReAssert could not repair, or that it
could have repaired in a better way. The authors describe modifications of expected values,
expected-object comparisons, and conditional expected values as examples.
Keyword
test repair; symbolic execution
Abstract
Background
When developers change a program, regression tests can fail not only due to faults in the program
but also due to out of date test code that does not reflect the desired behavior of the program.
Motivation
Repairing tests manually is difficult and time consuming.
Solution
We recently developed ReAssert, a tool that can automatically repair broken unit tests, but only if
they lack complex control flow or operations on expected values.
Contribution
This paper introduces symbolic test repair, a technique based on symbolic execution, which can
overcome some of ReAssert’s limitations.
Evaluation
We reproduce experiments from earlier work and find that symbolic test repair improves upon
previously reported results both quantitatively and qualitatively. We also perform new experiments
which confirm the benefits of symbolic test repair and also show surprising similarities in test
failures for open-source Java and .NET programs. Our experiments use Pex, a powerful symbolic
execution engine for .NET, and we find that Pex provides over half of the repairs possible from
the theoretically ideal symbolic test repair.
Q1: How many failures can be repaired by replacing literals in test code? That is, if we had an
ideal way to discover literals, how many broken tests could we repair?
Q2: How do literal replacement and ReAssert compare? How would an ideal literal replacement
strategy affect ReAssert’s ability to repair broken tests?
Q3: How well can existing symbolic execution discover appropriate literals? Can symbolic
execution produce literals that would cause a test to pass?
Java: Checkstyle, JDepend, JFreeChart, Lucene, PMD, XStream
.NET: AdblockIE, CSHgCmd, Fudge-CSharp, GCalExchangeSync, Json.NET, MarkdownSharp,
NerdDinner, NGChart, NHaml, ProjectPilot and SharpMap.
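A drastically simplified picture of the literal replacement in Q1/Q2, for my notes: where symbolic test repair uses Pex to solve the test's path constraints for a passing literal, this sketch just observes the concrete value, roughly what a ReAssert-style repair does for a simple assertion. My own illustration with hypothetical names, not the paper's algorithm.

```python
def repair_expected_literal(actual_fn, old_expected):
    """Replace a stale expected literal in an assertEquals-style check
    with the value the code under test actually produces now."""
    actual = actual_fn()
    if actual == old_expected:
        return old_expected  # test already passes, keep the literal
    return actual            # substitute the observed value

# The program under test changed: tax is now added to the price,
# so the old expected literal 100 no longer matches.
price = lambda: 100 + 19
print(repair_expected_literal(price, 100))
```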
Total Pages: 11 | Value: Normal | Understanding: Normal | Last Read: 2010.09.25
Question: Method/Means | Result: Technique | Validation: Analysis
[HO09] MINTS: A general framework and tool for supporting
test-suite minimization
Hwa-You Hsu; Orso, A.; , "MINTS: A general framework and tool for supporting test-suite
minimization," Software Engineering, 2009. ICSE 2009. IEEE 31st International Conference on ,
vol., no., pp.419-429, 16-24 May 2009
Annotation
This is the first published paper that attempts to handle multi-criteria test-suite minimization
problems. The approach models multi-criteria minimization as binary ILP problems and then
leverages ILP solvers to compute an optimal solution to such problems.
Note the difference and relation between minimization criteria and minimization policies.
Keyword
test-suite minimization
Abstract
Background
Test-suite minimization techniques aim to eliminate redundant test cases from a test-suite based on
some criteria, such as coverage or fault-detection capability.
Motivation
Most existing test-suite minimization techniques have two main limitations: they perform
minimization based on a single criterion and produce suboptimal solutions.
Solution
In this paper, we propose a test-suite minimization framework that overcomes these limitations by
allowing testers to (1) easily encode a wide spectrum of test-suite minimization problems, (2)
handle problems that involve any number of criteria, and (3) compute optimal solutions by
leveraging modern integer linear programming solvers.
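The single-criterion core of that encoding can be sketched as follows; a brute-force search stands in here for the ILP solvers that MINTS delegates to (it finds the same optimum on tiny instances). Data is made up.

```python
from itertools import combinations

def minimize_suite(coverage, requirements):
    """Binary-ILP-style minimization: pick the smallest subset of tests
    (decision variable x_t in {0,1} per test) whose union covers every
    requirement. coverage: {test_name: set_of_requirements}."""
    tests = list(coverage)
    for k in range(1, len(tests) + 1):          # smallest subsets first
        for subset in combinations(tests, k):
            covered = set().union(*(coverage[t] for t in subset))
            if requirements <= covered:
                return set(subset)
    return None  # the suite cannot satisfy the requirements at all

cov = {"t1": {1, 2}, "t2": {2, 3}, "t3": {1, 3}, "t4": {1, 2, 3}}
print(minimize_suite(cov, {1, 2, 3}))  # {'t4'}
```

Adding more criteria means adding more constraint rows (and, for weighted policies, a different objective) over the same binary variables, which is exactly what makes an ILP solver the natural back end.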
Contribution
1. A general test-suite minimization framework that handles minimization problems involving
any number of criteria and can produce optimal solutions to such problems.
2. A prototype tool that implements the framework, can interface seamlessly with a number of
different ILP solvers, and is freely available.
3. An empirical study in which we evaluate the approach using a wide range of programs, test
cases, minimization problems, and solvers.
Evaluation
In the evaluation, the authors investigated the following research questions:
1. How often can MINTS find an optimal solution for a test-suite minimization problem in a
reasonable time?
2. How does the performance of MINTS compare with the performance of a heuristic
approach?
3. To what extent does the use of a specific solver affect the performance of the approach?
Note that the authors consider one absolute minimization criterion and three relative minimization
criteria. The authors also consider eight different minimization policies: seven weighted and one
prioritized.
The benchmark is the Siemens suite and three additional programs with real faults: flex,
LogicBlox, and Eclipse.
Total Pages: 11 | Value: Normal | Understanding: Normal | Last Read: 2010.10.10
Question: Method/Means | Result: Technique | Validation: Analysis
[WHLM95] Effect of test set minimization on fault detection
effectiveness
Wong, W. E., Horgan, J. R., London, S., and Mathur, A. P. 1995. Effect of test set minimization on
fault detection effectiveness. In Proceedings of the 17th international Conference on Software
Engineering (Seattle, Washington, United States, April 24 - 28, 1995). ICSE '95. ACM, New York,
NY, 41-50.
Annotation
Keyword
Abstract
Background
Size and code coverage are important attributes of a set of tests.
Motivation
A program P is executed on elements of the test set T. Can we observe the fault-detection
capability of T for P? Which T induces code coverage on P according to some coverage criterion?
Is it the size of T, or the coverage of T on P, that determines the fault-detection effectiveness
of T for P?
While keeping coverage constant, what is the effect on fault detection of reducing the size of a test
set?
Solution
We report results from an empirical study using the block and all-uses criteria as the coverage
measures.
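The kind of size reduction being studied can be sketched with a greedy heuristic (my own illustration with made-up block coverage, not the paper's tooling): shrink T while the reduced set still induces the same coverage on P.

```python
def reduce_keep_coverage(coverage):
    """Greedy test-set reduction at constant coverage: repeatedly keep
    the test that adds the most not-yet-covered blocks, stopping once
    the reduced set attains the full suite's coverage.
    coverage: {test: set_of_covered_blocks}."""
    goal = set().union(*coverage.values())
    remaining, kept, covered = dict(coverage), [], set()
    while covered < goal:
        best = max(remaining, key=lambda t: len(coverage[t] - covered))
        kept.append(best)
        covered |= coverage[best]
        del remaining[best]
    return kept

cov = {"t1": {1, 2, 3}, "t2": {2, 3}, "t3": {3, 4}, "t4": {4}}
print(reduce_keep_coverage(cov))  # a 2-test set with the same coverage as all 4
```

The paper's question is then whether the dropped tests (here t2 and t4) actually contributed to fault detection despite adding no coverage.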
Contribution
Evaluation
Total Pages Value Understanding Last Read
Question: Method/Means, Evaluation, Characterization | Result: Technique, Analytic Model | Validation: Analysis, Persuasion, Experience
Test Adequacy Criterion
Mutant Testing
[LJT+10] Is operator-based mutant selection superior to
random mutant selection?
Zhang, L., Hou, S., Hu, J., Xie, T., and Mei, H. 2010. Is operator-based mutant selection superior
to random mutant selection?. In Proceedings of the 32nd ACM/IEEE international Conference on
Software Engineering - Volume 1 (Cape Town, South Africa, May 01 - 08, 2010). ICSE '10. ACM,
New York, NY, 435-444.
Annotation
Mutant selection is used to reduce the expense of compiling and executing too many mutants.
Much research on mutant selection is operator-based. This paper addresses the question of whether
operator-based mutant selection is really superior to random methods. Through an empirical
study of three operator-based mutant-selection techniques (i.e., Offutt et al.'s 5
mutation operators [31], Barbosa et al.'s 10 mutation operators [4], and Siami Namin et al.'s 28
mutation operators [37]) and two random ones, the research indicates that operator-based mutant
selection is not superior.
Keyword
Abstract
Background
Due to the expensiveness of compiling and executing a large number of mutants, it is usually
necessary to select a subset of mutants to substitute the whole set of generated mutants in mutation
testing and analysis. Most existing research on mutant selection focused on operator-based mutant
selection, i.e., determining a set of sufficient mutation operators and selecting mutants generated
with only this set of mutation operators. Recently, researchers began to leverage statistical analysis
to determine sufficient mutation operators using execution information of mutants.
Motivation
However, whether mutants selected with these sophisticated techniques are superior to randomly
selected mutants remains an open question.
Solution
In this paper, we empirically investigate this open question by comparing three representative
operator-based mutant-selection techniques with two random techniques. Our empirical results
show that operator-based mutant selection is not superior to random mutant selection. These
results also indicate that random mutant selection can be a better choice and mutant selection on
the basis of individual mutants is worthy of further investigation.
Contribution
Our study empirically evaluates three recent operator-based mutant-selection techniques (i.e.,
Offutt et al. [31], Barbosa et al. [4], and Siami Namin et al. [37]) against random mutant selection
for mutation testing.
Our study produces the first empirical results concerning stability of operator-based mutant
selection and random mutant selection for mutation testing.
Besides the random technique studied previously (referred to as the one-round random technique in
this paper), our study also investigates another random technique involving two steps to select
each mutant (referred to as the two-round random technique in this paper).
The subjects used in our study are larger than those used in previous studies of random mutant
selection. To the best of our knowledge, due to the extreme expense of experimenting with
mutant-selection techniques, the Siemens programs are by far the largest subjects used in studies
of mutant selection [37].
Evaluation
Total Pages: 10 | Value: High | Understanding: Normal | Last Read: 2010.09.26
Question: Characterization | Result: Analytic Model | Validation: Analysis
[ST10] From behaviour preservation to behaviour
modification: constraint-based mutant generation
Annotation
This paper presents a mutant-generation approach that generates mutants which are both
syntactically and semantically correct. The authors build this approach out of several
constraint-based methods. Using accessibility constraints, the introduction or deletion of entities,
and type constraints, the authors not only generate mutants but also reject them. They also applied
this technique to several open-source programs, such as JUnit, JHotDraw, Draw2D, Jaxen and
HTMLParser.
Keyword
Mutation Analysis
Abstract
Background
This paper is about mutation generation. The authors' approach builds on their prior work on
constraint-based refactoring tools, and works by negating behaviour-preserving constraints.
Motivation
The efficacy of mutation analysis depends heavily on its capability to mutate programs in such a
way that they remain executable and exhibit deviating behaviour. Whereas the former requires
knowledge about the syntax and static semantics of the programming language, the latter requires
at least some understanding of its dynamic semantics, i.e., how expressions are evaluated.
Solution
We present an approach that is knowledgeable enough to generate only mutants that are both
syntactically and semantically correct and likely exhibit non-equivalent behaviour.
Evaluation
As a proof of concept we present an enhanced implementation of the Access Modifier Change
operator for Java programs whose naive implementations create huge numbers of mutants that do
not compile or leave behaviour unaltered. While we cannot guarantee that our generated mutants
are non-equivalent, we can demonstrate a considerable reduction in the number of vain mutant
generations, leading to substantial temporal savings.
Total Pages: 10 | Value: High | Understanding: Normal | Last Read: 2010.09.26
Question: Method/Means | Result: Technique | Validation: Analysis, Persuasion
[JH09] An analysis and survey of the development of mutation
testing
Yue Jia, Mark Harman (September 2009). "An Analysis and Survey of the Development of
Mutation Testing" (PDF). CREST Centre, King's College London, Technical Report TR-09-06.
Annotation
The paper gives an overview of Mutation Testing. The authors first introduce not only the basic
notions but also the history and applications of Mutation Testing. The second part discusses the
fundamental hypotheses, the process, and the open problems in theoretical research. In the third
part, techniques in Mutation Testing are classified into two types: reduction of the number of
generated mutants (corresponding to "do fewer") and reduction of the execution cost (which
combines "do smarter" and "do faster"). Detecting whether a program and one of its mutant
programs are equivalent is a known undecidable problem; this problem is discussed in Part 4. In
the fifth part, the authors classify the applications of mutation testing into program mutation and
specification mutation, and show more detailed statistics. In Parts 6 and 7, empirical evaluations
and tools supporting mutation testing are gathered and listed. In the last part, the authors identify
five important avenues for research: a need for high-quality higher-order mutants, a need to reduce
the equivalent-mutant problem, a preference for semantics over syntax, an interest in achieving a
better balance between cost and value, and a pressing need to generate test cases to kill mutants.
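The basic metric underlying all of this, for my notes (made-up kill data):

```python
def mutation_score(kill_matrix, equivalent):
    """Mutation score = killed mutants / (all mutants - equivalent ones).
    kill_matrix maps each mutant to True if some test kills it; the
    equivalent-mutant count must be excluded, since no test can ever
    kill an equivalent mutant."""
    killed = sum(kill_matrix.values())
    return killed / (len(kill_matrix) - equivalent)

# 3 of 5 mutants killed, 1 known to be equivalent (unkillable).
print(mutation_score({"m1": True, "m2": True, "m3": True,
                      "m4": False, "m5": False}, equivalent=1))  # 0.75
```

The undecidability of equivalence detection (Part 4) is exactly why the denominator is the hard part of this formula in practice.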
Keyword
mutation testing
Abstract
Outline
1. Introduction
2. The theory of mutation testing
3. Cost reduction techniques
4. Equivalent mutant detection techniques
5. The application of mutation testing
6. Empirical evaluation
7. Tools support mutation testing
8. Future Trend
9. Conclusion
Background
Mutation Testing is a fault–based software testing technique that has been widely studied for over
three decades.
Motivation
The literature on Mutation Testing has contributed a set of approaches, tools, developments and
empirical results which have not been surveyed in detail until now.
Solution
This paper provides a comprehensive analysis and survey of Mutation Testing. The paper also
presents the results of several development trend analyses.
Evaluation
These analyses provide evidence that Mutation Testing techniques and tools are reaching a state of
maturity and applicability, while the topic of Mutation Testing itself is the subject of increasing
interest.
Total Pages: 32 | Value: High | Understanding: Well | Last Read: 2010.09.27
Question: Characterization | Result: Analytic Model | Validation: Analysis, Experience
High-Dimensional Clustering
[HK99] Optimal Grid-Clustering: Towards Breaking the Curse
of Dimensionality in High-Dimensional Clustering
Alexander Hinneburg , Daniel A. Keim, Optimal Grid-Clustering: Towards Breaking the Curse of
Dimensionality in High-Dimensional Clustering, Proceedings of the 25th International Conference
on Very Large Data Bases, p.506-517, September 07-10, 1999
Annotation
Keyword
High-Dimensional Clustering
Abstract
Background
Many applications require the clustering of large amounts of high-dimensional data. In addition,
the high-dimensional data often contains a significant amount of noise which causes additional
effectiveness problems.
Motivation
The comparison reveals that condensation-based approaches (such as BIRCH or STING) are the
most promising candidates for achieving the necessary efficiency, but it also shows that basically
all condensation-based approaches have severe weaknesses with respect to their effectiveness in
high-dimensional space.
Solution
To overcome these problems, we develop a new clustering technique called OptiGrid which is
based on constructing an optimal grid-partitioning of the data. The optimal grid-partitioning is
determined by calculating the best partitioning hyperplanes for each dimension (if such a
partitioning exists) using certain projections of the data.
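My attempt to restate the core step as code. A heavily simplified sketch: histogram each axis projection and cut at a low-density bin lying between two denser regions; the paper's contracting projections and cut-quality criteria are not modeled.

```python
def best_axis_cut(points, bins=10):
    """For each dimension, histogram the projection of the data onto
    that axis and look for a sparse bin between two denser regions;
    return (axis, cut_value) of the lowest-density such separator."""
    dims = len(points[0])
    best = None
    for d in range(dims):
        proj = sorted(p[d] for p in points)
        lo, hi = proj[0], proj[-1]
        width = (hi - lo) / bins or 1.0
        counts = [0] * bins
        for v in proj:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        # candidate cuts: interior bins denser regions surround
        for i in range(1, bins - 1):
            if max(counts[:i]) > counts[i] < max(counts[i + 1:]):
                cut = lo + (i + 0.5) * width
                if best is None or counts[i] < best[0]:
                    best = (counts[i], d, cut)  # lower density = better
    return (best[1], best[2]) if best else None

# Two clusters separated along axis 0; axis 1 carries no structure.
pts = [(x, y) for x in (0.0, 0.1, 0.2, 5.0, 5.1, 5.2) for y in (0.0, 2.0)]
axis, cut = best_axis_cut(pts)
print(axis, round(cut, 2))
```

Recursing on the two sides of each chosen cutting plane is what builds up the grid partitioning.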
Evaluation
We perform a series of experiments on a number of different data sets from CAD and molecular
biology. A comparison with one of the best known algorithms (BIRCH) shows the superiority of
our new approach.
Total Pages: 12 | Value: High | Understanding: Bad | Last Read: 2010.10.23
Question: Method/Means, Evaluation | Result: Technique | Validation: Analysis
[KKZ09] Clustering high-dimensional data: A survey on
subspace clustering, pattern-based clustering, and
correlation clustering.
Kriegel, H., Kröger, P., and Zimek, A. 2009. Clustering high-dimensional data: A survey on
subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans. Knowl.
Discov. Data 3, 1 (Mar. 2009), 1-58.
Annotation
Keyword
Abstract
Outline
INTRODUCTION
a) Sample Applications of Clustering High-Dimensional Data
i. Gene Expression Analysis.
ii. Metabolic Screening.
iii. Customer Recommendation Systems.
iv. Text Documents.
b) Finding Clusters in High-Dimensional Data
i. The main challenge for clustering here is that different subsets of features are
relevant for different clusters, that is, the objects cluster in subspaces of the data
space but the subspaces of the clusters may vary.
ii. A common way to overcome problems of high-dimensional data spaces where
several features are correlated or only some features are relevant is to perform
feature selection before performing any other data mining task.
iii. Unfortunately, such feature selection or dimensionality reduction techniques cannot
be applied to clustering problems.
iv. Instead of a global approach to feature selection, a local approach accounting for
the local feature relevance and/or local feature correlation problems is required.
Background
As a prolific research area in data mining, subspace clustering and related problems induced a vast
quantity of proposed solutions.
Motivation
However, many publications compare a new proposition—if at all—with one or two competitors,
or even with a so-called “naïve” ad hoc solution, but fail to clarify the exact problem definition.
As a consequence, even if two solutions are thoroughly compared experimentally, it will often
remain unclear whether both solutions tackle the same problem or, if they do, whether they agree
in certain tacit assumptions and how such assumptions may influence the outcome of an
algorithm.
Solution
In this survey, we try to clarify: (i) the different problem definitions related to subspace clustering
in general; (ii) the specific difficulties encountered in this field of research; (iii) the varying
assumptions, heuristics, and intuitions forming the basis of different approaches; and (iv) how
several prominent solutions tackle different problems.
Evaluation
Total Pages: 58 | Value: High | Understanding: Normal | Last Read: 2010.10.24
Question: Characterization | Result: Analytic Model | Validation: Persuasion