Presentation at the Postdoctoral symposium of the 2011 International Conference on Software Maintenance, accompanying the paper
http://soft.vub.ac.be/Publications/2011/vub-soft-tr-11-11.pdf
A Logic Meta-Programming Foundation for Example-Driven Pattern Detection in Object-Oriented Programs
1. A Logic Meta-Programming Foundation for
Example-Driven Pattern Detection
in Object-Oriented Programs
Post-doctoral Symposium Track
International Conference on Software Maintenance, 29/09/2011
Colonial Williamsburg, VA (USA)
Coen De Roover
Software Languages Lab, Vrije Universiteit Brussel
Promotors: Wolfgang De Meuter, Johan Brichau
2. General-Purpose Pattern Detection Tools
identify code of which the user specified the characteristics
e.g. structural
class Ouch {
  int hashCode() {
    return ...;
  }
}
e.g. control flow, data flow
scanner = new Scanner();
...
scanner.close();
...
scanner.next();
Let me explain the title first. Given its length, this will take a slide or two. First off, what do I consider a “general-purpose pattern detection tool”? Well, that’s any tool
that identifies code of which the user has specified the characteristics. Those characteristics can be related to the structure of a program, but also its control flow and
data flow. The slide illustrates the difference. Consider possible violations of the invariant that equal objects have equal hash codes. Those are characterized
structurally by a method hashCode(), without a corresponding equals() method. Reads from a closed scanner, on the other hand, are characterized by control flow
characteristics (next is invoked after close), and data flow characteristics (both on same scanner). Now, if such a tool existed, it could be used for a variety of
purposes. For instance, to check whether the protocol of an in-house API is used correctly. Or to check whether an application-specific bug you just discovered isn’t
more widespread. Or even to check whether someone is instantiating a class for which a factory method exists. In short, it could be used to detect a lot of interesting
application-specific patterns ... not just design patterns :)
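The two kinds of characteristics on the slide can be made concrete with a small sketch. Only Ouch, hashCode() and the Scanner calls come from the slide; the class CharacteristicsDemo and its two helper methods are hypothetical names introduced for illustration, and the reflective check is only a stand-in for what a detection tool would establish on source code.

```java
import java.lang.reflect.Method;
import java.util.Scanner;

// Structural characteristic: hashCode() is declared without a matching equals().
class Ouch {
    @Override
    public int hashCode() {
        return 42;
    }
}

// Hypothetical helper class, not part of the talk.
class CharacteristicsDemo {
    // Control flow and data flow characteristic: next() is invoked after
    // close(), and both messages are sent to the same Scanner object.
    static boolean readAfterCloseFails() {
        Scanner scanner = new Scanner("some input");
        scanner.close();
        try {
            scanner.next();
            return false;
        } catch (IllegalStateException e) {
            // java.util.Scanner rejects reads once it has been closed.
            return true;
        }
    }

    // Checks the structural characteristic reflectively: the class itself
    // declares hashCode() but no equals(Object).
    static boolean declaresHashCodeWithoutEquals(Class<?> c) {
        boolean hasHashCode = false;
        boolean hasEquals = false;
        for (Method m : c.getDeclaredMethods()) {
            if (m.getName().equals("hashCode") && m.getParameterCount() == 0) {
                hasHashCode = true;
            }
            if (m.getName().equals("equals") && m.getParameterCount() == 1
                    && m.getParameterTypes()[0] == Object.class) {
                hasEquals = true;
            }
        }
        return hasHashCode && !hasEquals;
    }
}
```

The first helper needs to know the order of two calls and that they share a receiver; the second only needs to inspect declarations. That difference is what separates behavioral from structural characteristics.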
3. Example-driven Detection
user
class ?name extends Component {
  public void acceptVisitor(?type ?v) {
    System.out.println(?string);
    ?v.?visitMethod(this);
  }
}
tool
public class OnlyLoggingLeaf extends Component {
  public void acceptVisitor(ComponentVisitor v) {
    System.out.println("Only logging.");
  }
}
public class SillyLeaf extends OnlyLoggingLeaf {
  public void acceptVisitor(ComponentVisitor v) {
    super.acceptVisitor(v);
    ComponentVisitor temp = v;
    temp.visitSuperLogLeaf(this);
  }
}
0.648
Of course, if such a general-purpose pattern detection tool existed, you would still have to tell it somehow what patterns to look for. Wouldn’t it be great if you could
just give the tool an example implementation of the pattern you are looking for, and have the tool return all variants of this example in your code? In a nutshell, that’s
what I proposed in my dissertation: an example-driven approach to pattern detection. So, how does it work in practice? Imagine that you want to check whether all
subclasses of a Component class have a method acceptVisitor that logs something before double dispatching to its parameter. Then you give the tool an example
implementation as shown on the slide. It looks like regular Java code with meta-variables. For instance, the one in green substitutes both for the parameter of the
method and the receiver of the double dispatching. Each result reported by the tool consists of bindings for the meta-variables. One of those results is shown on the
bottom of the slide. As you can see, this particular result is a variant of the implementation we gave to the tool. Here, the acceptVisitor() method performs the
required logging through a super call instead of directly. It also doesn’t dispatch to its parameter, but to a temporary variable that aliases the parameter. The more of
these variants the tool finds, the better. However, not all variants of the implementation are equal. That’s why an example-driven pattern detection tool ranks the
variants it finds based on their similarity to the given example.
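To illustrate what "bindings for the meta-variables" means, here is a minimal sketch of strict, token-by-token template matching. It is not the matching implemented by the tool from the talk: TemplateMatcher and match are hypothetical names, and the sketch assumes code that has already been split into tokens.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper class, not part of the talk.
class TemplateMatcher {
    // Matches template tokens against code tokens of the same length.
    // Tokens starting with '?' are meta-variables; every occurrence of the
    // same meta-variable has to bind to the same code token.
    static Optional<Map<String, String>> match(String[] template, String[] code) {
        if (template.length != code.length) {
            return Optional.empty();
        }
        Map<String, String> bindings = new HashMap<>();
        for (int i = 0; i < template.length; i++) {
            String token = template[i];
            if (token.startsWith("?")) {
                String previous = bindings.putIfAbsent(token, code[i]);
                if (previous != null && !previous.equals(code[i])) {
                    return Optional.empty(); // conflicting binding
                }
            } else if (!token.equals(code[i])) {
                return Optional.empty(); // literal tokens must be identical
            }
        }
        return Optional.of(bindings);
    }
}
```

Matching the template tokens `( ?type ?v ) { ?v . ?visitMethod ( this ) ; }` against `( ComponentVisitor v ) { v . visitLeaf ( this ) ; }` binds ?type, ?v and ?visitMethod. The same template fails on a body that dispatches to `temp` instead of `v`, because ?v is already bound to `v`. Such a strict matcher therefore misses the SillyLeaf variant, which is the kind of variant the example-driven approach is meant to recall.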
4. Motivation
uniform language for specifying behavioral and structural characteristics
existing specification languages are too specialized
e.g. temporal logic formulas over a control flow graph
constraint satisfaction problem over AST nodes
familiar to developers
often communicate using example snippets and diagrams
recalls implicit implementation variants through static analyses
relieves developers from having to enumerate each variant
shields developers from intricate analysis results
facilitates assessing reported variants
e.g. present in one, all, or some program executions
So, why did I investigate such an example-driven approach to pattern detection? First of all, code templates provide a uniform language for specifying the behavioral
and structural characteristics of a pattern. Existing tools, in contrast, are tailored to one kind of characteristic. For instance, tools specialized in control flow characteristics might
have you express them using temporal logic formulas over a control flow graph. Tools specialized in structural characteristics, on the other hand, might have you
express a constraint satisfaction problem over AST nodes. Clearly, the differences between such highly specialized languages make it difficult to specify
heterogeneously characterized patterns. Second, code templates align well with the way developers tend to communicate: through example snippets and diagrams.
Third, having the tool recall implicit variants of the exemplified characteristics relieves users from having to enumerate all of them in a specification. It also shields
users from the intricate program analyses that are needed to recall these variants. Fourth, not all variants of behavioral characteristics are created equal. Some show
up in only one, in some, or in all program executions. Ranking these variants should facilitate assessing them.
5. In the dissertation ...
Now, motivating all of this took a lot longer in the actual dissertation. There, I discussed the dimensions in the design of a pattern detection tool, surveyed the
existing tools on these dimensions, concluded that there was a need for a general-purpose tool, motivated a set of desiderata for such a tool, and evaluated the
existing tools on these desiderata. I then used this evaluation to motivate the cornerstones of my approach. Today, I’ll briefly discuss 3 of them: logic meta-
programming, example-driven matching of code templates and domain-specific unification.
6. Founding Cornerstone: LMP
specify characteristics through logic queries
leave operational search to logic evaluator
quantify over reified program representation
AST, CFG, PTA
✓ expressive, declarative, ...
✘ exposes users to details of representation + reification
Logic Meta Programming is the founding cornerstone of my approach. It advocates specifying a pattern’s characteristics through logic queries, and leaving the
operational search for the pattern’s instances to the logic evaluator. Which is, of course, a good thing. You can read the query on the slide as “give me a class that
declares a method named ‘foo’ or one that is named ‘bar’”. LMP has already been around for a decade or two. It has been used to detect patterns
in ASTs, CFGs, and even the results of points-to analyses (PTA). It is very expressive, but it exposes users to the details of such a representation and the way it was converted into a logic
format.
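The idea of quantifying over a reified program representation can be sketched in Java, with the caveat that a logic language leaves the search to its evaluator while the loop below spells it out. The fact type DeclaresMethod and the class ReifiedProgramQuery are hypothetical; they are not predicates of the actual tool.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, not part of the talk.
class ReifiedProgramQuery {
    // One fact of an assumed reified program representation:
    // "class className declares a method named methodName".
    static final class DeclaresMethod {
        final String className;
        final String methodName;

        DeclaresMethod(String className, String methodName) {
            this.className = className;
            this.methodName = methodName;
        }
    }

    // Answers the query "give me a class that declares a method named foo
    // or one that is named bar" by enumerating the facts explicitly. A logic
    // evaluator would perform this search on the user's behalf.
    static Set<String> classesDeclaringFooOrBar(List<DeclaresMethod> facts) {
        Set<String> classes = new TreeSet<>();
        for (DeclaresMethod fact : facts) {
            if (fact.methodName.equals("foo") || fact.methodName.equals("bar")) {
                classes.add(fact.className);
            }
        }
        return classes;
    }
}
```

Even in this small form, the query has to be phrased in terms of the fact format, which is the usability drawback noted above.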
7. Cornerstone: Example-Driven Specification
exemplify characteristics through code templates embedded in logic queries
if jtClassDeclaration(?class) {
},
?method methodDeclarationHasName: ?visitMethod
matched according to multiple strategies
vary in leniency from AST-based to flow-based
recall implicit variants of structural and control flow characteristics
{ super.acceptVisitor(v);
  x.doSomething();
  v.visitSuperLogLeaf(this); } ✓
{ System.out.println("Hello");
  ComponentVisitor temp = v;
  temp.visitSuperLogLeaf(this); } ✘
So, LMP is expressive, but difficult to use. The second cornerstone of my approach therefore advocates exemplifying pattern characteristics through code templates
instead. By embedding templates in logic queries, they can be combined through logic connectives and multiple occurrences of the same meta-variable. On the
slide, you can see a logic query that consists of two conditions. The first condition corresponds to our example implementation of the Component subclass. The
second condition is not a template, but an ordinary logic condition. They are connected through the occurrences of the purple meta-variable. As a result, we will also
find the method invoked by the visitXXX message. Now, code templates are nothing new in pattern detection tools. What is new, however, is that we match these in
an example-driven manner according to multiple strategies. These strategies vary in leniency from strict AST-based (which is the predominant one among pattern
detection tools) to a very lenient flow-based matching. The idea is that these strategies recall variants of structural and control flow characteristics that are implied by the
semantics of the programming language. For instance, even an indirect subclass of Component with the acceptVisitor method on the left would be recognized as a
variant of our implementation. It corresponds to the example, except that it contains additional instructions and performs the exemplified logging through a super
call. However, to recall the pattern instance on the right, our tool needs to be able to recognize implicit variants of data flow characteristics. This brings us to the last
cornerstone I will discuss.
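One ingredient of a more lenient strategy, tolerating additional instructions between the exemplified ones, can be sketched as an order-preserving match over statements. This is a deliberate simplification under stated assumptions: statements are compared as plain strings, LenientStatementMatcher is a hypothetical name, and the sketch follows neither super calls nor aliases.

```java
import java.util.List;

// Hypothetical helper class, not part of the talk.
class LenientStatementMatcher {
    // Returns true if every template statement occurs in the body in the
    // same relative order; statements in between are tolerated.
    static boolean occursInOrder(List<String> template, List<String> body) {
        int next = 0; // index of the template statement still to be found
        for (String statement : body) {
            if (next < template.size() && statement.equals(template.get(next))) {
                next++;
            }
        }
        return next == template.size();
    }
}
```

With the single template statement `v.visitSuperLogLeaf(this);`, the method body on the left of the slide matches despite its two extra statements. The body on the right does not, since its dispatch goes through `temp`.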
8. Cornerstone: Domain-Specific Unification
extensions ensure that implicit implementation variants unify
class MustAlias extends Component {
  public void acceptVisitor(ComponentVisitor v) {
    System.out.println("Hello");
    ComponentVisitor temp = v;
    temp.visitSuperLogLeaf(this);
  }
}
class ?name extends Component {
  public void acceptVisitor(?type ?v) {
    System.out.println(?string);
    ?v.?visitMethod(this);
  }
}
consults static analyses
AST node         AST node      identical                                                                  1
Qualified Type   Simple Type   denote same or co-variant return types                                     1
Expression       Expression    in must-alias or may-alias relation                                        0.9 or 0.5
Message Name     Method Name   message may invoke method according to dynamic or static receiver type    0.5 or 0.4
...
last column: likelihood of resulting in false positive, propagated by fuzzy logic cornerstone
The domain-specific unification cornerstone consists of domain-specific extensions to the regular unification procedure we know from Prolog. It ensures that implicit
implementation variants unify. In the code on the right, the first occurrence of the green variable v is bound to a parameter. The second occurrence of v is bound to
a temporary variable. The domain-specific unification allows this because v and temp happen to evaluate to the same value at run-time. To determine this, it consults static
analyses. The table on the slide lists some other unification extensions. The one in the second row unifies a qualified type with a simple type if both denote the same
type or are co-variant return types. To this end, it consults a semantic analysis. The name of a message and the name of a method also unify if the message may
invoke the method according to the static or the dynamic type of the receiver.
As an extension may succeed where the plain unification procedure fails, it might result in false positives. We therefore associate unification degrees with each extension. They
are shown in the last column. For instance, two expressions unify with a degree of 0.9 if they alias in every program execution, but with a degree of 0.5 if they alias
only in some. All of these degrees are combined and used to compute the ranking of a detected result.
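How the degrees are combined is not spelled out in these notes. The sketch below assumes multiplication, a common choice for conjunction in fuzzy logic; this is an assumption made for illustration and not necessarily the operator used by the tool. The constants are the degrees listed on the slide, and UnificationDegrees is a hypothetical name.

```java
// Hypothetical helper class, not part of the talk.
class UnificationDegrees {
    // Degrees as listed in the last column of the table on the slide.
    static final double IDENTICAL = 1.0;
    static final double MUST_ALIAS = 0.9;
    static final double MAY_ALIAS = 0.5;

    // ASSUMPTION: the degrees of all unifications needed for one result are
    // combined by multiplication. The talk only states that they are combined.
    static double combine(double... degrees) {
        double result = 1.0;
        for (double degree : degrees) {
            result *= degree;
        }
        return result;
    }
}
```

Under this assumption, a result that needs one must-alias unification (`combine(IDENTICAL, MUST_ALIAS)`) ranks above one that needs a may-alias unification (`combine(IDENTICAL, MAY_ALIAS)`), and every additional lenient unification lowers the rank further.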
9. In Practice:
Detecting Lapsed Observers
On to some practice. The paper discusses how to detect possible lapsed observers in an example-driven manner.
Those are observers that are added to a subject, but never removed.
10. Example-driven Specification
subject class
 1 if jtClassDeclaration(?subjectClass){
 2      class ?subjectName {
 3        ?mod1List ?t1 ?observers = ?init;
 4        public ?t2 ?addObserver(?observerType ?observer) {            add method
 5          ?observers.?add(?observer);
 6        }
 7        public ?t3 ?removeObserver(?observerType ?otherObserver) {
 8          ?observers.?remove(?otherObserver);
 9        }
10        ?mod2List ?t4 ?notifyObservers(?param1List) {
11          ?observers;
12          ?observer.?update(?argList);                                update message
13        }
14      }
15    },
observer class
16    jtClassDeclaration(?observerClass){
17      class ?observerName {
18        ?mod3List ?t5 ?update(?argList) {}                            update method
19      }
20    },
lapsed observer instance
21    jtExpression(?register){ ?subject.?addObserver(?lapsed) },                add message
22    not(jtExpression(?unregister){ ?subject.?removeObserver(?lapsed) }),
23    jtExpression(?alloc){ ?lapsed := new ?observerName(?argList) }            instance creation
Rest assured, there is some reason to this madness. I’ve simply highlighted all occurrences of a variable in the same color. The specification for the lapsed observer
consists of 3 parts. The first is a template that exemplifies the prototypical implementation of the subject class. Among others, it has a method addObserver (in
orange) for registering an observer with the subject. It takes the observer as its parameter (in blue) and adds the observer to the purple field. It also has a
notifyObservers method that notifies observers of state changes. Note that it sends a message (in yellow) to one of the previously added observers (in blue). The
second part exemplifies an observer class. It is exemplified as a class in which the method invoked by the yellow update message resides. The lapsed observer
instance is found as the gray argument to an addObserver invocation, which is never used as an argument to any removeObserver invocation. The last line finds the
expression that created this observer instance.
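The instance-level part of the specification (added to a subject, but never removed from it) boils down to a set difference. The sketch below is a strong simplification under stated assumptions: calls are given as plain name triples, the method names addObserver and removeObserver are hard-coded where the specification uses meta-variables, and names are compared literally where the tool would consult alias analyses. Call and LapsedObserverSketch are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, not part of the talk.
class LapsedObserverSketch {
    // A call such as p.addObserver(s1), reduced to the names involved.
    static final class Call {
        final String subject;
        final String method;
        final String argument;

        Call(String subject, String method, String argument) {
            this.subject = subject;
            this.method = method;
            this.argument = argument;
        }
    }

    // Maps each subject to the observers that were added to it but never
    // removed from it, regardless of the order of the calls.
    static Map<String, Set<String>> possiblyLapsed(List<Call> calls) {
        Map<String, Set<String>> lapsed = new LinkedHashMap<>();
        for (Call call : calls) {
            if (call.method.equals("addObserver")) {
                lapsed.computeIfAbsent(call.subject, s -> new LinkedHashSet<>())
                      .add(call.argument);
            }
        }
        for (Call call : calls) {
            if (call.method.equals("removeObserver")) {
                Set<String> added = lapsed.get(call.subject);
                if (added != null) {
                    added.remove(call.argument);
                }
            }
        }
        return lapsed;
    }
}
```

For a program that registers two observers s1 and s2 with a subject p and later unregisters only s2, the sketch reports s1 for p. Like the specification, it only flags possible lapsed observers: it says nothing about the point after which an observer is no longer needed.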
11. Example Instance
class Point implements ChangeSubject {
  private HashSet observers;
  public void addObserver(utils.ChangeObserver o) {
    observers.add(o);
  }
  public void removeObserver(ChangeObserver o) {
    this.observers.remove(o);
  }
  public void notifyObservers() {
    for (Iterator e = observers.iterator(); e.hasNext();) {
      ((ChangeObserver)e.next()).refresh(this);
    }
  }
}
class Screen implements ChangeObserver {
  public void refresh(ChangeSubject s) { ... }
}
class Main {
  public static void main(String[] args) {
    Point p = new Point(5, 5);
    Screen s1 = new Screen("s1");
    Screen s2 = new Screen("s2");
    p.addObserver(s1);
    p.addObserver(s2);
    ...
    p.removeObserver(s2);
  }
}
unification extensions marked on the slide:
qualified type with simple type
expression with expression
expression with parameter name
message name with method name
class name with simple type
rows of the paper's table of extensions visible on the slide:
9    an Expression   an Expression                 according to an intra-procedural must-alias analysis or according to an inter-procedural may-alias analysis, must or may evaluate to the same object at run-time
10   an Expression   a variable declaration Name   expression references the variable according to a semantic analysis
Fig. 2. Domain-specific extensions of the unification procedure illustrated on the detection of lapsed listeners in the Observer design pattern.
excerpt of the paper visible on the slide:
Lines 21–23 exemplify ... the lapsed listener pitfall at the instance-level: as instances of the participating classes that exhibit the characteristics of the pitfall. They identify ?lapsed objects that are added to a ?subject (line 21), but never removed from it (line 22). The final condition term is optional. It identifies the expression that instantiated the lapsed object. To this end, it uses the non-native operator := which unifies the logic variable on its left-hand side with the AST node that matches the code on its right-hand side. As a result, ?alloc will be bound to new Screen("s1") for the depicted program. Note that the depicted specification only detects possible lapsed listeners. It does not identify the point in the program’s execution after which an observer is no longer needed, nor does it specify that the ?unregister expression should be ...
Here’s an example of a lapsed listener, together with the unification extensions that were required to find it. The paper has all the details.
12. Evaluation
result in general-purpose detection tool
for structural and behavioral characteristics
using descriptive specifications in a uniform language
motivated each cornerstone through desiderata it helps to fulfill
using running examples for each kind of characteristic
approach as a whole on desiderata
using design patterns, µ-patterns, bug patterns
✓ descriptive specifications
✓ most instances recalled with few false positives
✘ cardinality constraints difficult to exemplify
Now, how do you evaluate such a thing? I needed to evaluate the approach as a whole on the desiderata for a general-purpose pattern detection tool, but I also had
to motivate its individual cornerstones. I therefore enabled the cornerstones one by one in my tool to demonstrate what desiderata they help to fulfill. The approach
as a whole was evaluated by detecting instances of representative design patterns, micro-patterns and bug patterns. And of course, it worked well. A lot of
specifications consisted solely of Java code with meta-variables. Only cardinality constraints such as “at least as many as” are easier to express using plain LMP.
13. Future Work
what to rank pattern detection results on?
similarity + imprecisions in analyses
severity for bug patterns?
need a corpus of programs in which pattern instances have been documented
pattern specification formalisms that are even easier to use
generalizing pattern instances into example-driven specifications
search space exploration backed by program analyses
in general, make sure that our tools become part of every developer’s toolbox
e.g. example-driven program transformation
e.g. example-driven history querying
but ... maybe ease-of-use is not the only adoption hurdle
What do I consider future work? First of all, I’m currently ranking results based on their similarity to the given example and on the imprecisions in the analyses that
were needed to find each result. That seems ok for several patterns, perhaps except for bug patterns. Specialized bug detection tools rank bugs based on their
severity. So there is room for future work here, although yesterday’s keynote speaker seems to disagree :) In any case, to evaluate our tools, we need a corpus of
programs in which pattern instances have been documented. That’s a tremendous task, but someone has to do it. Perhaps we can do it collaboratively using a social
website. There is also room for other specification formalisms that are even easier to use. Currently, I’m interested in search-based techniques from artificial
intelligence to automatically generalize snippets of code into an example-driven specification. In general, I believe a lot of work still needs to be done to ensure our
tools become part of every developer’s toolbox. I’m thinking of specifying program transformations in an example-driven way or even querying the history of a
program in an example-driven way. But maybe ease-of-use is not the only hurdle to the adoption of our tools. Empirical studies are needed to determine what is
keeping our tools from the toolboxes.
14. Lessons for Doctoral Students (highly subjective)
proponent of artifact-driven research
share with others, gain momentum
specification of artifact for reproducibility (in my case: meta-interpreters)
stand on the shoulders of giants ... or reinvent the wheel?
SOUL, JDT, SOOT: thanks!
but, often takes implementing an algorithm to understand its details
be wary of analysis paralysis
trust your advisors when they say you have enough material ;)
anonymous: “getting a PhD is akin to getting a driver’s license for doing research”
Since this is the post-doctoral symposium, here are some of the lessons I learned that could be of use to doctoral students. Warning: these are highly subjective and
personal.