An Algorithm for Keyword Search on an Execution Path
1. An Algorithm for Keyword Search on an Execution Path
Toshihiro Kamiya
Future University Hakodate
kamiya@fun.ac.jp
2. Background #1: Code searching
Developers do search!
➤ To find reusable components for a function of a product
➤ To find similar code fragments before modifying code
➤ To find code samples showing the usage of a given class or component
CSMR-WCRE-2014 Era Track
3. Background #2: Emerging fine-grained module technologies
More and more fine-grained modules are used.
● Object/Closure
– extracts data and its manipulation
● Aspect
– extracts concerns, a set of code invoked by a specific condition or event
● Dependency Injection
– splits code at each dependency
4. Problem: Searching on fine-grained modules
Code search becomes difficult with fine-grained modules.
(Old days) A search result was contained in a single file
↓
(Now) A search result is a set of several parts of several files
This affects code-search methods in both
● Algorithm
– “how to find”
● Displaying/Visualizing
– “how to show search results”
5. Solution: Keyword Search on an Execution Path
● Static analysis
● Find the execution paths that include given keywords
– From all possible execution paths of a target program
● Idea: a compact data structure (And/Or/Call graph) of execution paths + a search algorithm on it
● A prototype implementation
– applied to up to 183k lines of Java source code
● Related work
– Prospector[8]
– PARSEWeb[9]
6. And/Or/Call Graph
● A DAG that contains all execution paths in a compact form
● It is generated by the following translation rules
– Sequence structure ➡ And node
– Selection structure ➡ Or node
– Repetitive structure ➡ Selection among sequences of 0-time repetition, 1-time repetition, 2-times repetition, ... ➡ Or node having And nodes as children
– Method call ➡ Call node
Figure: source code and its graphical form
– s1; s2; s3; ➡ s1, s2, s3
– if (...) { st; } else { se; } ➡ st, se
– interface I { m(); } class B implements I { m() {...} } class C implements I { m() {...} } I i; ... i.m(); ➡ B//m, C//m (dynamic dispatching)
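The translation rules above can be illustrated with a minimal Python sketch. It is not the prototype's data structure: the class names `Call`, `And`, `Or`, the helpers `paths` and `loop`, and the bounded unrolling parameter `max_unroll` are all assumptions made for illustration. Plain statements such as s1 are modeled as leaf nodes, and a dynamically dispatched call is modeled as an Or node over its candidate targets.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Union


@dataclass
class Call:
    """Leaf node: a call (or, in this sketch, any statement) with a name."""
    name: str


@dataclass
class And:
    """Sequence structure: every child runs, in order."""
    children: List["Node"] = field(default_factory=list)


@dataclass
class Or:
    """Selection structure: exactly one child runs."""
    children: List["Node"] = field(default_factory=list)


Node = Union[Call, And, Or]


def paths(node: Node) -> List[List[str]]:
    """Enumerate the execution paths (name sequences) encoded by a node."""
    if isinstance(node, Call):
        return [[node.name]]
    if isinstance(node, Or):
        return [p for child in node.children for p in paths(child)]
    # And node: concatenate one path from each child, in order.
    result = []
    for combo in product(*(paths(c) for c in node.children)):
        result.append([name for part in combo for name in part])
    return result


def loop(body: Node, max_unroll: int = 2) -> Or:
    """Repetition as a selection among 0-, 1-, ..., max_unroll-time sequences.

    The bound max_unroll is an assumption of this sketch.
    """
    return Or([And([body] * k) for k in range(max_unroll + 1)])


# s1; if (...) { st; } else { se; }; i.m();  with i.m() dispatching to B or C
graph = And([
    Call("s1"),
    Or([Call("st"), Call("se")]),
    Or([Call("B//m"), Call("C//m")]),
])
all_paths = paths(graph)
loop_paths = paths(loop(Call("x")))
```

The graph stays compact because each Or node stores its alternatives once, while `paths` expands them into four separate sequences only when asked.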
10. Search Algorithm
● Input: Keywords to identify nodes
● Output: Connected sub-graphs including the nodes identified with the keywords
– “connected sub-graph” → continuous execution path
● Heuristics
– Find deepest nodes
← Assumption: a small operation is easy to understand
– Extract shallowest sub-graph (treecut)
← Assumption: a deep method-invocation chain is difficult to understand
11. Label and Summary
Labels and summaries are the “index” data of the search algorithm.
● Label
– A set of names put on a node
– Compared with the keywords in a query
● Summary
– A node n’s summary S(n) is the set of names of the (child and) descendant nodes of n.
Properties
– For any node n and any child node c of n, S(n) ⊇ S(c).
– A root node has a summary of local maximum.
Figure: example call tree with nodes main, getDay, getToday, getDayOfWeek, split, parseInt (×3), Calendar//getInstance (×2), Calendar//set, Calendar//get, printf.
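The definition of S(n) can be sketched in a few lines of Python. This is an illustration under assumptions, not the prototype's code: each node carries a single name instead of a set of names, the names `Node` and `summary` are hypothetical, and the tree shape is a guess that only reuses the node names from the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Node:
    label: str  # simplification: one name per node instead of a set of names
    children: List["Node"] = field(default_factory=list)


def summary(node: Node, cache: Dict[int, FrozenSet[str]]) -> FrozenSet[str]:
    """S(n): names of all child and descendant nodes of n (n itself excluded)."""
    key = id(node)
    if key not in cache:
        names = set()
        for child in node.children:
            names.add(child.label)
            names |= summary(child, cache)
        cache[key] = frozenset(names)
    return cache[key]


# Hypothetical tree shape built from the node names in the figure.
get_day = Node("getDay", [
    Node("split"), Node("parseInt"), Node("parseInt"), Node("parseInt"),
    Node("Calendar//getInstance"), Node("Calendar//set"),
])
get_day_of_week = Node("getDayOfWeek", [
    Node("Calendar//getInstance"), Node("Calendar//get"),
])
main = Node("main", [get_day, get_day_of_week, Node("printf")])

cache: Dict[int, FrozenSet[str]] = {}
s_main = summary(main, cache)
s_get_day = summary(get_day, cache)
```

Because a parent's summary is built from its children's summaries, S(n) ⊇ S(c) holds by construction, and the cache means each node is computed only once.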
13. Label and Summary
Figure: the example call tree, with the summary of one node shown.
summary
{ “Calendar//getInstance”, “Calendar//set”, “split”, “parseInt” }
14. Label and Summary
Figure: the example call tree, with the summary of one node shown.
summary
{ “Calendar//getInstance”, “Calendar//get”, “Calendar//set”, “getDay”, “getDayOfWeek”, “split”, “parseInt”, “printf” }
15. Steps of search algorithm
(S1) finds query-fulfilling sub-trees of the (local) maximum depths
– by comparing the summary of each node with the query
(S2) makes the shallowest treecut
– by removing deeper leaf nodes until removing any more would make the treecut no longer fulfill the query
(S3) removes uncontributing leaf nodes
– Uncontributing = its label does not match any of the query keywords
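The three steps can be sketched on a plain tree as follows. This is a simplified reading of the slide, not the prototype's algorithm: it works on a tree rather than on the And/Or/Call graph, gives each node a single name, recomputes summaries instead of using an index, and implements (S2) as a depth limit. All function names and the tree shape are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def names_below(node: Node) -> Set[str]:
    """Summary S(n): labels of all descendants of node."""
    out: Set[str] = set()
    for c in node.children:
        out.add(c.label)
        out |= names_below(c)
    return out


def deepest_fulfilling(node: Node, query: Set[str]) -> List[Node]:
    """(S1) Nodes whose summary covers the query while no descendant's does."""
    if not query <= names_below(node):
        return []
    deeper = [n for c in node.children for n in deepest_fulfilling(c, query)]
    return deeper or [node]


def truncate(node: Node, depth: int) -> Node:
    """Copy of the sub-tree keeping nodes at most `depth` levels below node."""
    kids = [truncate(c, depth - 1) for c in node.children] if depth > 0 else []
    return Node(node.label, kids)


def shallowest_treecut(root: Node, query: Set[str]) -> Node:
    """(S2) Smallest depth limit at which the cut still covers the query."""
    if not query <= names_below(root):
        raise ValueError("root does not fulfill the query")
    depth = 0
    while True:
        cut = truncate(root, depth)
        if query <= names_below(cut):
            return cut
        depth += 1


def prune(node: Node, query: Set[str]) -> Node:
    """(S3) Repeatedly drop leaves whose label matches no query keyword."""
    kids = [prune(c, query) for c in node.children]
    kids = [k for k in kids if k.children or k.label in query]
    return Node(node.label, kids)


def search(root: Node, query: Set[str]) -> List[Node]:
    return [prune(shallowest_treecut(n, query), query)
            for n in deepest_fulfilling(root, query)]


# Hypothetical tree shape built from the node names used in the slides.
tree = Node("main", [
    Node("getDay", [
        Node("split"), Node("parseInt"), Node("parseInt"), Node("parseInt"),
        Node("Calendar//getInstance"), Node("Calendar//set"),
    ]),
    Node("getDayOfWeek", [Node("Calendar//getInstance"), Node("Calendar//get")]),
    Node("printf"),
])
results = search(tree, {"Calendar//get", "Calendar//set"})
```

With this tree, neither getDay nor getDayOfWeek covers both keywords on its own, so (S1) selects main, and (S3) leaves only the two branches that reach the matching calls.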
16. Example
Query
{ “Calendar//get”, “Calendar//set” }
(S1) finds query-fulfilling sub-trees of the (local) maximum depths
(S2) makes the shallowest treecut
(S3) removes uncontributing leaf nodes
Figure: example call tree with nodes main, getDay, getToday, getDayOfWeek, split, parseInt (×3), Calendar//getInstance (×2), Calendar//set, Calendar//get, printf.
17. Example
Query
{ “Calendar//get”, “Calendar//set” }
Figure: the example call tree, with the summary of one node shown.
{ “Calendar//getInstance”, “Calendar//get”, “Calendar//set”, “getDay”, “getDayOfWeek”, “split”, “parseInt”, “printf” }
19. Example
Query
{ “Calendar//get”, “Calendar//set” }
(S1) finds query-fulfilling sub-trees of the (local) maximum depths
(S2) makes the shallowest treecut in each of the sub-trees
(S3) removes uncontributing leaf nodes
Figure: the remaining nodes are main, getDay, Calendar//set, getDayOfWeek, Calendar//get.
Search result
main {
  getDay {
    Calendar//set
  }
  getDayOfWeek {
    Calendar//get
  }
}
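The nested-brace layout of the search result can be produced by a short recursive printer. The sketch below is only an illustration: the `(label, children)` tuple representation, the function name `render`, and the two-space indentation are assumptions, not the prototype's output format.

```python
from typing import List, Tuple

# A result tree as (label, children) tuples; this representation is hypothetical.
Tree = Tuple[str, list]


def render(tree: Tree, indent: int = 0) -> List[str]:
    """Render a result tree as lines, wrapping children in braces."""
    label, children = tree
    pad = "  " * indent
    if not children:
        return [pad + label]
    lines = [pad + label + " {"]
    for child in children:
        lines.extend(render(child, indent + 1))
    lines.append(pad + "}")
    return lines


result: Tree = ("main", [
    ("getDay", [("Calendar//set", [])]),
    ("getDayOfWeek", [("Calendar//get", [])]),
])
text = "\n".join(render(result))
```

Leaves are printed without braces, so the matched calls stand out against the enclosing methods that lead to them.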
20. Prototype tool
Implementation
● Target: Java source code
– Analysis of Java's dynamic dispatch
● Written in 8k lines of Python
● Applied to a product of up to 183 kLOC (jEdit)
Limitations
● Keywords
– Names of class or method
– Text in string literal
● Exception handling
– Does not search in the execution paths that throw
● Entry points
– main() and static initializers
– Does not search for entry points such as @Test
21. Java class files (bytecode)
Figure: processing flow of the tool, from Java class files (bytecode) to a search result. Labels in the diagram:
Indexing
● Dynamic-dispatch analysis: Type hierarchy, Dynamic-dispatch resolver
● Method-body analysis: Method calls, Control flow, Method signature, And/Or/Call graph of method body, Node label
● Whole-program graph building
● Node summary building
● Index data: And/Or/Call graph, Node summary, Line number table
Searching
● Query
● Keyword-query search
● Sub-graph / Execution path
● Formatting
● Search result
22. Applied to jEdit
● H/W
– CPU Xeon E5520 2.27GHz
– 32GiB mem.
● Indexing
– 48.8 sec. in elapsed time
– 644 MiB peak mem.
● Searching
– 3.09 ∼ 72.2 (ave. 5.71) sec. in elapsed time
– up to 1412 MiB peak mem.
23. Summary
● Background
– #1: Code searching
– #2: Emerging fine-grained module technologies
● Problem: Searching on fine-grained modules
● Solution: Keyword search on an execution path
– And/Or/Call graph, Label/summary
– Search algorithm
● Prototype implementation
– Applied to jEdit
● GitHub
– https://github.com/tos-kamiya/agoat/