An Algorithm for Keyword Search on an Execution Path
Toshihiro Kamiya

Future University Hakodate
kamiya@fun.ac.jp
Background #1: Code searching
Developers do search!
➤ To find reusable components for a function of a product
➤ To find similar code fragments before modifying code
➤ To find code samples showing the usage of a given class or component

CSMR-WCRE-2014 Era Track

Background #2: Emerging fine-grained module technologies
More and more fine-grained modules are used.
● Object/Closure
  extracts data and its manipulation
● Aspect
  extracts concerns (a set of code invoked by a specific condition or event)
● Dependency Injection
  splits code at each dependency

Problem: Searching on fine-grained modules
Code search becomes difficult with fine-grained modules
(Old days) the search result was contained in a file
↓
(Now) the search result is a set of several parts of several files

This affects code-search methods in both
● Algorithm
  – “how to find”
● Displaying/Visualizing
  – “how to show search results”

Solution: Keyword Search on an Execution Path
● Static analysis
● Find the execution paths that include given keywords
  ● From all possible execution paths of a target program
● Idea: a compact data structure (And/Or/Call graph) of execution paths + a search algorithm on it
● A prototype implementation
  ● applied to up to 183k lines of Java source code

Related work
● Prospector[8]
● PARSEWeb[9]

And/Or/Call Graph
● A DAG that contains all execution paths in a compact form
● It is generated by the following translation rules (source code ➡ graphical form)
  – Sequence structure ➡ And node
      s1; s2; s3;  ➡  And node with children s1, s2, s3
  – Selection structure ➡ Or node
      if (...) { st; } else { se; }  ➡  Or node with children st, se
  – Repetitive structure ➡ Selection among sequences of 0-time repetition, 1-time repetition, 2-times repetition, ... ➡ Or node having And nodes as children
  – Method call ➡ Call node
● Dynamic dispatching
      interface I { m(); }
      class B implements I { m() {...} }
      class C implements I { m() {...} }
      I i; ... i.m();  ➡  B//m, C//m
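
The translation rules above can be illustrated with a minimal Python sketch. It is not the prototype's code: the `Node` class and the helpers `call`, `seq`, `select`, `repeat`, `dispatch`, and `paths` are hypothetical names. Two assumptions are made here that the slides do not state: repetition is cut off at `max_unroll` copies to keep the graph finite, and a dynamically dispatched call is modeled as an Or node over the candidate methods. Call nodes are kept as leaves for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                      # "and", "or", or "call"
    label: str = ""                # method name for call nodes, e.g. "B//m"
    children: List["Node"] = field(default_factory=list)


def call(name):
    """Method call -> Call node (a leaf in this sketch)."""
    return Node("call", name)


def seq(*parts):
    """Sequence structure -> And node."""
    return Node("and", children=list(parts))


def select(*branches):
    """Selection structure -> Or node."""
    return Node("or", children=list(branches))


def repeat(body, max_unroll=2):
    """Repetition -> Or node over And nodes with 0..max_unroll copies (assumed bound)."""
    return select(*(seq(*([body] * k)) for k in range(max_unroll + 1)))


def dispatch(method, receiver_classes):
    """Dynamic dispatch -> one candidate call per implementing class (assumed Or node)."""
    return select(*(call(f"{c}//{method}") for c in receiver_classes))


def paths(node):
    """Enumerate the call sequences (execution paths) that a node stands for."""
    if node.kind == "call":
        return [(node.label,)]
    if node.kind == "or":
        return [p for child in node.children for p in paths(child)]
    # "and": concatenate one path from each child, in order
    result = [()]
    for child in node.children:
        result = [p + q for p in result for q in paths(child)]
    return result


graph = seq(call("s1"), select(call("st"), call("se")), dispatch("m", ["B", "C"]))
all_paths = paths(graph)
```

Enumerating paths like this grows exponentially, which is why the graph itself, rather than the list of paths, serves as the compact representation.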

Example
[Figure: And/Or/Call graph of an example program with the nodes main, getDay, getToday, getDayOfWeek, split, parseInt (three calls), Calendar//getInstance (two calls), Calendar//set, Calendar//get, and printf]

Search Algorithm
● Input: Keywords to identify nodes
● Output: Connected sub-graphs including the nodes identified with the keywords
  “connected sub-graph” → continuous execution path
● Heuristics
  – Find the deepest nodes
    ← Assumption: a small operation is easy to understand
  – Extract the shallowest sub-graph (treecut)
    ← Assumption: a deep method-invocation chain is difficult to understand

Label and Summary
Label/Summary are “index” data of the search algorithm.
● Label
  – A set of names put on a node
  – Matched against the keywords in a query
● Summary
  – A node n’s summary S(n) is the set of names of the (child and) descendant nodes of n.
● Properties
  – For any node n and any child node c of n, S(n) ⊇ S(c).
  – A root node has a summary of local maximum.
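
As a rough illustration of the summary definition, the following Python sketch computes S(n) bottom-up and memoizes it on each node, so that nodes shared in a DAG are visited once. The `Node` class, the `node` helper, and `build_summary` are hypothetical names, not the prototype's API. The shape of the example tree is an assumption chosen to be consistent with the node names in the slides, and getToday is left out.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Node:
    labels: FrozenSet[str]                      # names put on the node
    children: List["Node"] = field(default_factory=list)
    summary: Optional[FrozenSet[str]] = None    # S(n), filled in lazily


def node(name: str, *children: "Node") -> Node:
    """Build a node carrying a single name as its label."""
    return Node(frozenset({name}), list(children))


def build_summary(n: Node) -> FrozenSet[str]:
    """S(n) = union over children c of (labels of c) and S(c)."""
    if n.summary is None:
        names = set()
        for c in n.children:
            names |= c.labels | build_summary(c)
        n.summary = frozenset(names)
    return n.summary


# Assumed call structure of the example program (not taken from the slides).
get_day = node("getDay",
               node("Calendar//getInstance"), node("split"),
               node("parseInt"), node("parseInt"), node("parseInt"),
               node("Calendar//set"))
get_day_of_week = node("getDayOfWeek",
                       node("Calendar//getInstance"), node("Calendar//get"))
main = node("main", get_day, get_day_of_week, node("printf"))
build_summary(main)
```

Because a node's own label is not part of its summary, S(n) ⊇ S(c) holds for every child c by construction.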

Label and Summary
Example summary of a node in the example graph:
{ “Calendar//getInstance”, “Calendar//set”, “split”, “parseInt” }

Label and Summary
Example summary of a node in the example graph:
{ “Calendar//getInstance”, “Calendar//get”, “Calendar//set”, “getDay”, “getDayOfWeek”, “split”, “parseInt”, “printf” }

Steps of search algorithm
(S1) finds query-fulfilling sub-trees of the (local) maximum depths
  – by comparing the summary of each node with the query
(S2) makes the shallowest treecut
  – by removing the deeper leaf nodes for as long as the treecut still fulfills the query
(S3) removes uncontributing leaf nodes
  – Uncontributing = its label does not match any of the query keywords
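
Steps (S1) to (S3) can be sketched in Python as follows. This is a simplified illustration on a plain call tree, not the prototype's implementation on the And/Or/Call graph. All names (`Node`, `summary`, `deepest_fulfilling`, `cut`, `shallowest_treecut`, `prune`, `search`) are hypothetical, the query is assumed to be non-empty, and the example tree is an assumed shape that omits getToday.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def summary(n: Node) -> Set[str]:
    """S(n): labels of all descendants of n (recomputed each time, fine for a sketch)."""
    names: Set[str] = set()
    for c in n.children:
        names |= {c.label} | summary(c)
    return names


def deepest_fulfilling(n: Node, query: Set[str]) -> List[Node]:
    """(S1) Nodes whose summary covers the query while no child's summary does."""
    if not query <= summary(n):
        return []
    deeper = [m for c in n.children for m in deepest_fulfilling(c, query)]
    return deeper or [n]


def cut(n: Node, depth: int) -> Node:
    """Copy of the sub-tree of n, truncated `depth` levels below n."""
    kids = [cut(c, depth - 1) for c in n.children] if depth > 0 else []
    return Node(n.label, kids)


def shallowest_treecut(n: Node, query: Set[str]) -> Node:
    """(S2) Smallest truncation depth at which the treecut still covers the query."""
    depth = 0
    while not query <= summary(cut(n, depth)):
        depth += 1
    return cut(n, depth)


def prune(n: Node, query: Set[str]) -> Optional[Node]:
    """(S3) Repeatedly drop leaves whose label matches no query keyword."""
    kids = [k for k in (prune(c, query) for c in n.children) if k is not None]
    if kids or n.label in query:
        return Node(n.label, kids)
    return None


def search(root: Node, query: Set[str]) -> List[Optional[Node]]:
    return [prune(shallowest_treecut(n, query), query)
            for n in deepest_fulfilling(root, query)]


# Assumed example tree (shape not taken from the slides).
main = Node("main", [
    Node("getDay", [Node("Calendar//getInstance"), Node("split"),
                    Node("parseInt"), Node("parseInt"), Node("parseInt"),
                    Node("Calendar//set")]),
    Node("getDayOfWeek", [Node("Calendar//getInstance"), Node("Calendar//get")]),
    Node("printf"),
])
results = search(main, {"Calendar//get", "Calendar//set"})
```

In this sketch (S2) truncates whole levels at once, which is one simple reading of "removing deeper leaf nodes"; a finer-grained removal order would be an equally plausible interpretation.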

Example
(S1) finds query-fulfilling sub-trees of the (local) maximum depths
(S2) makes the shallowest treecut
(S3) removes uncontributing leaf nodes

Query
{ “Calendar//get”, “Calendar//set” }

Example
(S1) finds query-fulfilling sub-trees of the (local) maximum depths
(S2) makes the shallowest treecut in each of the sub-trees
(S3) removes uncontributing leaf nodes

Query
{ “Calendar//get”, “Calendar//set” }

Search result
main {
  getDay {
    Calendar//set
  }
  getDayOfWeek {
    Calendar//get
  }
}
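
The nested-brace form of the search result above could be produced by a small formatter such as the following Python sketch. The `Node` class and `format_result` are hypothetical names, and the two-space indentation is an assumption, not the output format of the actual tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def format_result(n: Node, indent: int = 0) -> List[str]:
    """Render a result tree as lines: leaves as bare names, inner nodes as 'name { ... }'."""
    pad = "  " * indent
    if not n.children:
        return [pad + n.label]
    lines = [pad + n.label + " {"]
    for c in n.children:
        lines.extend(format_result(c, indent + 1))
    lines.append(pad + "}")
    return lines


result = Node("main", [
    Node("getDay", [Node("Calendar//set")]),
    Node("getDayOfWeek", [Node("Calendar//get")]),
])
text = "\n".join(format_result(result))
```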
Prototype tool
Implementation
● Target: Java source code
  – Analysis of Java's dynamic dispatch
● Written in 8k lines of Python
● Applied to a product of up to 183 kLOC (jEdit)

Limitations
● Keywords
  – Names of class or method
  – Text in string literal
● Exception handling
  – Does not search in the execution paths that throw
● Entry points
  – main() and static initializers
  – Does not search for entry points such as @Test

[Figure: processing pipeline of the prototype tool]
Inputs: Java class files (bytecode), Query
Stages: Dynamic-dispatch analysis, Method-body analysis, Indexing (whole-program graph building, node summary building), Searching (keyword-query search), Formatting
Intermediate data: type hierarchy, method signature, dynamic-dispatch resolver, method calls, control flow, And/Or/Call graph of method body, node label, line number table, And/Or/Call graph, node summary, sub-graph / execution path
Output: Search result
Applied to jEdit
● H/W
  – CPU Xeon E5520 2.27GHz
  – 32GiB mem.
● Indexing
  – 48.8 sec. in elapsed time
  – 644 MiB peak mem.
● Searching
  – 3.09 ∼ 72.2 (ave. 5.71) sec. in elapsed time
  – up to 1412 MiB peak mem.

Summary
● Background
  – #1: Code searching
  – #2: Emerging fine-grained module technologies
● Problem: Searching on fine-grained modules
● Solution: Keyword search on an execution path
  – And/Or/Call graph, Label/summary
  – Search algorithm
● Prototype implementation
  – Applied to jEdit
● GitHub
  – https://github.com/tos-kamiya/agoat/



  • 1. An Algorithm for Keyword Search on an Execution Path. Toshihiro Kamiya, Future University Hakodate, kamiya@fun.ac.jp
  • 2. Background #1: Code searching. Developers do search! ➤ To find reusable components for a function of a product ➤ To find similar code fragments before modifying code ➤ To find code samples showing the usage of a given class or component. CSMR-WCRE-2014 Era Track
  • 3. Background #2: Emerging fine-grained module technologies. More and more fine-grained modules are used. ● Object/Closure: extracts data and its manipulation. ● Aspect: extracts concerns, i.e. a set of code invoked by a specific condition or event. ● Dependency Injection: splits code at each dependency.
  • 4. Problem: Searching on fine-grained modules. Fine-grained modules make code search difficult. (Old days) A search result was contained in a single file. ↓ (Now) A search result is a set of several parts of several files. This affects code-search methods in both ● Algorithm: “how to find” ● Displaying/Visualizing: “how to show search results”.
  • 5. Solution: Keyword Search on an Execution Path. ● Static analysis: find the execution paths that include the given keywords, from all possible execution paths of a target program. ● Idea: a compact data structure (And/Or/Call graph) of execution paths + a search algorithm on it. ● A prototype implementation, applied to up to 183k lines of Java source code. Related work: ● Prospector [8] ● PARSEWeb [9].
  • 6. And/Or/Call Graph. ● A DAG that contains all execution paths in a compact form. ● It is generated from source code by the following translation rules (source code ➡ graphical form): – Sequence structure, e.g. s1; s2; s3; ➡ And node over s1, s2, s3. – Selection structure, e.g. if (...) { st; } else { se; } ➡ Or node over st, se. – Repetitive structure ➡ selection among sequences of 0-time repetition, 1-time repetition, 2-times repetition, ... ➡ Or node having And nodes as children. – Method call ➡ Call node. With dynamic dispatching, given interface I { m(); }, class B implements I { m() {...} } and class C implements I { m() {...} }, the call I i; ... i.m(); is drawn with the call targets B//m and C//m.
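
The translation rules on slide 6 can be illustrated with a minimal Python sketch. It is not the prototype's code: the class names `Call`, `And`, `Or` and the helpers `seq`, `select`, `repeat`, `dispatch`, and `paths` are hypothetical. Two assumptions go beyond the slide: repetition is unrolled only up to a fixed bound `max_repeat` so that the structure stays finite, and a dynamically dispatched call is modeled as an Or node over one Call node per candidate class.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Call:
    name: str  # e.g. "B//m"


@dataclass(frozen=True)
class And:
    children: Tuple["Node", ...]  # all children are executed, in order


@dataclass(frozen=True)
class Or:
    children: Tuple["Node", ...]  # exactly one child is executed


Node = Union[Call, And, Or]


def seq(*stmts: Node) -> And:
    """Sequence structure -> And node."""
    return And(tuple(stmts))


def select(*branches: Node) -> Or:
    """Selection structure -> Or node."""
    return Or(tuple(branches))


def repeat(body: Node, max_repeat: int = 2) -> Or:
    """Repetitive structure -> Or node over And nodes (0, 1, 2, ... repetitions).

    Assumption: the unrolling stops at max_repeat.
    """
    return Or(tuple(And((body,) * k) for k in range(max_repeat + 1)))


def dispatch(method: str, classes: List[str]) -> Or:
    """Assumption: a dynamically dispatched call is an Or over Call nodes."""
    return Or(tuple(Call(f"{cls}//{method}") for cls in classes))


def paths(node: Node) -> List[Tuple[str, ...]]:
    """Enumerate the execution paths (sequences of call names) a node encodes."""
    if isinstance(node, Call):
        return [(node.name,)]
    if isinstance(node, Or):
        return [p for child in node.children for p in paths(child)]
    result: List[Tuple[str, ...]] = [()]
    for child in node.children:  # And: concatenate one path per child
        result = [p + q for p in result for q in paths(child)]
    return result


graph = seq(Call("s1"), select(Call("st"), Call("se")), dispatch("m", ["B", "C"]))
all_paths = paths(graph)
loop_paths = paths(repeat(Call("x"), max_repeat=2))
```

The graph stores each alternative once, while `paths` expands it into four execution paths; this is the sense in which the graph keeps all paths "in a compact form".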
  • 10. Search Algorithm. ● Input: keywords to identify nodes. ● Output: connected sub-graphs including the nodes identified with the keywords (“connected sub-graph” → continuous execution path). ● Heuristics: – Find the deepest nodes ← Assumption: a small operation is easy to understand. – Extract the shallowest sub-graph (treecut) ← Assumption: a deep method-invocation chain is difficult to understand.
  • 11-14. Label and Summary. Label/Summary are “index” data of the search algorithm. ● Label: – A set of names put on a node – Keywords in a query. ● Summary: – A node n’s summary S(n) is a set of names of (child and) descendant nodes of n. ● Properties: – For any node n and any child node c of n, S(n) ⊇ S(c). – A root node has a summary of local maximum. Figure (example graph), node labels: main, getToday, getDay, getDayOfWeek, Calendar//getInstance, Calendar//get, Calendar//set, split, parseInt, printf. Example summaries shown on the figure: { “Calendar//getInstance”, “Calendar//set”, “split”, “parseInt” } and { “Calendar//getInstance”, “Calendar//get”, “Calendar//set”, “getDay”, “getDayOfWeek”, “split”, “parseInt”, “printf” }.
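
The summary definition on slides 11 to 14 can be sketched as a bottom-up union. The code below is an illustration under assumptions, not the prototype's implementation: the `Node` class and `summary` function are hypothetical, each node carries a single name rather than a set of names, and the tree shape is inferred so that it reproduces the two example summaries and the search result shown on the slides (which node owns which summary is itself an assumption).

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Node:
    label: str  # simplification: one name per node
    children: List["Node"] = field(default_factory=list)


def summary(node: Node) -> FrozenSet[str]:
    """S(n): names of the child and descendant nodes of n (n itself excluded)."""
    names = set()
    for child in node.children:
        names.add(child.label)
        names |= summary(child)
    return frozenset(names)


# Assumed tree shape, chosen to match the summaries on the slides.
get_day = Node("getDay", [
    Node("split"),
    Node("parseInt"), Node("parseInt"), Node("parseInt"),
    Node("Calendar//getInstance"),
    Node("Calendar//set"),
])
get_day_of_week = Node("getDayOfWeek", [
    Node("Calendar//getInstance"),
    Node("Calendar//get"),
])
main = Node("main", [get_day, get_day_of_week, Node("printf")])

s_get_day = summary(get_day)
s_main = summary(main)
```

Because every child's summary is folded into its parent's, the property S(n) ⊇ S(c) holds by construction, so a node whose summary lacks a keyword can be skipped together with its whole sub-tree.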
  • 15. Steps of the search algorithm. (S1) Find query-fulfilling sub-trees of the (local) maximum depths, by comparing the summary of each node with the query. (S2) Make the shallowest treecut, by removing deeper leaf nodes for as long as the treecut still fulfills the query. (S3) Remove uncontributing leaf nodes; a leaf node is uncontributing if its label does not match any of the query keywords.
  • 16-19. Example. Query: { “Calendar//get”, “Calendar//set” }. (S1) finds query-fulfilling sub-trees of the (local) maximum depths; the summary shown at this step is { “Calendar//getInstance”, “Calendar//get”, “Calendar//set”, “getDay”, “getDayOfWeek”, “split”, “parseInt”, “printf” }. (S2) makes the shallowest treecut in each of the sub-trees. (S3) removes uncontributing leaf nodes. Search result: main { getDay { Calendar//set } getDayOfWeek { Calendar//get } }
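
The three steps can be traced on this example with a short Python sketch. It is a simplified reading of the slides, not the author's implementation: all names (`Node`, `deepest_fulfilling`, `shallowest_treecut`, `prune`, `show`) are hypothetical, the tree shape is an assumption consistent with the summaries and result on the slides, the query uses the spelling `Calendar`, and S2 is approximated by cutting the whole sub-tree at the smallest uniform depth that still covers the query.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def summary(node: Node) -> Set[str]:
    """Names of all child and descendant nodes."""
    names: Set[str] = set()
    for c in node.children:
        names.add(c.label)
        names |= summary(c)
    return names


def deepest_fulfilling(node: Node, query: Set[str]) -> List[Node]:
    """S1: nodes whose summary covers the query while no child's summary does."""
    if not query <= summary(node):
        return []
    deeper = [m for c in node.children for m in deepest_fulfilling(c, query)]
    return deeper or [node]


def cut(node: Node, depth: int) -> Node:
    """Copy of the sub-tree truncated below the given depth."""
    kids = [cut(c, depth - 1) for c in node.children] if depth > 0 else []
    return Node(node.label, kids)


def labels(node: Node) -> Set[str]:
    return {node.label} | {n for c in node.children for n in labels(c)}


def height(node: Node) -> int:
    return 1 + max((height(c) for c in node.children), default=-1)


def shallowest_treecut(root: Node, query: Set[str]) -> Node:
    """S2 (approximation): smallest uniform depth whose labels cover the query."""
    for depth in range(height(root) + 1):
        candidate = cut(root, depth)
        if query <= labels(candidate):
            return candidate
    return root  # fallback when the query is not covered at all


def prune(node: Node, query: Set[str]) -> Optional[Node]:
    """S3: repeatedly drop leaves whose label matches no query keyword."""
    kids = [k for k in (prune(c, query) for c in node.children) if k is not None]
    if not kids and node.label not in query:
        return None
    return Node(node.label, kids)


def show(node: Node) -> str:
    if not node.children:
        return node.label
    return f"{node.label} {{ " + " ".join(show(c) for c in node.children) + " }"


# Assumed example tree (shape inferred from the slides).
main = Node("main", [
    Node("getDay", [Node("split"), Node("parseInt"), Node("parseInt"),
                    Node("parseInt"), Node("Calendar//getInstance"),
                    Node("Calendar//set")]),
    Node("getDayOfWeek", [Node("Calendar//getInstance"), Node("Calendar//get")]),
    Node("printf"),
])
query = {"Calendar//get", "Calendar//set"}
roots = deepest_fulfilling(main, query)
pruned = [prune(shallowest_treecut(r, query), query) for r in roots]
results = [show(t) for t in pruned if t is not None]
```

On this tree, neither `getDay` nor `getDayOfWeek` covers both keywords on its own, so S1 stops at `main`, and S3 removes `printf`, `split`, `parseInt`, and `Calendar//getInstance` as uncontributing leaves.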
  • 20. Prototype tool. Implementation: ● Target: Java source code – Analysis of Java's dynamic dispatch ● Written in 8k lines of Python ● Applied to a product of up to 183 kLOC (jEdit). Limitations: ● Keywords – Names of class or method – Text in string literal ● Exception handling – Does not search in the execution paths that throw ● Entry points – main() and static initializers – Does not search for entry points such as @Test.
  • 21. Figure (tool pipeline). Input: Java class files (bytecode). Indexing: dynamic-dispatch analysis (type hierarchy, method signature, dynamic-dispatch resolver); method-body analysis (method calls, control flow, And/Or/Call graph of method body, node label, line number table); whole-program graph building (And/Or/Call graph); node summary building (node summary). Searching: query, keyword-query search, sub-graph / execution path, formatting, search result.
  • 22. Applied to jEdit. ● H/W: – CPU Xeon E5520 2.27 GHz – 32 GiB mem. ● Indexing: – 48.8 sec. in elapsed time – 644 MiB peak mem. ● Searching: – 3.09 to 72.2 (ave. 5.71) sec. in elapsed time – up to 1412 MiB peak mem.
  • 23. Summary. ● Background: – #1: Code searching – #2: Emerging fine-grained module technologies. ● Problem: Searching on fine-grained modules. ● Solution: Keyword search on an execution path: – And/Or/Call graph, Label/Summary – Search algorithm. ● Prototype implementation: applied to jEdit. ● GitHub: – https://github.com/tos-kamiya/agoat/