1. Evaluating Recommended Applications
Mark Grechanik, Accenture Technology Labs
David Coppit, The College of William and Mary
Denys Poshyvanyk, The College of William and Mary
2. How Many Open Source Applications Are There?
• SourceForge.net reports hosting 180,000 projects as of August 1, 2008.
• There are dozens of other open source
repositories containing tens of
thousands of different applications.
• Companies have internal source
control management systems
containing hundreds of thousands of
applications.
3. Problem
• Finding/checking existing software
matching high-level user requirements
• Would reduce the cost of many software
projects
• Would provide users with examples of
different implementations
• Challenges:
• Finding relevant applications is difficult
• Evaluating recommended applications is very
difficult
6. Fundamental Problems
• Vocabulary problem
• Mismatch between the high-level intent reflected in
the descriptions of applications and their low-level
implementation details
• Concept assignment problem
High-level concept "Send data" → code snippet implementing it:
s = socket.socket(proto, socket.SOCK_DGRAM)
s.sendto(teststring, addr)
buf = data = receive(s, 100)
while data and '\n' not in buf:
    data = receive(s, 100)
    buf += data
• Many application repositories are polluted with
poorly functioning projects
7. Working without a Recommender
• Find relevant application(s)
• Download application
• Locate and examine fragments of the code that
implement the desired features
• Observe the runtime behavior of this application
to ensure that this behavior matches
requirements
• This process is manual since programmers:
• study the source code of the retrieved applications
• locate various API calls
• read information about these calls in help documents
• Still, it is difficult for programmers to link high-level concepts from requirements to their implementations in source code
10. Key observations
• While studying retrieved apps, developers:
• locate various API calls
• read information about these calls in help documents
• Help docs are supplied by the same vendors
whose packages/APIs are used in software
• Programmers read and rely on these API docs
• Help docs are written and reviewed by many
developers
• Help documents are usually more verbose and
accurate than project descriptions
11. How K9 Recommender System Works
[Diagram: words in the user query are matched against the descriptions of API call 1, API call 2, API call 3]
• Automatically matching words in user queries against API help docs instead of:
  • searching in project descriptions;
  • searching in source code.
K9 uses help documents to produce a list of relevant API calls
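The matching step above can be sketched as follows. This is a minimal illustration, not K9's actual code; the API call names and help-document texts are invented for the example.

```python
# Match query keywords against API help-document text (not project
# descriptions or source code), returning candidate API calls.

def match_api_calls(query, help_docs):
    """Return API calls whose help text shares words with the query,
    ranked by how many query keywords each help text matches."""
    keywords = set(query.lower().split())
    hits = []
    for api_call, doc_text in help_docs.items():
        doc_words = set(doc_text.lower().split())
        overlap = keywords & doc_words
        if overlap:
            hits.append((api_call, len(overlap)))
    # API calls matching more keywords rank higher
    return [call for call, _ in sorted(hits, key=lambda h: -h[1])]

# Hypothetical help-document corpus:
docs = {
    "socket.sendto": "send data to a remote address over a datagram socket",
    "zlib.compress": "compress data and return the compressed bytes",
    "hmac.digest":   "return the digest of the message authenticated with a key",
}
print(match_api_calls("send data", docs))  # socket.sendto ranks first
```

Because the help text is written by the API vendors, the match works even when the project's own description never mentions the queried concepts.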
12. The Architecture behind K9
[Architecture diagram. Components: API call dictionary, API call lookup, integrity engine, help page processor, source code search engine, static analyzer. Data: user query, help pages, API calls, application metadata, retrieved applications, recommended applications]
13. API Call Lookup (ACL)
• After the user enters keywords, the ACL
searches through the documentation text to find
these keywords
• Uses information retrieval to rank the help documents against the user query
• The ACL returns the absolute path to the programming entity whose help text contains the entered keywords
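A standard way to do this ranking is TF-IDF scoring; the sketch below is one plausible instantiation, not the ACL's actual implementation, and the API names and help texts are invented.

```python
import math
from collections import Counter

def tfidf_rank(query, corpus):
    """Rank document ids by TF-IDF score summed over the query terms."""
    n = len(corpus)
    tokenized = {d: text.lower().split() for d, text in corpus.items()}
    # document frequency of each term
    df = Counter()
    for words in tokenized.values():
        for t in set(words):
            df[t] += 1
    scores = {}
    for d, words in tokenized.items():
        tf = Counter(words)
        score = 0.0
        for t in query.lower().split():
            if df[t]:
                idf = math.log(n / df[t])          # rarer terms weigh more
                score += (tf[t] / len(words)) * idf  # term frequency x idf
        scores[d] = score
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical help documents keyed by programming entity:
corpus = {
    "java.util.zip.Deflater.deflate": "compress the input data and fill the buffer",
    "javax.crypto.Cipher.doFinal":    "encrypt or decrypt data in a single operation",
    "java.io.FileReader.read":        "read characters from a file into an array",
}
print(tfidf_rank("compress data", corpus))  # Deflater.deflate ranks first
```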
14. The Architecture behind K9
[Architecture diagram repeated from slide 12]
15. Integrity Engine
• We assume that keywords in queries represent
logically connected concepts
• Consider query “compress encrypt XML”
• Queries are often formed by constructing a question
first, and then extracting keywords from this
sentence
• If a relation is present between concepts in the
query, then there should be a relation between
API calls that implement these concepts in the
program code
• This heuristic is based on the observation that
relations between concepts are often preserved
in their implementations in the program code
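A toy illustration of this heuristic (not the actual Integrity Engine): two API calls count as connected when data defined by one is used by the other. The call names and def-use sets below are invented.

```python
def connected(calls_a, calls_b, def_use):
    """True if some call implementing concept A passes data to a call
    implementing concept B. def_use maps each call to a pair
    (variables_defined, variables_used)."""
    for a in calls_a:
        defined, _ = def_use[a]
        for b in calls_b:
            _, used = def_use[b]
            if defined & used:  # shared variable => data-flow link
                return True
    return False

# For the query "compress encrypt": the compressor's output variable
# flows into the cipher's input, so the two concepts are connected.
def_use = {
    "Deflater.deflate": ({"zipped"}, {"raw"}),
    "Cipher.doFinal":   ({"secret"}, {"zipped"}),
    "Random.nextInt":   ({"r"}, set()),
}
print(connected({"Deflater.deflate"}, {"Cipher.doFinal"}, def_use))  # True
print(connected({"Deflater.deflate"}, {"Random.nextInt"}, def_use))  # False
```

Candidate applications whose matched API calls are connected this way would be ranked above those where the calls merely co-occur.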
16. Computing the Ranking
• Total number of API calls is n
• Total number of connections between these API calls is at most n(n − 1)
• Each connection i is assigned a weight wi
  • Weak connections have weight 0.5 and strong connections have weight 1.0
• li is the connectivity value of connection i
  • 1: loop link, 0.5: single link, 0: no link
• Rank is computed using the following formula:
  rank = (Σi wi · li) / (n(n − 1))
17. Current Status
• Restricting the scope to Java projects
• Challenges:
• How to automatically locate and download the latest
version of the software (e.g., from sourceforge)?
• How to automatically locate the correct entry point
(i.e., main) for static analysis?
• How to reduce the time for the static analysis?
• How and when to update API call dictionary?
• Testing other ranking heuristics
• Evaluation is pending
18. Related Work
• CodeFinder/Helgon
• ExpertFinder
• CodeBroker
• Hipikat
• Automated Method Completion (AMC)
• Strathcona
• Prospector
• XSnippet
• Google code search, Krugle,…
19. Conclusions & Future Work
• K9 recommends/checks relevant applications
based on:
• the analysis of relevant API help documents;
• the connectivity analysis of retrieved API calls.
• Indexing available open-source projects and
pre-computing data and control flow among
API calls
• Analyzing multiple releases of the same project