1. Evaluating Recommended Applications
Mark Grechanik, Accenture Technology Labs
David Coppit, The College of William and Mary
Denys Poshyvanyk, The College of William and Mary
2. How Many Open Source Applications Are There?
• SourceForge.net reports hosting 180,000 projects as of August 1, 2008.
• There are dozens of other open source
repositories containing tens of
thousands of different applications.
• Companies have internal source
control management systems
containing hundreds of thousands of
applications.
3. Problem
• Finding/checking existing software
matching high-level user requirements
• Would reduce the cost of many software
projects
• Would provide users with examples of
different implementations
• Challenges:
• Finding relevant applications is difficult
• Evaluating recommended applications is very
difficult
6. Fundamental Problems
• Vocabulary problem
• Mismatch between the high-level intent reflected in
the descriptions of applications and their low-level
implementation details
• Concept assignment problem
High-level concept "Send data" → code snippet implementing it:
s = socket.socket(proto, socket.SOCK_DGRAM)
s.sendto(teststring, addr)
buf = data = receive(s, 100)
while data and '\n' not in buf:
    data = receive(s, 100)
    buf += data
• Many application repositories are polluted with
poorly functioning projects
7. Working without a Recommender
• Find relevant application(s)
• Download application
• Locate and examine fragments of the code that
implement the desired features
• Observe the runtime behavior of this application
to ensure that this behavior matches
requirements
• This process is manual since programmers:
• study the source code of the retrieved applications
• locate various API calls
• read information about these calls in help documents
• Still, it is difficult for programmers to link high-level concepts from requirements to their implementations in source code
10. Key observations
• While studying retrieved apps, developers:
• locate various API calls
• read information about these calls in help documents
• Help docs are supplied by the same vendors
whose packages/APIs are used in software
• Programmers read and rely on these API docs
• Help docs are written and reviewed by many
developers
• Help documents are usually more verbose and
accurate than project descriptions
11. How K9 Recommender System Works
[Diagram: words in the user query are matched against the descriptions of API call 1, API call 2, API call 3]
• Automatically matching words in user queries against API help docs instead of:
  • searching in project descriptions;
  • searching in source code.
K9 uses help documents to produce a list of relevant API calls
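The matching step above can be sketched as follows. This is a minimal illustration, not K9's actual code; the API call names and help-document texts are invented for the example.

```python
# Match query keywords against API help-document text (not project
# descriptions or source code), returning candidate API calls.

def match_api_calls(query, help_docs):
    """Return API calls whose help text shares words with the query,
    ranked by how many query keywords each help text matches."""
    keywords = set(query.lower().split())
    hits = []
    for api_call, doc_text in help_docs.items():
        doc_words = set(doc_text.lower().split())
        overlap = keywords & doc_words
        if overlap:
            hits.append((api_call, len(overlap)))
    # API calls matching more keywords rank higher
    return [call for call, _ in sorted(hits, key=lambda h: -h[1])]

# Hypothetical help-document corpus:
docs = {
    "socket.sendto": "send data to a remote address over a datagram socket",
    "zlib.compress": "compress data and return the compressed bytes",
    "hmac.digest":   "return the digest of the message authenticated with a key",
}
print(match_api_calls("send data", docs))  # socket.sendto ranks first
```

Because the help text is written by the API vendors, the match works even when the project's own description never mentions the queried concepts.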
12. The Architecture behind K9
[Architecture diagram. Components: API call dictionary, API call lookup, integrity engine, help page processor, source code search engine, static analyzer. Data: user query, help pages, API calls, application metadata, retrieved applications, recommended applications]
13. API Call Lookup (ACL)
• After the user enters keywords, the ACL
searches through the documentation text to find
these keywords
• Uses information retrieval to rank the help documents against the user query
• The ACL returns the absolute path to the programming entity whose help text contains the entered keywords
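A standard way to do this ranking is TF-IDF scoring; the sketch below is one plausible instantiation, not the ACL's actual implementation, and the API names and help texts are invented.

```python
import math
from collections import Counter

def tfidf_rank(query, corpus):
    """Rank document ids by TF-IDF score summed over the query terms."""
    n = len(corpus)
    tokenized = {d: text.lower().split() for d, text in corpus.items()}
    # document frequency of each term
    df = Counter()
    for words in tokenized.values():
        for t in set(words):
            df[t] += 1
    scores = {}
    for d, words in tokenized.items():
        tf = Counter(words)
        score = 0.0
        for t in query.lower().split():
            if df[t]:
                idf = math.log(n / df[t])          # rarer terms weigh more
                score += (tf[t] / len(words)) * idf  # term frequency x idf
        scores[d] = score
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical help documents keyed by programming entity:
corpus = {
    "java.util.zip.Deflater.deflate": "compress the input data and fill the buffer",
    "javax.crypto.Cipher.doFinal":    "encrypt or decrypt data in a single operation",
    "java.io.FileReader.read":        "read characters from a file into an array",
}
print(tfidf_rank("compress data", corpus))  # Deflater.deflate ranks first
```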
14. The Architecture behind K9
[Architecture diagram repeated from slide 12]
15. Integrity Engine
• We assume that keywords in queries represent
logically connected concepts
• Consider query “compress encrypt XML”
• Queries are often formed by constructing a question
first, and then extracting keywords from this
sentence
• If a relation is present between concepts in the
query, then there should be a relation between
API calls that implement these concepts in the
program code
• This heuristic is based on the observation that
relations between concepts are often preserved
in their implementations in the program code
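A toy illustration of this heuristic (not the actual Integrity Engine): two API calls count as connected when data defined by one is used by the other. The call names and def-use sets below are invented.

```python
def connected(calls_a, calls_b, def_use):
    """True if some call implementing concept A passes data to a call
    implementing concept B. def_use maps each call to a pair
    (variables_defined, variables_used)."""
    for a in calls_a:
        defined, _ = def_use[a]
        for b in calls_b:
            _, used = def_use[b]
            if defined & used:  # shared variable => data-flow link
                return True
    return False

# For the query "compress encrypt": the compressor's output variable
# flows into the cipher's input, so the two concepts are connected.
def_use = {
    "Deflater.deflate": ({"zipped"}, {"raw"}),
    "Cipher.doFinal":   ({"secret"}, {"zipped"}),
    "Random.nextInt":   ({"r"}, set()),
}
print(connected({"Deflater.deflate"}, {"Cipher.doFinal"}, def_use))  # True
print(connected({"Deflater.deflate"}, {"Random.nextInt"}, def_use))  # False
```

Candidate applications whose matched API calls are connected this way would be ranked above those where the calls merely co-occur.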
16. Computing the Ranking
• Total number of API calls is n
• Total number of connections between these API calls is at most n(n − 1)
• Each connection i is assigned a weight wi
  • Weak connections have weight 0.5 and strong connections have weight 1.0
• li is the connectivity value of connection i
  • 1: loop link, 0.5: single link, 0: no link
• Rank is computed using the following formula:
  rank = (Σi wi · li) / (n(n − 1))
17. Current Status
• Restricting the scope to Java projects
• Challenges:
• How to automatically locate and download the latest
version of the software (e.g., from sourceforge)?
• How to automatically locate the correct entry point
(i.e., main) for static analysis?
• How to reduce the time for the static analysis?
• How and when to update API call dictionary?
• Testing other ranking heuristics
• Evaluation is pending
18. Related Work
• CodeFinder/Helgon
• ExpertFinder
• CodeBroker
• Hipikat
• Automated Method Completion (AMC)
• Strathcona
• Prospector
• XSnippet
• Google code search, Krugle,…
19. Conclusions & Future Work
• K9 recommends/checks relevant applications
based on:
• the analysis of relevant API help documents;
• the connectivity analysis of retrieved API calls.
• Indexing available open-source projects and
pre-computing data and control flow among
API calls
• Analyzing multiple releases of the same project