SlideShare una empresa de Scribd logo
1 de 15
Descargar para leer sin conexión
Topological methods




            Presented by:
            Sukhpal Singh
            Thapar University
Topological methods
Topological methods are based on the simple
premise that, given a query that describes
some required features, we are interested in
identifying library assets that come closest to
providing these features. Such methods are
critically dependent on what it means to come
closest, which in turn depends on some
definition of distance between the query and
candidate assets [1].
Categories of Topological methods
• Exclusive approximate retrieval: Methods that
  fall into this category make a distinction
  between two retrieval goals: exact retrieval and
  approximate retrieval, whereby we seek to
  identify library assets that completely satisfy all
  the requirements of the query.
• Inclusive approximate retrieval: Methods that
  fall into this category make no distinction
  between exact retrieval and approximate
  retrieval. Rather, they focus on identifying
  library assets that minimize some measure of
  distance to the query.
Measures of distance can be divided into two
                broad classes
• Measures of functional (semantic) distance,
  which reflect the extent of similarity between
  the functional properties of the query and those
  of candidate components.

• Measures of structural (syntactic) distance,
  which reflect the extent of similarity between
  the structure of (solutions to) the query and
  the structure of candidate components.
Characterizing topological methods
The Google PageRank Algorithm is
  used in Topological methods to
  retrieve a software assets from
        software repository.
What is PageRank?
• In short PageRank is a “vote”, by all the other
  pages on the Web, about how important a
  page is [3].
• A link to a page counts as a vote of support
• PR(A) = (1-d) + d(PR(T1)/C(T1)
  +…+PR(Tn)/C(Tn))
Breaking Down the Equation
• PR(Tn) - Each page has a notion of its own self-importance. That’s “PR(T1)”
  for the first page in the web all the way up to “PR(Tn)” for the last page

• C(Tn) - Each page spreads its vote out evenly amongst all of it’s outgoing
  links. The count, or number, of outgoing links for page 1 is “C(T1)”, “C(Tn)”
  for page n, and so on for all pages.

• PR(Tn)/C(Tn) - so if our page (page A) has a backlink from page “n” the
  share of the vote page A will get is “PR(Tn)/C(Tn)”

• d(… - All these fractions of votes are added together but, to stop the other
  pages having too much influence, this total vote is “damped down” by
  multiplying it by 0.85 (the factor “d”)

• (1 - d) - The (1 – d) bit at the beginning is a bit of probability math magic so
  the “sum of all web pages’ PageRank's will be one”: it adds in the bit lost
  by the d(…. It also means that if a page has no links to it (no backlinks) even
  then it will still get a small PR of 0.15 (i.e. 1 – 0.85).
How is it Calculated?
• The PR of each page depends on the PR of the
  pages pointing to it.
• But we won’t know what PR those pages have
  until the pages pointing to them have their PR
  calculated and so on.
• So what we do is make a guess.
Simple Example



• Each page has one outgoing link (backlink). So that
  means [2] :

• C(T1) = 1 for A
      and
• C(T2) = 1 for B
We don’t know what their PR should be to begin with, so we
         will just guess 1 as a safe random number.


• d (damping factor) = 0.85
• PR(A)= (1 – d) + d(PR(T1)/C(T1))= (1 – d) + d(1/1)

  i.e.

• PR(A)= 0.15 + 0.85 * 1
  =1
• PR(B)= 0.15 + 0.85 * 1
  =1
Let’s Do It Again with Another Number. Let’s try 0 and re-
                            calculate…
• PR(A)= 0.15 + 0.85 * 0
      = 0.15
      = 0.15 + 0.85 *
• PR(B) 0.15
      = 0.2775
• Now we have calculated a “next best guess” so we just plug it in the
  equation again…

• PR(A)= 0.15 + 0.85 * 0.2775
  = 0.385875
• PR(B)= 0.15 + 0.85 * 0.385875
  = 0.47799375

And again…
• PR(A)= 0.15 + 0.85 * 0.47799375
  = 0.5562946875
• PR(B)= 0.15 + 0.85 * 0.5562946875
  = 0.622850484375
Principle
• It doesn’t matter where you start your guess,
  once the PageRank calculations have settled
  down, the “normalized probability
  distribution” (the average PageRank for all
  pages) will be 1.0
• In software repository we are using software
  assets instead of pages and also using
  relationships among software assets based on
  their keywords instead of links.
Summary
References:
[1]   A survey of software reuse libraries A. Mili a,_, R. Mili
      b and R.T. Mittermeir Annals of Software
      Engineering 5 (1998) 349–414 349

[2]   http://wwwdb.stanford.edu/~backrub/google.html
      http://www-db.stanford.edu/~backrub/google.html

[3]   Semantic Component Retrieval in Software
      Engineering Inaugural dissertation zur Erlangung des
      akademischen       Grades eines Doktors der
      Naturwissenschaften der, Universitat Mannheim,
      Mannheim, 2008

Más contenido relacionado

La actualidad más candente

Spherule Diagrams: A Matrix-based Set Visualization Compared with Euler Diagrams
Spherule Diagrams: A Matrix-based Set Visualization Compared with Euler DiagramsSpherule Diagrams: A Matrix-based Set Visualization Compared with Euler Diagrams
Spherule Diagrams: A Matrix-based Set Visualization Compared with Euler DiagramsMithileysh Sathiyanarayanan
 
Spherule Diagrams with Graph for Social Network Visualization
Spherule Diagrams with Graph for Social Network VisualizationSpherule Diagrams with Graph for Social Network Visualization
Spherule Diagrams with Graph for Social Network VisualizationMithileysh Sathiyanarayanan
 
Data Structure Assignment help , Data Structure Online tutors
Data Structure Assignment help , Data Structure Online tutorsData Structure Assignment help , Data Structure Online tutors
Data Structure Assignment help , Data Structure Online tutorsjohn mayer
 
Spsshelp 100608163328-phpapp01
Spsshelp 100608163328-phpapp01Spsshelp 100608163328-phpapp01
Spsshelp 100608163328-phpapp01Henock Beyene
 
Contextual Information Elicitation in Travel Recommender Systems
Contextual Information Elicitation in Travel Recommender SystemsContextual Information Elicitation in Travel Recommender Systems
Contextual Information Elicitation in Travel Recommender SystemsMatthias Braunhofer
 

La actualidad más candente (6)

Spherule Diagrams: A Matrix-based Set Visualization Compared with Euler Diagrams
Spherule Diagrams: A Matrix-based Set Visualization Compared with Euler DiagramsSpherule Diagrams: A Matrix-based Set Visualization Compared with Euler Diagrams
Spherule Diagrams: A Matrix-based Set Visualization Compared with Euler Diagrams
 
Spherule Diagrams with Graph for Social Network Visualization
Spherule Diagrams with Graph for Social Network VisualizationSpherule Diagrams with Graph for Social Network Visualization
Spherule Diagrams with Graph for Social Network Visualization
 
Lecture6 pca
Lecture6 pcaLecture6 pca
Lecture6 pca
 
Data Structure Assignment help , Data Structure Online tutors
Data Structure Assignment help , Data Structure Online tutorsData Structure Assignment help , Data Structure Online tutors
Data Structure Assignment help , Data Structure Online tutors
 
Spsshelp 100608163328-phpapp01
Spsshelp 100608163328-phpapp01Spsshelp 100608163328-phpapp01
Spsshelp 100608163328-phpapp01
 
Contextual Information Elicitation in Travel Recommender Systems
Contextual Information Elicitation in Travel Recommender SystemsContextual Information Elicitation in Travel Recommender Systems
Contextual Information Elicitation in Travel Recommender Systems
 

Destacado

How to Write an Effective Research Paper
How to Write an Effective Research PaperHow to Write an Effective Research Paper
How to Write an Effective Research PaperDr Sukhpal Singh Gill
 
Reduction of Blocking Artifacts In JPEG Compressed Image
Reduction of Blocking Artifacts In JPEG Compressed ImageReduction of Blocking Artifacts In JPEG Compressed Image
Reduction of Blocking Artifacts In JPEG Compressed ImageDr Sukhpal Singh Gill
 
Java.NET: Integration of Java and .NET
Java.NET: Integration of Java and .NETJava.NET: Integration of Java and .NET
Java.NET: Integration of Java and .NETDr Sukhpal Singh Gill
 
If you know nothing about HTML, this is where you can start !!
If you know nothing about HTML, this is where you can start !!If you know nothing about HTML, this is where you can start !!
If you know nothing about HTML, this is where you can start !!Dr Sukhpal Singh Gill
 
Reduction of Blocking Artifacts In JPEG Compressed Image
 Reduction of Blocking Artifacts In JPEG Compressed Image Reduction of Blocking Artifacts In JPEG Compressed Image
Reduction of Blocking Artifacts In JPEG Compressed ImageDr Sukhpal Singh Gill
 
GREEN CLOUD COMPUTING-A Data Center Approach
GREEN CLOUD COMPUTING-A Data Center ApproachGREEN CLOUD COMPUTING-A Data Center Approach
GREEN CLOUD COMPUTING-A Data Center ApproachDr Sukhpal Singh Gill
 
Workshop on Basics of Software Engineering (DFD, UML and Project Culture)
Workshop on Basics of Software Engineering (DFD, UML and Project Culture)Workshop on Basics of Software Engineering (DFD, UML and Project Culture)
Workshop on Basics of Software Engineering (DFD, UML and Project Culture)Dr Sukhpal Singh Gill
 
Software Requirements Specification (SRS) for Online Tower Plotting System (O...
Software Requirements Specification (SRS) for Online Tower Plotting System (O...Software Requirements Specification (SRS) for Online Tower Plotting System (O...
Software Requirements Specification (SRS) for Online Tower Plotting System (O...Dr Sukhpal Singh Gill
 
Case Study Based Software Engineering Project Development: State of Art
Case Study Based Software Engineering Project Development: State of ArtCase Study Based Software Engineering Project Development: State of Art
Case Study Based Software Engineering Project Development: State of ArtDr Sukhpal Singh Gill
 

Destacado (14)

How to Write an Effective Research Paper
How to Write an Effective Research PaperHow to Write an Effective Research Paper
How to Write an Effective Research Paper
 
Reduction of Blocking Artifacts In JPEG Compressed Image
Reduction of Blocking Artifacts In JPEG Compressed ImageReduction of Blocking Artifacts In JPEG Compressed Image
Reduction of Blocking Artifacts In JPEG Compressed Image
 
The reuse capability model
The reuse capability modelThe reuse capability model
The reuse capability model
 
Network Topologies
Network TopologiesNetwork Topologies
Network Topologies
 
Java.NET: Integration of Java and .NET
Java.NET: Integration of Java and .NETJava.NET: Integration of Java and .NET
Java.NET: Integration of Java and .NET
 
If you know nothing about HTML, this is where you can start !!
If you know nothing about HTML, this is where you can start !!If you know nothing about HTML, this is where you can start !!
If you know nothing about HTML, this is where you can start !!
 
Reduction of Blocking Artifacts In JPEG Compressed Image
 Reduction of Blocking Artifacts In JPEG Compressed Image Reduction of Blocking Artifacts In JPEG Compressed Image
Reduction of Blocking Artifacts In JPEG Compressed Image
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
GREEN CLOUD COMPUTING-A Data Center Approach
GREEN CLOUD COMPUTING-A Data Center ApproachGREEN CLOUD COMPUTING-A Data Center Approach
GREEN CLOUD COMPUTING-A Data Center Approach
 
Workshop on Basics of Software Engineering (DFD, UML and Project Culture)
Workshop on Basics of Software Engineering (DFD, UML and Project Culture)Workshop on Basics of Software Engineering (DFD, UML and Project Culture)
Workshop on Basics of Software Engineering (DFD, UML and Project Culture)
 
Software Requirements Specification (SRS) for Online Tower Plotting System (O...
Software Requirements Specification (SRS) for Online Tower Plotting System (O...Software Requirements Specification (SRS) for Online Tower Plotting System (O...
Software Requirements Specification (SRS) for Online Tower Plotting System (O...
 
Case Study Based Software Engineering Project Development: State of Art
Case Study Based Software Engineering Project Development: State of ArtCase Study Based Software Engineering Project Development: State of Art
Case Study Based Software Engineering Project Development: State of Art
 
Software Requirement Specification
Software Requirement SpecificationSoftware Requirement Specification
Software Requirement Specification
 
Constructors and Destructors
Constructors and DestructorsConstructors and Destructors
Constructors and Destructors
 

Similar a Topological methods

Similar a Topological methods (20)

Page rank1
Page rank1Page rank1
Page rank1
 
Analysis Of Algorithm
Analysis Of AlgorithmAnalysis Of Algorithm
Analysis Of Algorithm
 
Dm page rank
Dm page rankDm page rank
Dm page rank
 
How Google Works
How Google WorksHow Google Works
How Google Works
 
Page rank algortihm
Page rank algortihmPage rank algortihm
Page rank algortihm
 
Implementing page rank algorithm using hadoop map reduce
Implementing page rank algorithm using hadoop map reduceImplementing page rank algorithm using hadoop map reduce
Implementing page rank algorithm using hadoop map reduce
 
Page rank2
Page rank2Page rank2
Page rank2
 
PageRank
PageRankPageRank
PageRank
 
Search engine page rank demystification
Search engine page rank demystificationSearch engine page rank demystification
Search engine page rank demystification
 
Local Approximation of PageRank
Local Approximation of PageRankLocal Approximation of PageRank
Local Approximation of PageRank
 
BigData - PageRank Algorithm with Scala and Spark
BigData - PageRank Algorithm with Scala and SparkBigData - PageRank Algorithm with Scala and Spark
BigData - PageRank Algorithm with Scala and Spark
 
PageRank & Searching
PageRank & SearchingPageRank & Searching
PageRank & Searching
 
Optimizing search engines
Optimizing search enginesOptimizing search engines
Optimizing search engines
 
Page Rank
Page RankPage Rank
Page Rank
 
Page Rank
Page RankPage Rank
Page Rank
 
Page Rank
Page RankPage Rank
Page Rank
 
Page Rank
Page RankPage Rank
Page Rank
 
Page Rank
Page RankPage Rank
Page Rank
 
Page Rank
Page RankPage Rank
Page Rank
 
nueva
nuevanueva
nueva
 

Último

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Último (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 

Topological methods

  • 1. Topological methods Presented by: Sukhpal Singh Thapar University
  • 2. Topological methods Topological methods are based on the simple premise that, given a query that describes some required features, we are interested in identifying library assets that come closest to providing these features. Such methods are critically dependent on what it means to come closest, which in turn depends on some definition of distance between the query and candidate assets [1].
  • 3. Categories of Topological methods • Exclusive approximate retrieval: Methods that fall into this category make a distinction between two retrieval goals: exact retrieval and approximate retrieval, whereby we seek to identify library assets that completely satisfy all the requirements of the query. • Inclusive approximate retrieval: Methods that fall into this category make no distinction between exact retrieval and approximate retrieval. Rather, they focus on identifying library assets that minimize some measure of distance to the query.
  • 4. Measures of distance can be divided into two broad classes • Measures of functional (semantic) distance, which reflect the extent of similarity between the functional properties of the query and those of candidate components. • Measures of structural (syntactic) distance, which reflect the extent of similarity between the structure of (solutions to) the query and the structure of candidate components.
  • 6. The Google PageRank Algorithm is used in Topological methods to retrieve a software assets from software repository.
  • 7. What is PageRank? • In short PageRank is a “vote”, by all the other pages on the Web, about how important a page is [3]. • A link to a page counts as a vote of support • PR(A) = (1-d) + d(PR(T1)/C(T1) +…+PR(Tn)/C(Tn))
  • 8. Breaking Down the Equation • PR(Tn) - Each page has a notion of its own self-importance. That’s “PR(T1)” for the first page in the web all the way up to “PR(Tn)” for the last page • C(Tn) - Each page spreads its vote out evenly amongst all of it’s outgoing links. The count, or number, of outgoing links for page 1 is “C(T1)”, “C(Tn)” for page n, and so on for all pages. • PR(Tn)/C(Tn) - so if our page (page A) has a backlink from page “n” the share of the vote page A will get is “PR(Tn)/C(Tn)” • d(… - All these fractions of votes are added together but, to stop the other pages having too much influence, this total vote is “damped down” by multiplying it by 0.85 (the factor “d”) • (1 - d) - The (1 – d) bit at the beginning is a bit of probability math magic so the “sum of all web pages’ PageRank's will be one”: it adds in the bit lost by the d(…. It also means that if a page has no links to it (no backlinks) even then it will still get a small PR of 0.15 (i.e. 1 – 0.85).
  • 9. How is it Calculated? • The PR of each page depends on the PR of the pages pointing to it. • But we won’t know what PR those pages have until the pages pointing to them have their PR calculated and so on. • So what we do is make a guess.
  • 10. Simple Example • Each page has one outgoing link (backlink). So that means [2] : • C(T1) = 1 for A and • C(T2) = 1 for B
  • 11. We don’t know what their PR should be to begin with, so we will just guess 1 as a safe random number. • d (damping factor) = 0.85 • PR(A)= (1 – d) + d(PR(T1)/C(T1))= (1 – d) + d(1/1) i.e. • PR(A)= 0.15 + 0.85 * 1 =1 • PR(B)= 0.15 + 0.85 * 1 =1
  • 12. Let’s Do It Again with Another Number. Let’s try 0 and re- calculate… • PR(A)= 0.15 + 0.85 * 0 = 0.15 = 0.15 + 0.85 * • PR(B) 0.15 = 0.2775 • Now we have calculated a “next best guess” so we just plug it in the equation again… • PR(A)= 0.15 + 0.85 * 0.2775 = 0.385875 • PR(B)= 0.15 + 0.85 * 0.385875 = 0.47799375 And again… • PR(A)= 0.15 + 0.85 * 0.47799375 = 0.5562946875 • PR(B)= 0.15 + 0.85 * 0.5562946875 = 0.622850484375
  • 13. Principle • It doesn’t matter where you start your guess, once the PageRank calculations have settled down, the “normalized probability distribution” (the average PageRank for all pages) will be 1.0 • In software repository we are using software assets instead of pages and also using relationships among software assets based on their keywords instead of links.
  • 15. References: [1] A survey of software reuse libraries A. Mili a,_, R. Mili b and R.T. Mittermeir Annals of Software Engineering 5 (1998) 349–414 349 [2] http://wwwdb.stanford.edu/~backrub/google.html http://www-db.stanford.edu/~backrub/google.html [3] Semantic Component Retrieval in Software Engineering Inaugural dissertation zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften der, Universitat Mannheim, Mannheim, 2008