Google PageRank

          By
    Abhijit Mondal

  Software Engineer
   HolidayIQ.com
What is PageRank?
It is the algorithm developed by Google
founders Larry 'Page' and Sergey Brin to
quantify the importance of a 'page' or website
in the complex network of the World Wide Web.
PageRank is only one criterion, not the only
criterion, by which Google decides where your
page or website will rank in the search results.
Since the inception of PageRank, search
ranking algorithms have developed so much
that the exact algorithm (or algorithms) Google
now uses to rank search results is not
publicly known.

If it were known, there would be no need for
SEO experts.
Assume that the World Wide Web is composed of
only 4 pages, which form a directed graph
where each arrow indicates a hyperlink
from one page to another.
Assuming that all the hyperlinks on a page
have an equal probability of being clicked (which
is not true in practice), each edge is given a
weight of 1 divided by the total number of
outgoing links from that page.
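The edge weights described above can be sketched in a few lines of Python. The `out_links` dictionary is an assumption: the graph figure is not available in this text, so the links are reconstructed from the equations that appear later in the deck. The function name `edge_weights` is likewise only illustrative.

```python
# Out-links per page. Assumption: reconstructed from the equations later
# in this deck, because the original graph figure is not available here.
out_links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}


def edge_weights(out_links):
    """Return {(source, target): weight}, where each weight is
    1 / (number of outgoing links on the source page)."""
    weights = {}
    for source, targets in out_links.items():
        for target in targets:
            weights[(source, target)] = 1.0 / len(targets)
    return weights


weights = edge_weights(out_links)
for (source, target), weight in sorted(weights.items()):
    print(f"{source} -> {target}: {weight:.3f}")
```

Under this assumed graph, the weights leaving any single page add up to 1, which is what lets them be read as click probabilities.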
Loosely speaking, the PageRank of a page A is a
direct measure of the probability of visiting
page A when a random user opens up a
browser and follows some hyperlinks to reach
page A.
In the given graph, what is the probability of
reaching page 3 when a random user opens
up the browser to surf the internet?
How can the user reach page 3?

The user is on page 1 and clicks the link to page 3
                    Or
The user is on page 2 and clicks the link to page 3
                    Or
The user is on page 4 and clicks the link to page 3
                    Or
The user directly types the URL of page 3
Denoting the probability of reaching page i as
P(i), we get

P(3) = (1-d) x (P(1)x(1/3) + P(2)x(1/2) + P(4)x(1/2)) + d x (1/4)

This formula follows from the law of total
probability.

Here 'd' is the probability that the user visits
a page directly by typing its URL, so (1-d) is the
probability that the user arrives by clicking a
link on another page. The factor (1/4) reflects the
assumption that a direct visit is equally likely
to land on any of the 4 pages.
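The same reasoning extends to any page. As a sketch in general notation, where N (the total number of pages), In(i) (the set of pages linking to page i) and L(j) (the number of outgoing links on page j) are symbols introduced here rather than taken from the slides:

```latex
P(i) = (1-d) \sum_{j \in \mathrm{In}(i)} \frac{P(j)}{L(j)} + \frac{d}{N}
```

With N = 4, In(3) = {1, 2, 4}, L(1) = 3 and L(2) = L(4) = 2, this reduces to the equation for P(3) above. Note that many presentations of PageRank use d for the probability of following a link (the damping factor) rather than for the direct-visit probability used in this deck.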
Similarly, the equations for P(1), P(2) and
P(4) are

      P(1) = (1-d) x (P(3) + P(4)x(1/2)) + d x (1/4)

         P(2) = (1-d) x (P(1)x(1/3)) + d x (1/4)

   P(4) = (1-d) x (P(2)x(1/2) + P(1)x(1/3)) + d x (1/4)

But we now have a problem: P(3) depends on P(1), yet
P(1) itself depends on P(3). How can either be computed
when neither is known yet?

These are coupled linear equations. They can be solved
using matrices (eigenvalues and eigenvectors) or, more
simply, by repeated iteration until the values converge.
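As an illustration of the matrix route, the four equations can be rearranged into a linear system (I - (1-d) M) p = (d/4) 1 and solved directly. The sketch below is one possible way to do that in plain Python, not the method used by the author: the helper `solve_linear`, the matrix name `a`, the value d = 0.15 and the `out_links` graph (reconstructed from the equations above) are all assumptions made for this example.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


# Assumed graph, reconstructed from the equations for P(1)..P(4).
out_links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}
d = 0.15  # assumed probability of a direct visit
pages = sorted(out_links)
n = len(pages)
index = {page: i for i, page in enumerate(pages)}

# Build A = I - (1 - d) * M, where M[i][j] is the weight of link j -> i.
a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
for source, targets in out_links.items():
    for target in targets:
        a[index[target]][index[source]] -= (1 - d) / len(targets)
b = [d / n] * n

p = solve_linear(a, b)
ranks = {page: p[index[page]] for page in pages}
for page in pages:
    print(f"P({page}) = {ranks[page]:.3f}")
```

A direct solve like this is only practical for tiny graphs; for anything web-sized, the iterative approach is the realistic option.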
But why calculate probabilities when we
want PageRank? Because the probability of
reaching page i is a direct measure of the
PageRank of i. So let PR(i) = P(i), where
PR(i) is the PageRank of page i.


Denoting by PR_k(i) the PageRank computed
using the earlier formula in the kth iteration,
the (k+1)th iteration gives:

PR_{k+1}(3) = (1-d) x (PR_k(1)x(1/3) + PR_k(2)x(1/2) + PR_k(4)x(1/2)) + d x (1/4)

PR_{k+1}(1) = (1-d) x (PR_k(3) + PR_k(4)x(1/2)) + d x (1/4)

PR_{k+1}(2) = (1-d) x (PR_k(1)x(1/3)) + d x (1/4)

PR_{k+1}(4) = (1-d) x (PR_k(2)x(1/2) + PR_k(1)x(1/3)) + d x (1/4)

Letting d = 0.15 and
PR_0(1) = PR_0(2) = PR_0(3) = PR_0(4) = 0.25

Compute PR_k(i) for each k until |PR_{k+1}(i) - PR_k(i)| < ε for
all i = 1, 2, 3, 4, where ε is some very small positive real number
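The iteration above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production implementation: the function name `pagerank`, the tolerance `eps = 1e-8`, the `max_iter` safety cap and the `out_links` graph (reconstructed from the equations) are all choices made for this example, and the code assumes every page has at least one outgoing link.

```python
def pagerank(out_links, d=0.15, eps=1e-8, max_iter=1000):
    """Iterate the update rule until no page's value changes by eps or more.

    d is the probability of a direct visit, as in this deck.
    Assumes every page has at least one outgoing link.
    """
    pages = sorted(out_links)
    n = len(pages)
    pr = {page: 1.0 / n for page in pages}  # PR_0(i) = 1/n for every page
    for _ in range(max_iter):
        # Every page starts with the direct-visit share d x (1/n).
        nxt = {page: d / n for page in pages}
        for source, targets in out_links.items():
            # Each outgoing link carries an equal share of the source's rank.
            share = (1 - d) * pr[source] / len(targets)
            for target in targets:
                nxt[target] += share
        converged = all(abs(nxt[page] - pr[page]) < eps for page in pages)
        pr = nxt
        if converged:
            break
    return pr


# Assumed graph, reconstructed from the equations for P(1)..P(4).
out_links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}
ranks = pagerank(out_links)
for page in sorted(ranks):
    print(f"PR({page}) = {ranks[page]:.3f}")
```

All new values are computed from the previous iteration's values only, which matches the PR_k to PR_{k+1} formulation rather than updating pages in place.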
Computing the PageRank of each page
using the above formulas gives:

                PR(1) = 0.368
                PR(2) = 0.142
                PR(3) = 0.288
                PR(4) = 0.202

Thus page 1 is the page with the highest
PageRank. This is surprising, since page 3 receives
the most backlinks (from 1, 2 and 4). But page 1
receives a backlink from page 3, and page 3
links only to page 1, thus 'informing' us that
page 1 is really important.
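Plugging the rounded values back into the equations for P(1) and P(3), with d = 0.15, gives a rough check of this explanation:

```latex
\begin{align*}
PR(1) &\approx 0.85 \times \left(0.288 + \tfrac{0.202}{2}\right) + \tfrac{0.15}{4}
       = 0.85 \times 0.389 + 0.0375 \approx 0.368 \\
PR(3) &\approx 0.85 \times \left(\tfrac{0.368}{3} + \tfrac{0.142}{2} + \tfrac{0.202}{2}\right) + \tfrac{0.15}{4}
       \approx 0.85 \times 0.2947 + 0.0375 \approx 0.288
\end{align*}
```

Page 1 collects the whole of page 3's PageRank, whereas each of page 3's three backlinks passes on only a half or a third of its source's PageRank.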
What are the conclusions from the above
results?

More backlinks generally mean a better PageRank.

Backlinks from pages that themselves have a high
PageRank improve my PageRank the most.

If there are a good many backlinks from
Wikipedia or some university website like
iitk.ac.in to my site HolidayIQ.com, then my
PageRank should improve.
