SlideShare una empresa de Scribd logo
1 de 12
PageRank & Searching
                             Rahul Bindra, Aditya Nigudkar
                           B.E, Computer Science, 2nd Year
                   Madhav Institute of Technology and Science, Gwalior
                                 rahulbindra@gmail.com
                                   adimits@gmail.com


                                          ABSTRACT

Information on the web is increasing exponentially and so is the number of people seeking it. In order
to waste minimum time in searching for queries, we use search engines almost for every bit of
information on web. There have been many search engines in the past like AltaVista, Inktomi,
Overture etc. but they failed to rank their results. In late 90’s Sergey Brin and Larry Page led
GOOGLE revolutionized the search engine industry when it came up with its PageRank system to
rank the search results which helped GOOGLE to change from a struggling 2-member search
amateur to a multi-billion dollar search enterprise. The paper aims at giving an insight on the pros
and cons of a PageRank system and how it changed the meaning of searching for billions of people
across the globe.


                                          1. Introduction

PageRank was developed at Stanford University by Larry Page (hence the name Page-Rank) and
Sergey Brin as part of a research project about a new kind of search engine. The project started in
1995 and led to a functional prototype, named Google, in 1998. Shortly after, Page and Brin founded
Google Inc., the company behind the Google search engine. While just one of many factors which
determine the ranking of Google search results, PageRank continues to provide the basis for all of
Google's web search tools.




PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an
indicator of an individual page's value. PageRank actually works on an intuitive system, which works
as a model of a web surfer’s behaviour. It works on the probability that a web page will be randomly
accessed by a web surfer. It also takes into consideration the pages that link to the page. Using Yahoo
as an example, the justification of this is that if a page like Yahoo were to link directly to another page,
it is very likely that the page is of high quality (Brin, S., Page, L., 2000).

                                             2. PageRank

                                       2.1 What is PageRank ?

PageRank is a link analysis algorithm which assigns a numerical weighting to each element of a
hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its
relative importance within the set. The algorithm may be applied to any collection of entities with
reciprocal quotations and references. The numerical weight that it assigns to any given element E is
also called the PageRank of E and denoted by PR(E).
Google figures that when one page links to another page, it is effectively casting a vote for the other
page. The more votes that are cast for a page, the more important the page must be. Also, the
importance of the page that is casting the vote determines how important the vote itself is. Google
calculates a page's importance from the votes cast for it. How important each vote is taken into
account when a page's PageRank is calculated. PageRank is Google's method of measuring a page's
"importance." When all other factors such as Title tag and keywords are taken into account, Google
uses PageRank to adjust results so that sites that are deemed more "important" will move up in the
results page of a user's search accordingly. A basic overview of how Google ranks pages in their
search engine results pages (SERPS) follows:
1) Find all pages matching the keywords of the search.
2) Rank accordingly using "on the page factors" such as keywords.
3) Calculate in the inbound anchor text.
4) Adjust the results by PageRank scores.

                                2.2 How is PageRank Calculated ?

Quoting from the original Google paper, PageRank is defined like this:
We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a
damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details
about d in the next section. Also C(A) is defined as the number of links going out of page A. The
PageRank of a page A is given as follows:

                       PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages'
PageRanks will be one.
PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the
principal eigenvector of the normalized link matrix of the web.
but that’s not too helpful so let’s break it down into sections.

PR(Tn) - Each page has a notion of its own self-importance. That’s “PR(T1)” for the first page in the
web all the way up to “PR(Tn)” for the last page.
C(Tn) - Each page spreads its vote out evenly amongst all of it’s outgoing links. The count, or
number, of outgoing links for page 1 is “C(T1)”, “C(Tn)” for page n, and so on for all pages.

PR(Tn)/C(Tn) - so if our page (page A) has a backlink from page “n” the share of the vote page A
will get is “PR(Tn)/C(Tn)”.

d(... - All these fractions of votes are added together but, to stop the other pages having too much
influence, this total vote is “damped down” by multiplying it by 0.85 (the factor “d”).

(1 - d) - The (1 – d) bit at the beginning is a bit of probability math magic so the “sum of all web
pages' PageRanks will be one”: it adds in the bit lost by the d(.... It also means that if a page has no
links to it (no backlinks) even then it will still get a small PR of 0.15 (i.e. 1 – 0.85). (Aside: the Google
paper says “the sum of all pages” but they mean the “the normalised sum” – otherwise known as “the
average” to you and me.
We can think of it in a simpler way:-
a page's PageRank = 0.15 + 0.85 * (a "share" of the PageRank of every page that links to it)
"share" = the linking page's PageRank divided by the number of outbound links on the page.

                                        2.3 Need Of Iterations

The equation above clearly shows how a page's PageRank is arrived at. But what isn't immediately
obvious is that it can't work if the calculation is done just once. Suppose we have 2 pages, A and B,
which link to each other, and neither have any other links of any kind. This is what happens:-

Step 1: Calculate page A's PageRank from the value of its inbound links

Page A now has a new PageRank value. The calculation used the value of the inbound link from page
B. But page B has an inbound link (from page A) and its new PageRank value hasn't been worked out
yet, so page A's new PageRank value is based on inaccurate data and can't be accurate.

Step 2: Calculate page B's PageRank from the value of its inbound links

Page B now has a new PageRank value, but it can't be accurate because the calculation used the new
PageRank value of the inbound link from page A, which is inaccurate.
It's a Catch 22 situation. We can't work out A's PageRank until we know B's PageRank, and we can't
work out B's PageRank until we know A's PageRank.

The problem is overcome by repeating the calculations many times. Each time produces slightly more
accurate values. In fact, total accuracy can never be achieved because the calculations are always
based on inaccurate values. 40 to 50 iterations are sufficient to reach a point where any further
iterations wouldn't produce enough of a change to the values to matter. This is precisely what Google
does at each update, and it's the reason why the updates take so long.

                                              2.4 Examples
1. Let's consider a 3 page site (pages A, B and C) with no links coming in from the outside. We will
allocate each page an initial PageRank of 1, although it makes no difference whether we start each
page with 1, 0 or 99. Apart from a few millionths of a PageRank point, after many iterations the end
result is always the same. Starting with 1 requires fewer iterations for the PageRanks to converge to a
suitable result than when starting with 0 or any other number.


                                    A                      B
                                                          PR 1
                                   PR 1



                                               C
                                              PR 1


The site's maximum PageRank is the amount of PageRank in the site. In this case, we have 3 pages so
the site's maximum is 3. At the moment, none of the pages link to any other pages and none link to
them. If you make the calculation once for each page, you'll find that each of them ends up with a
PageRank of 0.15. No matter how many iterations you run, each page's PageRank remains at 0.15. The
total PageRank in the site = 0.45, whereas it could be 3. The site is seriously wasting most of its
potential PageRank.


2. Now, adding some links to above example. Start again with PR1 all round. After 1iteration:-
Page A = 1.425
Page B = 1
Page C = 0.575

By comparison to the 1 iteration figures in the previous example, page A has lost some PageRank,
page B has gained some and page C stayed the same. Page C now shares its "vote" between A and B.
Previously A received all of it. That's why page A has lost out and why page B has gained. and after
100 iterations:-

Page A = 1.298245
Page B = 0.9999999
Page C = 0.7017543
A                      B
                                                         PR 1
                                   PR 1



                                              C
                                             PR 1



3. Another example involving multiple links is as follows:




The calculated PR values are depicted in the following diagram:




As you’d expect, the home page has the most PR – after all, it has the most incoming links! But what’s
happened to the average? It’s only 0.378. Take a look at the “external site” pages – what’s happening
to their PageRank? They’re not passing it on, they’re not voting for anyone, they’re wasting their PR.
A possible remedy to this problem is suggested in the following figure:




Thus it can be clearly inferred from the above example that the architectural interlinking of a site with
other sites plays a very important role in determining its “Average PR” which helps it to improve its
position in GOOGLE’s search results.

                                      2.5 PageRank Feedback

Another important aspect of PageRank is "PageRank Feedback." Pages linking to each other can
create a feedback effect that can increase the PageRank of those pages.
Let's say that page A currently links to nowhere. If we add a link from page A to page B, then page A
is saying that page B is important. This means the measure of page B's votes is also increased. Page B
is now saying that the pages it links to are more important than they otherwise would be. So the
measure of those page's votes will be increased.... and so on (with the pages they link to through the
link structure). The effect is diluted as it moves down through the links. If we could point our web
browser at page B and through clicking on
the links, get to page A, then so could the Google algorithm (at least one of the pages linking to page
A has become more important). If a page linking to page A is more important, then so is its vote, and
subsequently page A becomes more important! So by linking to page B, page A has made itself more
important, thus creating PageRank Feedback.
An interesting thing about PR Feedback is that you can use it to your advantage via the internal
navigational structure of your site. It is very important to keep as much PageRank within your site as
possible. Therefore, only link out to other sites from low PageRank pages of your site.

                                3. Optimization Of PageRank
There are three fundamental areas to look at when trying to optimize the PageRank for your site:

1. The links you choose to have link to you, i.e., which ones you choose, and how much effort you put
in to getting them.

2. Who you choose to link out to from your site, and from which page of your site you place their link.
(Maximizing PageRank Feedback and minimizing PageRank leakage).

3. The internal navigational structure and linkage of your pages, in order to best distribute PageRank
within your site.

                                      3.1 Links To Your Site

When looking for links to your site, from a purely PageRank point of view, one might think you
should simply look for pages that have the highest Toolbar PageRank. However, this way of thinking
is incorrect. As more and more people try to get links from only high PageRank sites, it becomes less
and less of a winning proposition.
The actual PageRank from an individual page is shared out amongst the links on that page. So, a link
from a page with a PageRank of 4 might be better than a link from a page with a PageRank of 6 if
there are less total links on the PR 4 page.

                                    3.2 Links Out Of Your Site

When considering links out of your site there is one golden rule:

               Generally, you will want to keep PageRank within your own site.

This does not mean that you will lose PageRank from your site by linking out, but that the total
PageRank within it may be lower than it could have been had you not linked out.
The best outbound link PageRank scenario occurs when the outbound link comes from a page that has
both:
a) A low PageRank
b) A lot of links to pages on your site.

How can this best be achieved? One way would be by writing reviews of the sites we link out to on a
separate page of our site, and by providing a link to those reviews along with each hyperlink to the
external site. We will have to make sure that the review page also links back to a page in our own site
that is high up its structure. (It’s best if this is our home page, but any important page will do.) By
doing this, we’ve significantly reduced the amount of PageRank we’ve let out of our site. We’ve
targeted the distribution of PageRank to the home page to ensure that less is passed back through our
links page (which would be a wasted opportunity), and more is put elsewhere in our site. This concept
can be illustrated diagrammatically as follows:
If we perform the same calculations but include the review pages, here’s what we get:




Without the Review Pages:
The HomePage PR is: 0.9536152797
B,C,D PR is: 0.4201909959
Total is: 2.2141882674
With the Review Pages:
The HomePage PR is: 2.439718935
B,C,D PR is: 0.8412536982
Total is: 4.9634800296

Thus it could be clearly observed from the above results that adding review pager in an architectural
linking helps to improve its PR by keeping the PR within the architecture.

                              3.3 Internal Structures And Linkages

To get a high PageRank, it is not enough to have tens of thousands of pages. Those pages must also be
in Google’s index. To achieve this they must contain enough content for Google’s algorithms to
consider them worthy of being added to the index. As you develop content for your site, you are also
creating more PageRank for your site. Its hard work, and you’re creating it slowly – but if you’re also
creating pages that people will want to link to, then you’re doing yourself two favors at once:
PageRank from both directions. Or, to put it in very basic terms: The best “internal” thing that can be
done to build PageRank is to write lots of good content. Ensure that pages aren’t overly short or overly
long and break the content into several pages where necessary. There are three different ways in which
pages can be interlinked within a site:

                                            Hierarchical




                                  Total PageRank in this site is 4




                                              Looping
Total PageRank in this site is 4

                                        Extensive Interlinking




                                     Total PageRank in this site is 4

The maximum benefit from PageRank is derived when this methodology is applied towards pages that
want to rank high for highly competitive keyword phrases, or towards pages that must compete on a
large number of keyword phrases. The Extensive Interlinking strategy retains the most PageRank
within the site. Following this is the Hierarchical strategy and lastly the Looping strategy.

                                 4. Significance Of PageRank
Of course, important pages mean nothing to you if they don't match your query. So, Google combines
PageRank with sophisticated text-matching techniques to find pages that are both important and
relevant to your search. Google goes far beyond the number of times a term appears on a page and
examines all aspects of the page's content (and the content of the pages linking to it) to determine if it's
a good match for your query. Thus, in this way, PageRank helps GOOGLE to bring the best ranked
search results corresponding to your query.
A version of PageRank has recently been proposed as a replacement for the traditional ISI impact
factor. Instead of merely counting citations of a journal, the "quality" of a citation is determined in a
PageRank fashion.
A Web crawler may use PageRank as one of a number of importance metrics it uses to determine
which URL to visit next during a crawl of the web. One of the early working papers which was used in
the creation of Google is “Efficient crawling through URL ordering”, which discusses the use of a
number of different importance metrics to determine how deeply, and how much of a site Google will
crawl. PageRank is presented as one of a number of these importance metrics, though there are others
listed such as the number of inbound and outbound links for a URL, and the distance from the root
directory on a site to the URL.

                                           5. Conclusion
To sum up, PageRank is a technique that has revolutionized the concept of searching for every web
surfer across the globe putting the whole worldly knowledge just a click away for anyone. It has
assisted in unification of cultures and sharing of ideas by providing an easier, faster and more relevant
way to answer to the query of a user. This technique, along with GOOGLE’s huge database of
information, has emerged to be a one-stop point for every user around the world with any type of
query.
6. References

1. David A. Vise and Mark Malseed, “The Google Story”, Delacorte Press.

2. Nancy Blachman, “The Google Guide”

3. http://www.google.com/support/librariancenter/Article 12-2005-1

4. http://www.google.com/technology/pigeonrank.html

5. http://infolab.stanford.edu/~backrub/google.html

6. Sergey Brin and Lawrence Page, ”The Anatomy of a search engine”.

7. http://www.centaurhosting.com /Understanding Google's Page Rank System.htm

8. Chris Ridings and Mike Shishigin, “PageRank Uncovered”.

9. http://www.WebWorkshop.com/ Pagerank Explained_ Google's PageRank and how to make the
most of it.htm

10. Google Support Center, “Google Technology”.

11. Wikipedia, “How PageRank Works”.

12. Ian Rogers, “The Google Pagerank Algorithm and How It Works”.

Más contenido relacionado

La actualidad más candente

Seo & Internet Marketing
Seo & Internet MarketingSeo & Internet Marketing
Seo & Internet Marketingalafrancis
 
Moz 2013 Ranking Factors - Matt Peters MozCon
Moz 2013 Ranking Factors - Matt Peters MozConMoz 2013 Ranking Factors - Matt Peters MozCon
Moz 2013 Ranking Factors - Matt Peters MozConmattthemathman
 
14 Steps to Successful SEO
14 Steps to Successful SEO14 Steps to Successful SEO
14 Steps to Successful SEORyan Spoon
 
Optimize Your Bottom Line
Optimize Your Bottom LineOptimize Your Bottom Line
Optimize Your Bottom LineInflow
 
10 Steps to Great SEO: InfusionCon 2011
10 Steps to Great SEO: InfusionCon 201110 Steps to Great SEO: InfusionCon 2011
10 Steps to Great SEO: InfusionCon 2011Rand Fishkin
 
SEO presentation for marketing summit 2017
SEO presentation for marketing summit 2017SEO presentation for marketing summit 2017
SEO presentation for marketing summit 2017Scott True
 
My Panda / Penguin Algorithm Recovery Story - Cheers, Tears and Jeers
My Panda / Penguin Algorithm Recovery Story - Cheers, Tears and JeersMy Panda / Penguin Algorithm Recovery Story - Cheers, Tears and Jeers
My Panda / Penguin Algorithm Recovery Story - Cheers, Tears and Jeers Michael Jones
 
How to Master SEO in 2017
How to Master SEO in 2017How to Master SEO in 2017
How to Master SEO in 2017Digital Vidya
 
Analytics For Local Search
Analytics For Local SearchAnalytics For Local Search
Analytics For Local SearchInflow
 
SEO 2020: On-SERP SEO & Beyond
SEO 2020: On-SERP SEO & BeyondSEO 2020: On-SERP SEO & Beyond
SEO 2020: On-SERP SEO & BeyondDiane Kulseth
 
From Blackhat to Whitehat - Benj Arriola
From Blackhat to Whitehat - Benj ArriolaFrom Blackhat to Whitehat - Benj Arriola
From Blackhat to Whitehat - Benj ArriolaSean Si
 
SearchLeeds 2018 - Jon Myers - DeepCrawl - The Mobile First Index, what, why ...
SearchLeeds 2018 - Jon Myers - DeepCrawl - The Mobile First Index, what, why ...SearchLeeds 2018 - Jon Myers - DeepCrawl - The Mobile First Index, what, why ...
SearchLeeds 2018 - Jon Myers - DeepCrawl - The Mobile First Index, what, why ...Branded3
 
ID2013 - Optimizing Your Website’s Architecture For SEO
ID2013 - Optimizing Your Website’s Architecture For SEOID2013 - Optimizing Your Website’s Architecture For SEO
ID2013 - Optimizing Your Website’s Architecture For SEOJohn Doherty
 
Optimize Your Bottom line: What You Need To Know About SEO
Optimize Your Bottom line: What You Need To Know About SEOOptimize Your Bottom line: What You Need To Know About SEO
Optimize Your Bottom line: What You Need To Know About SEOInflow
 
SEO Challenges in 2013 by Navneet Kaushal
SEO Challenges in 2013 by Navneet KaushalSEO Challenges in 2013 by Navneet Kaushal
SEO Challenges in 2013 by Navneet KaushalNavneet Kaushal
 
Internet Intelligence
Internet IntelligenceInternet Intelligence
Internet Intelligencerobcrayford
 
Schema and Open Graph 101 - SMX Munich
Schema and Open Graph 101 - SMX MunichSchema and Open Graph 101 - SMX Munich
Schema and Open Graph 101 - SMX MunichMatthew Brown
 
Crawling, indexation & the impact on performance | Brighton SEO
Crawling, indexation & the impact on performance | Brighton SEOCrawling, indexation & the impact on performance | Brighton SEO
Crawling, indexation & the impact on performance | Brighton SEOMartin Sean Fennon
 

La actualidad más candente (20)

Seo & Internet Marketing
Seo & Internet MarketingSeo & Internet Marketing
Seo & Internet Marketing
 
Moz 2013 Ranking Factors - Matt Peters MozCon
Moz 2013 Ranking Factors - Matt Peters MozConMoz 2013 Ranking Factors - Matt Peters MozCon
Moz 2013 Ranking Factors - Matt Peters MozCon
 
14 Steps to Successful SEO
14 Steps to Successful SEO14 Steps to Successful SEO
14 Steps to Successful SEO
 
Optimize Your Bottom Line
Optimize Your Bottom LineOptimize Your Bottom Line
Optimize Your Bottom Line
 
10 Steps to Great SEO: InfusionCon 2011
10 Steps to Great SEO: InfusionCon 201110 Steps to Great SEO: InfusionCon 2011
10 Steps to Great SEO: InfusionCon 2011
 
SEO presentation for marketing summit 2017
SEO presentation for marketing summit 2017SEO presentation for marketing summit 2017
SEO presentation for marketing summit 2017
 
My Panda / Penguin Algorithm Recovery Story - Cheers, Tears and Jeers
My Panda / Penguin Algorithm Recovery Story - Cheers, Tears and JeersMy Panda / Penguin Algorithm Recovery Story - Cheers, Tears and Jeers
My Panda / Penguin Algorithm Recovery Story - Cheers, Tears and Jeers
 
On site optimization
On site optimizationOn site optimization
On site optimization
 
How to Master SEO in 2017
How to Master SEO in 2017How to Master SEO in 2017
How to Master SEO in 2017
 
Analytics For Local Search
Analytics For Local SearchAnalytics For Local Search
Analytics For Local Search
 
SEO 2020: On-SERP SEO & Beyond
SEO 2020: On-SERP SEO & BeyondSEO 2020: On-SERP SEO & Beyond
SEO 2020: On-SERP SEO & Beyond
 
From Blackhat to Whitehat - Benj Arriola
From Blackhat to Whitehat - Benj ArriolaFrom Blackhat to Whitehat - Benj Arriola
From Blackhat to Whitehat - Benj Arriola
 
SearchLeeds 2018 - Jon Myers - DeepCrawl - The Mobile First Index, what, why ...
SearchLeeds 2018 - Jon Myers - DeepCrawl - The Mobile First Index, what, why ...SearchLeeds 2018 - Jon Myers - DeepCrawl - The Mobile First Index, what, why ...
SearchLeeds 2018 - Jon Myers - DeepCrawl - The Mobile First Index, what, why ...
 
ID2013 - Optimizing Your Website’s Architecture For SEO
ID2013 - Optimizing Your Website’s Architecture For SEOID2013 - Optimizing Your Website’s Architecture For SEO
ID2013 - Optimizing Your Website’s Architecture For SEO
 
Optimize Your Bottom line: What You Need To Know About SEO
Optimize Your Bottom line: What You Need To Know About SEOOptimize Your Bottom line: What You Need To Know About SEO
Optimize Your Bottom line: What You Need To Know About SEO
 
SEO Challenges in 2013 by Navneet Kaushal
SEO Challenges in 2013 by Navneet KaushalSEO Challenges in 2013 by Navneet Kaushal
SEO Challenges in 2013 by Navneet Kaushal
 
Next Generation SEO
Next Generation SEONext Generation SEO
Next Generation SEO
 
Internet Intelligence
Internet IntelligenceInternet Intelligence
Internet Intelligence
 
Schema and Open Graph 101 - SMX Munich
Schema and Open Graph 101 - SMX MunichSchema and Open Graph 101 - SMX Munich
Schema and Open Graph 101 - SMX Munich
 
Crawling, indexation & the impact on performance | Brighton SEO
Crawling, indexation & the impact on performance | Brighton SEOCrawling, indexation & the impact on performance | Brighton SEO
Crawling, indexation & the impact on performance | Brighton SEO
 

Destacado

Random notes on big data
Random notes on big dataRandom notes on big data
Random notes on big dataJay Wu
 
Page rank
Page rankPage rank
Page rankCarlos
 
Web intelligence and big data
Web intelligence and big dataWeb intelligence and big data
Web intelligence and big dataKeyur Shah
 
CLOUD COMPUTING UNIT-5 NOTES
CLOUD COMPUTING UNIT-5 NOTESCLOUD COMPUTING UNIT-5 NOTES
CLOUD COMPUTING UNIT-5 NOTESTushar Dhoot
 
Web Intelligence - Tutorial1
Web Intelligence - Tutorial1Web Intelligence - Tutorial1
Web Intelligence - Tutorial1Obily W
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 
Big Data & Artificial Intelligence
Big Data & Artificial IntelligenceBig Data & Artificial Intelligence
Big Data & Artificial IntelligenceZavain Dar
 
Cloud computing (IT-703) UNIT 1 & 2
Cloud computing (IT-703) UNIT 1 & 2Cloud computing (IT-703) UNIT 1 & 2
Cloud computing (IT-703) UNIT 1 & 2Jitendra s Rathore
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

Destacado (12)

Page rank2
Page rank2Page rank2
Page rank2
 
Random notes on big data
Random notes on big dataRandom notes on big data
Random notes on big data
 
Page rank
Page rankPage rank
Page rank
 
Web intelligence and big data
Web intelligence and big dataWeb intelligence and big data
Web intelligence and big data
 
Relationship Between Big Data & AI
Relationship Between Big Data & AIRelationship Between Big Data & AI
Relationship Between Big Data & AI
 
CLOUD COMPUTING UNIT-5 NOTES
CLOUD COMPUTING UNIT-5 NOTESCLOUD COMPUTING UNIT-5 NOTES
CLOUD COMPUTING UNIT-5 NOTES
 
Web Intelligence - Tutorial1
Web Intelligence - Tutorial1Web Intelligence - Tutorial1
Web Intelligence - Tutorial1
 
Big data unit i
Big data unit iBig data unit i
Big data unit i
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Big Data & Artificial Intelligence
Big Data & Artificial IntelligenceBig Data & Artificial Intelligence
Big Data & Artificial Intelligence
 
Cloud computing (IT-703) UNIT 1 & 2
Cloud computing (IT-703) UNIT 1 & 2Cloud computing (IT-703) UNIT 1 & 2
Cloud computing (IT-703) UNIT 1 & 2
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Similar a PageRank & Searching (20)

PageRank Algorithm
PageRank AlgorithmPageRank Algorithm
PageRank Algorithm
 
Pr
PrPr
Pr
 
Page rank algortihm
Page rank algortihmPage rank algortihm
Page rank algortihm
 
I04015559
I04015559I04015559
I04015559
 
Page Rank Link Farm Detection
Page Rank Link Farm DetectionPage Rank Link Farm Detection
Page Rank Link Farm Detection
 
Pagerank
PagerankPagerank
Pagerank
 
Search engine page rank demystification
Search engine page rank demystificationSearch engine page rank demystification
Search engine page rank demystification
 
J046045558
J046045558J046045558
J046045558
 
Dm page rank
Dm page rankDm page rank
Dm page rank
 
PageRank
PageRankPageRank
PageRank
 
Google Page Ranking
Google Page RankingGoogle Page Ranking
Google Page Ranking
 
Google page rank
Google page rankGoogle page rank
Google page rank
 
page ranking algorithm
page ranking algorithmpage ranking algorithm
page ranking algorithm
 
Search engine optimization
Search engine optimizationSearch engine optimization
Search engine optimization
 
PageRank Algorithm In data mining
PageRank Algorithm In data miningPageRank Algorithm In data mining
PageRank Algorithm In data mining
 
Pagerank
PagerankPagerank
Pagerank
 
Seo and page rank algorithm
Seo and page rank algorithmSeo and page rank algorithm
Seo and page rank algorithm
 
LINEAR ALGEBRA BEHIND GOOGLE SEARCH
LINEAR ALGEBRA BEHIND GOOGLE SEARCHLINEAR ALGEBRA BEHIND GOOGLE SEARCH
LINEAR ALGEBRA BEHIND GOOGLE SEARCH
 
Ranking Web Pages
Ranking Web PagesRanking Web Pages
Ranking Web Pages
 
Page Rank
Page RankPage Rank
Page Rank
 

Último

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 

Último (20)

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 

PageRank & Searching

  • 1. PageRank & Searching Rahul Bindra, Aditya Nigudkar B.E, Computer Science, 2nd Year Madhav Institute of Technology and Science, Gwalior rahulbindra@gmail.com adimits@gmail.com ABSTRACT Information on the web is increasing exponentially and so is the number of people seeking it. In order to waste minimum time in searching for queries, we use search engines almost for every bit of information on web. There have been many search engines in the past like AltaVista, Inktomi, Overture etc. but they failed to rank their results. In late 90’s Sergey Brin and Larry Page led GOOGLE revolutionized the search engine industry when it came up with its PageRank system to rank the search results which helped GOOGLE to change from a struggling 2-member search amateur to a multi-billion dollar search enterprise. The paper aims at giving an insight on the pros and cons of a PageRank system and how it changed the meaning of searching for billions of people across the globe. 1. Introduction PageRank was developed at Stanford University by Larry Page (hence the name Page-Rank) and Sergey Brin as part of a research project about a new kind of search engine. The project started in 1995 and led to a functional prototype, named Google, in 1998. Shortly after, Page and Brin founded Google Inc., the company behind the Google search engine. While just one of many factors which determine the ranking of Google search results, PageRank continues to provide the basis for all of Google's web search tools. PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. PageRank actually works on an intuitive system, which works
  • 2. as a model of a web surfer’s behaviour. It works on the probability that a web page will be randomly accessed by a web surfer. It also takes into consideration the pages that link to the page. Using Yahoo as an example, the justification of this is that if a page like Yahoo were to link directly to another page, it is very likely that the page is of high quality (Brin, S., Page, L., 2000). 2. PageRank 2.1 What is PageRank ? PageRank is a link analysis algorithm which assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element E is also called the PageRank of E and denoted by PR(E). Google figures that when one page links to another page, it is effectively casting a vote for the other page. The more votes that are cast for a page, the more important the page must be. Also, the importance of the page that is casting the vote determines how important the vote itself is. Google calculates a page's importance from the votes cast for it. How important each vote is taken into account when a page's PageRank is calculated. PageRank is Google's method of measuring a page's "importance." When all other factors such as Title tag and keywords are taken into account, Google uses PageRank to adjust results so that sites that are deemed more "important" will move up in the results page of a user's search accordingly. A basic overview of how Google ranks pages in their search engine results pages (SERPS) follows: 1) Find all pages matching the keywords of the search. 2) Rank accordingly using "on the page factors" such as keywords. 3) Calculate in the inbound anchor text. 4) Adjust the results by PageRank scores. 2.2 How is PageRank Calculated ? Quoting from the original Google paper, PageRank is defined like this: We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows: PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one. PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. but that’s not too helpful so let’s break it down into sections. PR(Tn) - Each page has a notion of its own self-importance. That’s “PR(T1)” for the first page in the web all the way up to “PR(Tn)” for the last page.
  • 3. C(Tn) - Each page spreads its vote out evenly amongst all of it’s outgoing links. The count, or number, of outgoing links for page 1 is “C(T1)”, “C(Tn)” for page n, and so on for all pages. PR(Tn)/C(Tn) - so if our page (page A) has a backlink from page “n” the share of the vote page A will get is “PR(Tn)/C(Tn)”. d(... - All these fractions of votes are added together but, to stop the other pages having too much influence, this total vote is “damped down” by multiplying it by 0.85 (the factor “d”). (1 - d) - The (1 – d) bit at the beginning is a bit of probability math magic so the “sum of all web pages' PageRanks will be one”: it adds in the bit lost by the d(.... It also means that if a page has no links to it (no backlinks) even then it will still get a small PR of 0.15 (i.e. 1 – 0.85). (Aside: the Google paper says “the sum of all pages” but they mean the “the normalised sum” – otherwise known as “the average” to you and me. We can think of it in a simpler way:- a page's PageRank = 0.15 + 0.85 * (a "share" of the PageRank of every page that links to it) "share" = the linking page's PageRank divided by the number of outbound links on the page. 2.3 Need Of Iterations The equation above clearly shows how a page's PageRank is arrived at. But what isn't immediately obvious is that it can't work if the calculation is done just once. Suppose we have 2 pages, A and B, which link to each other, and neither have any other links of any kind. This is what happens:- Step 1: Calculate page A's PageRank from the value of its inbound links Page A now has a new PageRank value. The calculation used the value of the inbound link from page B. But page B has an inbound link (from page A) and its new PageRank value hasn't been worked out yet, so page A's new PageRank value is based on inaccurate data and can't be accurate. Step 2: Calculate page B's PageRank from the value of its inbound links Page B now has a new PageRank value, but it can't be accurate because the calculation used the new PageRank value of the inbound link from page A, which is inaccurate. It's a Catch 22 situation. We can't work out A's PageRank until we know B's PageRank, and we can't work out B's PageRank until we know A's PageRank. The problem is overcome by repeating the calculations many times. Each time produces slightly more accurate values. In fact, total accuracy can never be achieved because the calculations are always based on inaccurate values. 40 to 50 iterations are sufficient to reach a point where any further iterations wouldn't produce enough of a change to the values to matter. This is precisely what Google does at each update, and it's the reason why the updates take so long. 2.4 Examples
  • 4. 1. Let's consider a 3 page site (pages A, B and C) with no links coming in from the outside. We will allocate each page an initial PageRank of 1, although it makes no difference whether we start each page with 1, 0 or 99. Apart from a few millionths of a PageRank point, after many iterations the end result is always the same. Starting with 1 requires fewer iterations for the PageRanks to converge to a suitable result than when starting with 0 or any other number. A B PR 1 PR 1 C PR 1 The site's maximum PageRank is the amount of PageRank in the site. In this case, we have 3 pages so the site's maximum is 3. At the moment, none of the pages link to any other pages and none link to them. If you make the calculation once for each page, you'll find that each of them ends up with a PageRank of 0.15. No matter how many iterations you run, each page's PageRank remains at 0.15. The total PageRank in the site = 0.45, whereas it could be 3. The site is seriously wasting most of its potential PageRank. 2. Now, adding some links to above example. Start again with PR1 all round. After 1iteration:- Page A = 1.425 Page B = 1 Page C = 0.575 By comparison to the 1 iteration figures in the previous example, page A has lost some PageRank, page B has gained some and page C stayed the same. Page C now shares its "vote" between A and B. Previously A received all of it. That's why page A has lost out and why page B has gained. and after 100 iterations:- Page A = 1.298245 Page B = 0.9999999 Page C = 0.7017543
  • 5. A B PR 1 PR 1 C PR 1 3. Another example involving multiple links is as follows: The calculated PR values are depicted in the following diagram: As you’d expect, the home page has the most PR – after all, it has the most incoming links! But what’s happened to the average? It’s only 0.378. Take a look at the “external site” pages – what’s happening
  • 6. to their PageRank? They’re not passing it on, they’re not voting for anyone, they’re wasting their PR. A possible remedy to this problem is suggested in the following figure: Thus it can be clearly inferred from the above example that the architectural interlinking of a site with other sites plays a very important role in determining its “Average PR” which helps it to improve its position in GOOGLE’s search results. 2.5 PageRank Feedback Another important aspect of PageRank is "PageRank Feedback." Pages linking to each other can create a feedback effect that can increase the PageRank of those pages. Let's say that page A currently links to nowhere. If we add a link from page A to page B, then page A is saying that page B is important. This means the measure of page B's votes is also increased. Page B is now saying that the pages it links to are more important than they otherwise would be. So the measure of those page's votes will be increased.... and so on (with the pages they link to through the link structure). The effect is diluted as it moves down through the links. If we could point our web browser at page B and through clicking on the links, get to page A, then so could the Google algorithm (at least one of the pages linking to page A has become more important). If a page linking to page A is more important, then so is its vote, and subsequently page A becomes more important! So by linking to page B, page A has made itself more important, thus creating PageRank Feedback. An interesting thing about PR Feedback is that you can use it to your advantage via the internal navigational structure of your site. It is very important to keep as much PageRank within your site as possible. Therefore, only link out to other sites from low PageRank pages of your site. 3. Optimization Of PageRank
  • 7. There are three fundamental areas to look at when trying to optimize the PageRank for your site: 1. The links you choose to have link to you, i.e., which ones you choose, and how much effort you put in to getting them. 2. Who you choose to link out to from your site, and from which page of your site you place their link. (Maximizing PageRank Feedback and minimizing PageRank leakage). 3. The internal navigational structure and linkage of your pages, in order to best distribute PageRank within your site. 3.1 Links To Your Site When looking for links to your site, from a purely PageRank point of view, one might think you should simply look for pages that have the highest Toolbar PageRank. However, this way of thinking is incorrect. As more and more people try to get links from only high PageRank sites, it becomes less and less of a winning proposition. The actual PageRank from an individual page is shared out amongst the links on that page. So, a link from a page with a PageRank of 4 might be better than a link from a page with a PageRank of 6 if there are less total links on the PR 4 page. 3.2 Links Out Of Your Site When considering links out of your site there is one golden rule: Generally, you will want to keep PageRank within your own site. This does not mean that you will lose PageRank from your site by linking out, but that the total PageRank within it may be lower than it could have been had you not linked out. The best outbound link PageRank scenario occurs when the outbound link comes from a page that has both: a) A low PageRank b) A lot of links to pages on your site. How can this best be achieved? One way would be by writing reviews of the sites we link out to on a separate page of our site, and by providing a link to those reviews along with each hyperlink to the external site. We will have to make sure that the review page also links back to a page in our own site that is high up its structure. (It’s best if this is our home page, but any important page will do.) By doing this, we’ve significantly reduced the amount of PageRank we’ve let out of our site. We’ve targeted the distribution of PageRank to the home page to ensure that less is passed back through our links page (which would be a wasted opportunity), and more is put elsewhere in our site. This concept can be illustrated diagrammatically as follows:
  • 8. If we perform the same calculations but include the review pages, here’s what we get: Without the Review Pages: The HomePage PR is: 0.9536152797 B,C,D PR is: 0.4201909959 Total is: 2.2141882674
  • 9. With the Review Pages: The HomePage PR is: 2.439718935 B,C,D PR is: 0.8412536982 Total is: 4.9634800296 Thus it could be clearly observed from the above results that adding review pager in an architectural linking helps to improve its PR by keeping the PR within the architecture. 3.3 Internal Structures And Linkages To get a high PageRank, it is not enough to have tens of thousands of pages. Those pages must also be in Google’s index. To achieve this they must contain enough content for Google’s algorithms to consider them worthy of being added to the index. As you develop content for your site, you are also creating more PageRank for your site. Its hard work, and you’re creating it slowly – but if you’re also creating pages that people will want to link to, then you’re doing yourself two favors at once: PageRank from both directions. Or, to put it in very basic terms: The best “internal” thing that can be done to build PageRank is to write lots of good content. Ensure that pages aren’t overly short or overly long and break the content into several pages where necessary. There are three different ways in which pages can be interlinked within a site: Hierarchical Total PageRank in this site is 4 Looping
  • 10. Total PageRank in this site is 4 Extensive Interlinking Total PageRank in this site is 4 The maximum benefit from PageRank is derived when this methodology is applied towards pages that want to rank high for highly competitive keyword phrases, or towards pages that must compete on a large number of keyword phrases. The Extensive Interlinking strategy retains the most PageRank within the site. Following this is the Hierarchical strategy and lastly the Looping strategy. 4. Significance Of PageRank Of course, important pages mean nothing to you if they don't match your query. So, Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search. Google goes far beyond the number of times a term appears on a page and examines all aspects of the page's content (and the content of the pages linking to it) to determine if it's a good match for your query. Thus, in this way, PageRank helps GOOGLE to bring the best ranked search results corresponding to your query. A version of PageRank has recently been proposed as a replacement for the traditional ISI impact factor. Instead of merely counting citations of a journal, the "quality" of a citation is determined in a PageRank fashion. A Web crawler may use PageRank as one of a number of importance metrics it uses to determine which URL to visit next during a crawl of the web. One of the early working papers which was used in the creation of Google is “Efficient crawling through URL ordering”, which discusses the use of a number of different importance metrics to determine how deeply, and how much of a site Google will
  • 11. crawl. PageRank is presented as one of a number of these importance metrics, though there are others listed such as the number of inbound and outbound links for a URL, and the distance from the root directory on a site to the URL. 5. Conclusion To sum up, PageRank is a technique that has revolutionized the concept of searching for every web surfer across the globe putting the whole worldly knowledge just a click away for anyone. It has assisted in unification of cultures and sharing of ideas by providing an easier, faster and more relevant way to answer to the query of a user. This technique, along with GOOGLE’s huge database of information, has emerged to be a one-stop point for every user around the world with any type of query.
  • 12. 6. References 1. David A. Vise and Mark Malseed, “The Google Story”, Delacorte Press. 2. Nancy Blachman, “The Google Guide” 3. http://www.google.com/support/librariancenter/Article 12-2005-1 4. http://www.google.com/technology/pigeonrank.html 5. http://infolab.stanford.edu/~backrub/google.html 6. Sergey Brin and Lawrence Page, ”The Anatomy of a search engine”. 7. http://www.centaurhosting.com /Understanding Google's Page Rank System.htm 8. Chris Ridings and Mike Shishigin, “PageRank Uncovered”. 9. http://www.WebWorkshop.com/ Pagerank Explained_ Google's PageRank and how to make the most of it.htm 10. Google Support Center, “Google Technology”. 11. Wikipedia, “How PageRank Works”. 12. Ian Rogers, “The Google Pagerank Algorithm and How It Works”.