Más contenido relacionado
Similar a Done reread the effect of new links on google pagerank (20)
Done reread the effect of new links on google pagerank
- 1. INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE The Effect
of New Links on Google PageRank Konstantin Avrachenkov —
Nelly Litvak N° 5256 July 2004
apport de recherche
Thème COM
Unité de recherche INRIA Sophia Antipolis 2004, route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex (France)
Téléphone : +33 4 92 38 77 77 — Télécopie : +33 4 92 38 77 65
¢¤¦¨©§ " %'¤0124562'@BD2E ¤IQ'TUVXYBbdXUTQhpUqsuw¤hyy ƒy„U s ‰‘„”–™5fhqkl„”Xop”–‘‘t'vpw z{}~'' bqq
b}‘Uvl qq lXw‘'ww‘‡UQpvv©f}éè¤vvv“”2 }vX ¤s„'T‘vv Q}p„v‘6tv‘™vdléé ‘l é'è‘ o{èl'{t–p'op{ ‘è‘–l0‘tww£hvv £
{v‘q¥¦' }vX'spvvX}‘£'v ¥èh{'{‘l'lX vdd lXpé'é'£vqhtl«{è‘£¦'évvwhh w}é ‘v¨k‘ll ' vé ”lhé«"l'é„o{wl‘® p‘‘t}{«~
–}¯¯¦'°évvps¦®klé ‘2{‘b'UX²¥vV‘' 6o{„T{„ ” è‘q p³£hvv 2Q}p„v‘I¦” qt{oé{{Eé}®oh{'é E
é}p2'v³ophl{v¯è{£Q}vX}éU®blè‘ {‘–pkh0q{}ltvé} qkt"‘{Tht q2l ”éè2'vé ‘«{èpé{éT£k‘T¨èw‘'‡ ‘q‘{èé°
„‘oé{ {{ ·©„ué}p°}é ¦è{{ ‘„èph p{™ ¦l'{66 }Q}p„v‘¸vl‘„© q¸‘v„º‘‘ll‘„l0plp¯¥ l‘T¼l T£l‘„l–oq k{£}³vq{è6}p
è‘h ‘¤klwT{'pvb¦0ov o é q{éT™°¦'©é}p „‘oé{ {v è‘q2èélt qEè{–©„½op02‘‘è~u}é upul‘¤}lé'2 }é u l{' 'Tvp–è ‘qU'évè
'El‘ ¦'·évvX"vé ¤l‘„è¦'¸'v02‘é«{èX' ¸hI{2 Tpv £hvv vspvvX}‘Idb}‘¦®é0bU éTlpz®‘l 0vw¯ ‘hèé¸hlwTl„vps¸}{vT
w } é ¯”swXÍX²YV²Y{ÍÓè”Õ wÙ'”ksÕ~”²sÍ”dTTw”~ßw}ÍX”}²y¯wvÍXQsÓY”~YßwX”spYow”Õk{YÕ² ²¯'kÕ~oÍ ”
d~ÍkwyÙ– d¤zUX””~VÍ”p²Íwo¯QQ°XÓÓ ²²¥w”Í”pTY~”Ù™pYowXÕl{YÕ²™Upp Q6”””~VUw™p²²b²Íwo ”” ²wo²²p¥
sdüý„²””ߦs'YÕ}²ÕÕl§²²w”Dd²„Íÿ”~° h”~Õ²Õkk°p ‘¤²X ²¥²²X TÍwXk² Ó§§wÕ© " XYwÙ”~”²§pY²””ßX Õ”yßX sX
okyÍ ü£wIhV~oͲhp~hzs””tt {ÍhÕv}w~” üü wIU! d}v‘ ¥"IX%„&w²²bp· ”YÙ”~”²pdX sÓÍXkyßw””k}Ó§§wÕ©‘ h
'ww({Yh XüV~oͲ ”
The Effect of New Links on Google PageRank Konstantin Avrachenkov* , Nelly Lítvakï
Thème COM Í Systèmes communicants Projet MAESTRO
Rapport de recherche n° 5256 Í July 2004 p 12 pages
Abstract: PageRank is one of the principle criteria according to which Google ranks Web pages. PageRank can be
interpreted as a frequency of visiting a Web page by a random surfer and thus it reflects the popularity of a Web page. We
study the effect of newly created links on Google PageRank. We discuss to what extend a page can control its PageRank.
Using the asymptotic analysis WG provide simple conditions that show if new links bring benefits to a Web page and its
neighbors in terms of PageRank or they do not. Furthermore, We show that there exists an optimal linking strategy. We
conclude that a Web page benefits from links inside its Web community and on the other hand irrelevant links penalize the
Web pages and their Web communities.
Key-words: Google, PageRank, Rank One Update, Optimal Linking Strategy, Markov chains
The authors gratefully acknowledge the financial support of the Netherlands Organization for Scientific Research (NWO)
under the grant VGP 61-520 and the French Organization EGIDE under Van Gogh grant no.05433UD.
* INRIA Sophia Antipolis, 2004, Route des Lucioles, B.P. 93, 06902, France, e-mail: K.Avrachenkov@sophia.inria.fr l
University of Twente, Dep. Appl. Math., Faculty of EEMCS, P.O. Box 217, 7500 AE Enschede, The Netherlands, e-mail:
N.Litvak@n1ath.utwente.nl
° ~ƒ©"£¤ ¥§¨ %o¤ '@ƒ2 –Eµ¤ 622o Q}p„v‘®X~Q" ‘” q„s‘{è o é}q'lèl„lXsl' v „{h‘'tQ£hpv wvw qp‘‘ „sévv„
¦'dQQvv„b}‘” „q$ol{ hl'{‘&o&®op0”® 'Xpé'é'® qXhtkèlX ( ‘‘£é}p¦'°év¥é ‘l ètl}l„‘})„T{v lpp'l'éo{£ qpé® –
p‘‘t}{«&£ 0 ‘é£ }v®©„¯2pé$o{é q vé"" 'U' q„ év‘p„}‘è 'ézl‘z Q}vX}é qb£hvv v péz qtl'qlpéU~é{p3 h‘„q
- 2. pèhw‘‘ }v"U'q 0q qèé'¥kp”Q}p„v‘Isz6‘l ètlvh5 } } hlvlh”‘l}{ h‘p}‘péQQvél‘t{kpéQèXwopé qèl vé l
”éèXh‘¯0vhl{'hbl¯ „‘p‘vX}q¤è 'éb}‘Uvll„pb ‘„”T}hw}vX$0‘‘évv™¦'¸obvq }vXbpvtk ‘X'zé „¯‘p飔phl{vé£h0
Qoqt~{–‘é0klwT&'pè–vq{è6} 0 ( '{v‘èt{k„”„h™ q 'éh‘¯élv‘l™}q¤élvv„{‰”pvh‰ qX‰è ' }·k„è· qq '‘'v02‘ }q&vqobh0
'º{'T}éwé ‘„‰ è„é é}‘élp‘l)„"6'évètl'hèX‰ql èt{Tl„‘w q·©„º'è„‘wov0–‘é}‘'„„ X 9 ° £hvpèphQ}p„v‘Ihb}‘¤®‘£I
‘T{v‘h{{}''pè®vq{è6vè™ ( o{v‘ {k„0'h" q 'é„ ¥éAü‘X q2¸}{vT
yX Y9 é'{h
y ºopè锦vX²{v£}¥p‘„„2‰‘0{T¢vX²{v¦ {Tltk
}‘I&Ùs0l‘kk „„pè TT‰2hhU'{è ‘6èlº‘lpé}éè «~n£}é ~‘0é"l {vé qv évv èl°élpé}‘ «~|z|}}lé'v}s„}6Ub p{'{‘{olX
6v"k{T{èpé}{2élpé}‘ «~ {éT{‘l‘kk „‰ }é}p¦w Ó¦vw q'®lºv„'' é©èluopé~w}h®0q qè '}l vé£vwl‘”¦'u~{léol‘{vd£hpv 2‘I
‘T{„ è{®Q}vX}é¤T®èXvkbvé'' „b0vh{¯''vw qèé–{°‘‘éèto ¸””vèt}éèè qq v{6T{èp¸£hpv kl zékXXk 0‘
”UT"'™«{'wT{èpé®l¸op”éql”lé°pvvX}‘I0h'p'w}Q‘{vUp{}t3 }shsqh XqUv}UXhU2GzYk„®v l2}6'hl' k v£k‘{v„2év „
¥h6dv‘vh è bvé E·„v'© G(é”vblXo'h{è U'„¸‘q®®v{"v{ ElEp''' 'wTl™{‘2Q}p„v‘¤'v0‘qwT{èp¯‰‘ blXkX}wwº ‘è{„ol
vv 'v¸
{'h}w q„ ¦v–– lhkl„¦ pèh2vvh '' ©® 'vhlw}{vp‘„l°"6w}v¤¸él'2Uv h2}}h ' } {l”{”vék"'l‘££ pè Tèé2h‘„kl
vé2Q¨‘'º q”‘'ƒ ‘qU'‘'‘‰l‘£¦'ºé}pp{v ‘tw {‘'¦'6véT{V ¨‘' q·{‘°évvX™™ lp {‘E{}06©„ op”–‘‘è~³U'‘'‘22 {vÿl‘Eè ‘
'lXT{èp„|~"l‘„l™pql 6}¯ èéh ‘6klwTl„vr ‰‘¥é}U'd sv{pv‘è„„ £v¯¯v èT2¨Ó–h„ol v2‘„"klé qq h‘X~{èp™ll é}s'h{'é
é}p „}'vhl{v}è{pQ}p„v‘Id‰‘'2èè {‘„él‘èé®h„ol v3¥hévhlèè ®{‘él„è 0èévl£}évèql vdh„ol
vE™èl6{‘‘„èE}dh‘„l6}rÓ·pl{tkp¤Í¦hq q‘‘{{ ‘I ‘T{è‘‘ v{2‘tt }é 6 ‘'{èpl‘ 'q‘lXll vEvpé'¨pvvX}‘¤Tül„lév ‘ qèl
v·vp‘„ ‘q'—ÓÓ hX²{èpº6 k ‘°vlh0qlvlt vé} qkt¯""vqw} éT{‘w}hov q«{èpédlé}Ql‘T%èUé' o{„T{„ ‘‘ sU'‘'
ot}hvp‘v„‚Ó q„²{èp·–"‰ q'0v ~{{}l£lé}l‘„l™'htk{}ºv‘l 0v¯ è‘h ‘6~{{}l„vvzp é} èpq¥™‘{Th ‘ 'vé'è k vé" S h„ol v·‘
sDGé²²²
¢¤¦¨©§ "$&(0234$629@B§DF©G0
2p Epo– q‘k0 „{2vu{‘RÓhl„léo {„h‘„p{è¦él¤l„v{wu'év ‘„l¸é %é}p„{T{ k} hèé¸l‘„è0h‘'{v bT¥„v„„pl‘„l™}
{®~hé „} è6h‘é ‘lX ‘"v"l‘pé{}é ‘"vdl„è„T}h" }vX‰””vèt}éè®v°{‘™©„¯ ‰hé„w kl ‘³l‘„ è ³élp „–p{ q„2t0©'lé' v"vé
év¤Íl{èht}{vlI ‰‘ºv{èpèévt q„ v‰£hvpè–‘{„l'hlX ³ V9X°h`"l ©'}b 4w ®{º t~k évv„™p'op{ ‘è‘El¤l‘„èQvv„b}‘ é
w·{oéX²{ p‘‘t}{«~¤}w0é}pv"‰‘2Q}p„v‘°tb q'é‘„ · ºl‘‘ v èT ‘6‰”ve®'é}l hg l‘El}w}¥hé2U'2vé}p„„pul‘¤©„½}é
% ‘oé‘°l‘hpq¨hhU'{è ‘¦6Tl{èg5v pè T„ h‘‘Upl·l T°évvt2 vvx€upqlpv ‘uè ‘q„ ‰‘„‚„„ ˆí9 è%t6v‘¸v {‘³pqlpv ‘
% è‘q¤}é ˆ„„ p”%}{‘'{ lvxÙ2 é}p³ qh„E‘}E ”vv pqlpv ‘%è ‘q„l‘ élpé}‘ «~°tt l‘{„v º}0p‘6} ¯é}p„vs{‘‘ ¦'¯ é}0'
vD •–Gíz—Ó·p{ q„‰l66}p™l‘ hhU'{è ‘ºvw}‘‘ 'v‘‘X²{„ VU«£ £v{l‘0„ ºl T£°{vé qv l‘kk „bphXX èl³lv0élpé}‘ «~
{°}¸}{‘èlw}{6©„¸é}p™«{·l‘é‘«« plD qt~{l ‘q{èp¯¥‰hé„qlé2Q}vX}é° b qoéé„ ¸v ºk{T{èpé}{· q kl{ ‘ql
v©}‰º¸vlpT¤wévè©‘pl0~wTl0lév'2tb{‘6k'£v¥} Q©„© }vX' vé El‘™{{vékèl vº6T{lèEt xdfdh§‚íFFíG&ln ~V é'{q
tb06}l{«º‘pll } d'h{l „vll Xp }V{6v‘pru ‰{‘h‘– „„}z¦'¸évv„„‘} `xtr24bt2l‘¤‘lpé}éè «~©}
‘vb~‘0‘ ‘³l©©{vé qv
évv`º è0 –w‘pl'%h%£hpv 6l Uvr XX²°‰‘E£hpv –6T{lè § 2~{hw vklt}dv „l q q vsvé ¦è{lX qé'è‘ vskºlé'{6oq k{ é‘
h‘v {Tƒp„²{vbukéw¤l T ydz{z
The Eßfect of New Links on Google PageRank 3
1 Introduction
Surfers on the Internet frequently use search engines to find pages satisfying their query. However, there are typically
hundreds or thousands of relevant pages available on the Web. Thus, listing them in a proper order is a crucial and non-
trivial task. The original idea of Google presented in 1998 by Brin et al [3] is to list pages according to their PageRank
which reflects popularity of a page. The PageRank is defined in the following Way. Denote by n the total number of pages
on the Web and define the n unrecognized text n hyperlink matrix P as follows. Suppose that page 2' has k > O outgoing
links. Then pû : 1/ls if j is one of the outgoing links and PU unrecognized text O otherwise. If a page does not have outgoing
links, the probability is spread among all pages of the Web, namely, pi, : 1 In order to make the hyperlink graph connected,
it is assumed that a random surfer goes with some probability to an arbitrary Web page with the uniform distribution. Thus,
the PageRank is defined as a stationary distribution of a Markov chain Whose state space is the set of all Web pages, and
the transition matrix is unrecognized text (1) Where E is a matrix Whose all entries are equal to one, n is the number of Web
pages, and c 6 (0,1) is the probability of~not jumping to a random page (it is chosen by Google to be 0.85). The Google
matrix P is stochastic, aperiodic, and irreducible, so there exists a unique row vector vr such that N 7rP:7r, frlrl, (2)
- 3. Where L is a column vector of ones. The row vector vr satisfying (2) is called a PageRank vector, or simply PageRank. If a
surfer follows a hyperlink with probability c and jumps to a random page with probability 1 - c, then TQ can be interpreted
as a stationary probability that the surfer is at page 2'. In order to keep up with constant modifications of the Web structure,
Google updates its PageRank at least once per month. According to publicly available information Google still uses simple
power iterations to compute the PageRank. Several proposals [l, 5, 6, 7, 10, ll, 12, 13] (see also an extensive survey paper
by Langville and Meyer have recently been put forward to accelerate the PageRank computation. This research direction
can be regarded as a system point of view. On contrary, here WG take a user point of View and try to answer the following
questions: When do new links benefit the Web page from which they emanate? When do the pages from the same Web
community benefit from the link creation? Is there optimal linking strategy? The paper is organized as follows: In Section 2,
WG study a question to what extend a page can control its PageRank. Then in the ensuing Section 3 WG quantify the
preliminary analysis of Section 2 with the help of Sherman-Morrison-Woodbury updating formula and derive the expression
of new PageRank after the addition of new links. In Section 4 using asymptotic analysis we obtain natural conditions that
show if a newly created link is beneficial or not. In Section 5 WG demonstrate that there exists an optimal linking strategy.
Finally, WG provide conclusions in Section 6.
f xX2"hz é'{ gi
q'‘vlXVlé é qt'T{vVVéé²{èp¯Ùp d„pk®lbl'wlé}dléT} ‘! • ¯l‘"ov q«{èpé} 'q X²{}l v¦}p v v'³lé}} l‘6 ‘èlt}wk{}l6t
{6do • 0{‘6‘{vév‘è è~·l·lXvw©l‘ k{}l° 'i v{™vélv{ql vº«pléèé«{ vd~wT{¤ bw®k ‘0lé~{lp‘°¸}{vT6‘{vU'l~vq"å 'v
éTB„k{v‘ lE{‘™™ v Tè‘0 qXov0Upl«{èp¤{„l‘è„ tw Aü¨ wy ¢¤ 6DF©Grvv ¤2X&–92hf 4‚G"„i }}
§‚
Fd &xX2"hz Y9 ¯t ÙÙpè T¥¥ lp tXlé} §‚
u EE § ”p º "2 "22p2½²s ƒD2¤ ‰é‰Qvv„b}‘ q'é‘X – Y9Q'èX}{è ‘'U'é ‘wv” vl”èé'v0 ‘‘ }é –vq{vpè‘£ è‘qzvU£évvv
‰hé„T{‘„pk „kz"”™è pw™é}p{wé}‘p¥è{w{v‘hè‘™tplé 0q qè lévq{vv ‘£è ‘q„‚"' T "°lé} l‘T {éT2l‘EQvv„b}‘³„}uU
°lèk{' p·‘{h ‘é²2v‰l‘{'0{'{0‘'{6v‘ p‘6{'{¦ ‘'U'é ‘”p vq{vp葸 ‘q'gÙ– è‰} T {³„kl 6T{°l©éT”ohl„é ³é}p
„}ºovh{lpéè{Q}p„v‘Ips2{‘t¥„é Vq"b {k"{„„} é{‘££ v èT ‘–ék'' ‘Ioq‘{„{k v°° v"l‘ Qvv„b}‘tÕqré¯2í `
§g
! 3‚"$&¤í X do)V2 q„‘}{}g l0w' '0„pvs{‘{ 6Tl{è1 2 3‚"$ 3 2b}0„èpq¥™é”p 2• ©5 3‚"$ 3 9 l xX @ hz é D
é'{¦©©t¥l‘qDÍlºov ‘0Ev¯lé£t q„p{«~66}l{«7U2TT q qo ‘£” qtl'l'lY{è0™}élv{‘ ‘ }{vT–w } 9A
¢Dzd0922F®èlEl‘™k{T{®lév'GV0—2 FHph‘'{{{vél«{èpéwUo~"'„ {‘‘ ~wT{„¦92h³vl£ov qé²{„ ¤h0l‘™6T{lèR"0q}é
Elé™k{}l 2t‰}élv{‘ ‘éQ¯'P {‘0 h‘2U'vphtkè{"{6~wTl yX2"h¦Uo2 v{0 }élv{q{èp¯pév{6} èp B R U3X`bd
V@$ze
&” D—
! 3‚"$3i
§‚
F yX é‘ll‘„„¯"0é”vh j ¢ y06 ¦k F dvé ©ék ‘ºlé0kl{vé·¸vlpT·‘lp „k~pV¥ „}ºlèl££ p}hh 3¢ m ˆ0 3¢ D3H n $0 ” 7$r ¤
7$9b 4o F ¤ 7 F p¯ü
¢¤¦¨F9" r4© ¢&(Gí
2 To what extend a, page can control its PageRank?
The PageRank defined in (2) clearly depends on both incoming and outgoing links of a page. Thus, the easiest Way for a
page to change its ranking is to modify the outgoing links. Below We shall show that the PageRank can be Written as a
- 4. product of three terms Where only one term depends on outgoing links. It will allow us to estimate to what extend a page
can control its PageRank. To this end, WG first recall the following useful expression for the PageRank [2, 9, 13]:
Where ek is the lc-th column of the identity matrix I . Now, define a discrete-time absorbing Markov chain unrecognized
text t : O, 1, . . with the state space unrecognized text 1 . _ _ , unrecognized text Where transitions between the states
1, . . . ,n are conducted by the matrix CP, and the state O is absorbing. Let unrecognized text be the number of visits to state
j : 1, . . . , n before absorption. Formally,
Where unrecognized text denotes the indicator function. It is easy to see that the value zij is the conditional expectation of
Nj given that the initial state is 2'. Let qm' be the probability to reach the state j before absorption if the initial state is 2'.
Using the strong Markov property, WG can now establish the following decomposition result.
Proposition 1 The PageRank of page 2' : 1, . . .,n is given by
Proof: It follows from (3) that
INRIA
l4 èl
| vé é'{n` f 0 t®l‘0 {k£{T‡}"6Tl{è 0 Ó èXíQl‘°vqlévwbél0‘I ‘T{è‘ pl–‘t}–l·v''' 'wT{2{‘6pvvXX 'v0‘qwT{èp¯
}‘‘
"" {„kl{ ol ‘ºv‘wk„èp„blºlé0„vl–v w}é”p‘£‘U éTlpq¥£}{£v‘ l0U'll v{ 20v{®'v0‘{'‘„élèp®} } hlt'Q‰‘£‘oh‰l‘„v{'
élTht q„"‘I ‘T{èé00 v{2‘t}®®v‰{‘Q}vXX
}é6' '0'h{„ sDGé²²²
¢¤¦¨©§ "$&(0234$629@B§DF©G0
¥pél„h‘'h{èphh v}hn l yX2F° ¤£ˆ ‘"™é”v F ¥z& 7$0‚5 F & ¢d‚5 F q‘éklèlq{è‘0{‘‘ tvk‰XhéT{èp¤ yX""™è00„ ‘ }l'
Epq{vè‚YX² ‰‘‘ qXop”Uplèl v¤¤ v{2‘thÍ9¥{'‘{„l'h{lépvvXX
}‘0vsévv wv–‘{h ‘é²}sl‘{' –‘èl ‘è 'w0‘„lºv‘
ulé·l'{ ‘'U'é ‘°v {‘¸v‘lvpèé³ è‘q0v®évv|{f„éoph w }‘pè‘”{‘v‘lvpèé6è ‘q'U0é}p'v·'vhl{vd«wbQ}p„v‘°‘·{E02‘èl ‘
yp²lp) @ G F6„ 6 92h6ˆ¨9 ¤ ÙV‘'{ @ r¨ Q ™E‘lpé}éè «~¤lº{olél vw¤l|‰k{vk{èé æ {v {§b}{–{éTå {‘6‘‘U'2 p‘é h¦p¨4 ¤
y}é‘l”q 6Tl„è`é ¤¤ pq XXbtt év{ ©v wT{‘'é} hll ‘è™{°vw‘ 'p®U„„}éll ¤{‘t'pkk ”é}p Uv pw‰l0évvX"lé}b}{£ ‘hè‘ p‘
6l whéw·0 pètoE0vvXb}é ¤è{‘„èph p{"tkp }l„ Ep {v {‘« lX~vs{‘é ©„¯ Ùévv`6 qhX°‘v¤é”p¸vq{vv ‘ ‘q'‰{‘' 2
t¤v'{ 'èhkp l {‘³ T"'°Uv‘ ‡v l éo–l‘„l”t£vè0h~£‘ºw }é'lº{olél³¯
E'èwo‘è"vé} qktz{–klé ‘” ¤ ‘o{vè léb qé‘„éo£}¯évvXwèl‘pq¥pqlpv ‘ è‘q yèX”vX'h ‘}év è‘”‘q
qXvEl‘‘ Q}p„v‘0}s–©„º'v02é‘«~pz‰‘‘ }q{‘vwv Õí¯'v”®l {‘°{}0”'vé'è k v©lé}2 ‘}év è‘·'vél„„·opékt q„{v‘è–èhl™
u{v‘hèéé0‰‘”q v{2‘t‚Y9 é' é"l°hévp{«b °l‘t p{' ¦6‘}{0lé}'p'uE{‘{'o
³vv l v‘è 'vh® ”élTp'0'h„Vlè o6£hpv 20„pkélX{‘6pvvX}‘·p©° vh}{«{‘0 –l„} h XíÍ Ó¤l飄él‘èé”l„ol vé""£l‘Tƒ‘T
2¦'ºé}p£l‘p‘ ¤él®è{{'v{'b{„lv‘woXw{– é'lXvl è{bpvvX}‘I –E§E ¤¤ s@2B562'ÿƒ2 –E doQépov kt q'Q¦'–é}pèl vt hh
„l èéhp} ‘„è'lXT{„ ™hh „l èéh„d¨èlévq p{Q}Iv„‘'w} è~vT"‰pll‘0"lé}wléé}p"èl0‘„ ‘qzévQ é q'¤} 2{‘évv„p{T"v{ é é
w·‘„B è‘qvl™Uv hl„ ºé”p™èé ‘ '„‰‰ {v –{v pw‰‘'¯‘l‘2p ‘ qèl vº}Q‘'¼ è‘q „}º {'h}w q„ ¤v‰p‘™{v‘6‘I ‘T{{ }pl‘‘ hh
„l ‘°6T{lè x fk
The Eßfect of New Links on Google PageRank 5
Substituting the last equation in (6) WG immediately obtain El The decomposition formula (5) represents the PageRank of
page 2' as a product of three multipliers Where only the term zn depends on the outgoing links of page 2'. Hence, by
changing the outgoing links, a page can control its PageRank up to a multiple factor ZM : 1/(1 - qü) 6 [l, (1 - unrecognized
text wher@ qu G (0, C2] is a probability to return back to 2' starting from 2'. Note that the upper bound (1 - unrecognized
text (approximately 3.6 for c : .85) is hard or rather not possible to achieve because in this case a page 2 points to pages that
are linking only to 2. Such a policy makes 2 and its neighbors isolated from the rest of the Web. If page 2 does not have
outgoing links, then ZM is very close to the lower bound 1, since there is almost no chance to return from 2 back to 2 before
absorption. Bianchini et al. [2] used a circuit analysis to study in detail the influence of pages Without outgoing links
(leaves, dangling nodes) on the PageRank of a Web community. The authors of [2] came to the same conclusion that
dangling causes a considerable loss in ranking. The formula (5) helps to quantify this loss. We note that even a threefold
increase of the PageRank might not be considered as a significant improvement, since Google measures the PageRank on a
- 5. logarithmic scale [15]. In the ensuing sections WG show how a Web page should use its scarce resources to increase its
PageRank.
3 Rank One update of Google PageRank
Let us consider a Web page with lc@ old hyperlinks and kl newly created hyperlinks. Without loss of generality, we assume
that the page with new links has index 1 and the pages towards which new links are pointed have indices from 2 to kl
unrecognized text 1. Then, the addition of new links can be regarded as one rank update of the hyperlink matrix
and Where ls : lc@ unrecognized text kl, pf is the first row of matrix P. In [10] the authors use updating formulae to
accelerate the PageRank computation. By restricting ourselves to the case of rank one update, WG are able to perform a
more comprehensive analysis. The next theorem provides updating formulae for the PageRank elements.
0r @ hz kvV ¦n 'Tvèé}l™‘'h g"$ 3 ®e v yX @ hz p qg$&¤ d)
X r d¤ ˆ
§
k„X r d¤ ˆ
§
'
§‚ 3‚"$ 3 vé ºov kXpé'hl v d
P ‚j ) qèé' – UV¥”é”p3 „ V‘„l¢ tb{‘º"Ó~£lT‡v6Tl{è & V} é'é' h ) o § " vw‰hé„q"t vo g$ ¤
%}é ºélèé tXoq¥Vpo ` v$
e
e"
'¤22"hz X ¯t b‘‘ hè‘0{‘2h‘„l6v¤Ó¸v{ltkp¤Ù©hq q‘‘{”‘U éTl ‘0h pl–‘ | G¯{ q ¦ 3 ‘¥ „}ºlèl g ¦ 3 2 g$ ¤ ˆ
§g 3‚"$ 3 ‰é'¯‘él„2‘èl ‘ h ‘”lé}UTv£„héT{èp¤h )
& 2• 6
3‚"$ 3
|j
§‚ q‚"$ 3
)
§g q‚"$ 3
q‚"$ 3
- 6. VD
¢ h "z ¨" •(023¤ 4}4t( ©f¥ ¤2X§1F ©9—h ¤"et " µn"„ 3hB} 6DF©Gr¢G ©3©©9‚í„ nt e"G(@4z(D 94t( " ¢¤@X
$Vb
3‚"$ 3 g$ ¤
V‚ 6 § 4 q‘éklèlq{è‘·{‘°}UTv–oq‘{„{k v³ vq n$"$ 3 9l` 92@ hz¯ hl ~XX£}é ~pVo¯¥ pq{vè‚ D‰}é g D²
i
‰‘6{„l‘è{™ %‰‘'pl„ 6}{” uè ‘6èl¦¦ v{2é ‚YX²nÙ‰é}p·6‘U éTlX™«wwv‘lvpèé ‘qq{‘'¦è³l‘0 qXov0Upl«{èp¸¸v{2é
°‚ v¦ v‘ ºl‘0l„opé q –‘èl ‘è '™ èQU”}UX²{„ V q p¯ü
¢¤¦¨F9" r4© ¢&(Gí
Theorem 1 Let kl new links emanating from page 1 be added. Then, the elements of new PageRank 'uector are given by the
following updating foimulae
Proof: Applying the Sherman-Morrison-Woodbury updating formula [4] to [I - unrecognized text We
Substituting the above expression for unrecognized text - unrecognized text : 1, ...,11, into (10) and (ll), We obtain (8) and E
The results in Theorem 1 are in line with formula If page 1 updates its outgoing links then in the decomposition formula for
Trl only the second multiplier will be affected.
INRIA
V• B 9 k2X Ó q'X V‘èp{‘t ‘„hévèè~°‘p ‘l‘„·l‘ ‘E‘I ‘}lh°évvnn „p ‘"l6‘ vé'‘{v }‘ èèl „ v"{„pw‘ ‘·
é}pB¸ lp@vh©é}p”lé}2év2·éT{©{R htºé}ptvE‰hé„s ¦. v{2‘t YX² {‘°{‘ { %–‘èl ‘è '” è"è o{„vl°} ul‘ºkXopé
%2é«{è‘ '2 ¥„«{‘'”è o{„vl°p2{'6} {‘ž lv0™ ‘'U'é q ‘0vº‘'l‘„‰l‘„l™t–hhU'{è ‘°é}l¤ lp 0épw0lq6 v{k ‘–l‘{v‘p }vBp b}
{lé}èVl'p'w}‘ ‘qw}{v ‘ ‘„ 0h– }vB‰{‘'6è¥0 vhév‘U'0l T¥lv0évv„ {„'' v£‘' è ‘q} E hpl£
º{v‘hè‘0T‰{‘lv0®{è0phéw·kèlé}l vºq'oé{‰‘„º”‘„ é'v0èé– èé° 'lXT{„ El 2‘è{v‘'pék 6èl¸k„v„{v vl‘„" ‘q'q‘twº pèh‰l°
kè{l„è„T}h£ }vX'Q‰‘pkh0qlvlt™}évèql " ¤{‘‘ ‘'hbl„ol v·} èT"él”” ‘ll‘„„
ot}{«« 6{‘t {l‘v §© Es™p' 2¤26s' dob
}‘éè¤l‘–vlh0q{}{ vé} qkt"l66 pl–‘t}n D}é tX"‘„`™ ®k6'è„hl ¤o pl™l p‘}é 2léb¸}{vTwévè” é qé'„
”hh l‘hhU'{è ‘6T{lè Q {lX qéo ‘ vQ‰‘vlh0qlvlt sDGé²²²
2 b b'é'v l‘–é}p° éo{„pkXè{®{v‘hè‘6é'' è®{oo „{{l°évvXl T£vl–wé}wvol'{ '„ ¤hº éèp¦Tvè‘0vp2 E‰‘Xk6–é~U0lé6
é}p„£l T{oo 'l·é}p 6vT „p~~ U' v‘·{·l‘ {}0¦'E'v02é‘«~p‚b'{{h”©„Eop”–‘‘è~–"0„v°k'}¯¦'6 }vXQl T"2k‘ll '
„}ºlXvwET lp1v‘™{6}‘vl‘„è·0l„ }l v„è°l6} Vhé2U'}zkl' ' do™ébéT¢ov kt q'b{ v{2‘t tXo£p {k„I"–l'{éT£lé”
qèI'{'é'2Uo~"''{ l‘”vt „ } {‘™‘' {v‘hèé–vsévv60t‰‘lp pk{èpé} l b}l‘w} vqhh „l èé6l' '{'é'„¥& lp1évv„ èl½‘èp w}‘h
‘‘ é”pE¸v{„}l„è0ép²”v vl‘„2évv„„³é‘k{‘'{0v{vplé¤Q}p„v‘©v }v§6 é'lXvl„"«
fr kX9 é wº „h‘ ”vè„h"{
{{olél”l2é}p" b~w}ll ‘™„ {v l‘twévvpvtwv v„ h b
§ b "bpo¥l‘£‘I ‘T{è‘–– v{2é vtXo2}{{éTv„ovw q ‘‘ {h Dz{‘£{v‘hè‘2}dévv£ éo{„pkX é' e
3
Ó0{‘‘„¨kèlé}l v¯plé‘{vév‘è è~ b
q‘éklèlq{è‘0{‘toq‘{„{k v¤è b
b
- 7. ¢¤¦¨©§ "$&(0234$629@B§DF©G0
D
VD
§b
The Eßfect of New Links on Google PageRank 7
In the new situation, the probability §11 tO return to page 1 starting from this page, is given
unrecognized text q11' 1 Z qfl > 1:2 Hence, the page 1 increases its ranking when it refers to pages that are characterized
by a high value of qm. These must be the pages that refer to page 1 or at least belong to the same Web community. Here by
a Web community WG mean a set of Web pages that a surfer can reach from one to another in a relatively small number of
steps. Let us now consider formula First, WG see that the difference between the old and the new ranking of page j is
proportional to vh. Naturally, hyperlink references from pages with high ranking have a greater impact on other pages.
Furthermore, the PageRank of
(13) 1JÕ 'V E zij > Z E F2 Indeed, if this inequality holds then the link update by page 1 leads to higher probabilities of
reaching page j from any page that has a path to j via page 1. Thus, in formula (5), the third multiplier will increase and the
second multiplier will either increase or remain the same depending on Whether there is a hyperlink path from j back to j
passing through page 1. Note that if several links are added by page 1 then it might happen that some pages receive new
links and loose in ranking at the same time. Such situation occurs when a new incoming link is created simultaneously with
several other links, which point to “irrelevantÓ pages. The asymptotic analysis in the next section allows us to further clarify
this issue.
4 Asymptotic analysis
Let us apply the asymptotic analysis to formulae (8) and (9) when c is sufliciently close to one and the Markov chain
induced by the hyperlink matrix P is irreducible. The asymptotic
tG p} ¢™k °'è„p{è¤o pl®{0v‘p‘v‰„h‘ T} 'hl v‘‘ vv} B°l °o 'hl ¤'èhk£l0„'{é e
hB9 l é'{ x
v‘‘{ppwE} èT¥é"{0 ‘'{èp£lè0‘ ™éT{‘w}Vopé qèl vé"{éTl‘TƒèQ–©„·é}p®èlº‘' 'lXT{„ ¤è ‘q}é ¤è{‘„èph
p{"U'‘'‘‘{v {‘„l™‘'¼è ‘q„ ¢ h ° " `£ "„t p" íq G¨q4 $@"$rD ¤íh ¨" (rV` 9¤2X" ¥¤2@ @ gH ne ¤G¨t ¥ © 4©©" 4( ( ¥0
G © ¤ fri •(0 nG §bx ‚ F¥0&"@A ¢¤" B¤©G ¤—"G( Gz(D 4 4(t 4r© —q b
G ©¢t iF 4t 4h }‚•(rV D ¤2@ h (}dF í" ¤¦6DF©GrRh ¤2Xb fF¥ t q"X¥¤2@ @ d ! q # %
z ¤qdF Gl G ¦¨" •(rV$X ¤2@ $ n(}dF í" ¤¦6DF©GrRh ¤2X § q B
&
2 lí D
£
tí&xe
5 F0H xe
- 8. ©t‰{‘'{vq qt£‘{}~„ol v·}é A t‰l‘– q'htTl vº6T{lè¤vs{‘–¸vlpT0w } é q o„ 0h 0Qh é'¥'vélt q'£ ‘«{bk{T{®¸vlpTwé}
¯}{‘bv Tp‰l'{ „Qépww‘p¤Í„'{ wv ‘èé"vQ'vhv„lp'é'v dos ¯‘T¦‘{TvQlézéw~sk{}l„”„hd}hléwlé'v{'·Ùd
d'‘p‘v™{bl‘T³{éTz¥vé qèl v6 pé}w}h{'„l T e
— ¤ dF Gl GgB¨"f•(02 X r @ v(}dF í" ¤¦6DF©GrRh ¤2X' r" (f d§ &¤i) R Bt 41X m¤G2X l fF¥ rD6 $¤2Xbq §6x” ¯t p
{k„s¥06}p”·wév‘v”}‰T}{ v‘è G F§3„ls«{4q º ³{‘00 v{2é EE v qtGv¥¥ v èT2 qtG‚2 3‚"$3§ 3
e5
6b
tG b
$3§xh 50 nq t 0&&¤í ‰é'¯Q"°él6l‘¤lXkpèp'h2dv‘{'h2k„l „oqé} k vuž p&l‘·¸}{vT©wé} %v„‘'wT{vvyl' p è 'é鉑'pl„
‘Õq G qtG‚2 q‚"$ ¤ xFe6„f n3 t „U 3 –he5„f
8
@„ 5 F0Hd he50
@d
@b
g @ I„ Hp¯ü
¢¤¦¨F9" r4© ¢&(Gí
approach allows us to derive simple natural conditions that show if a Web page with newly created links and its neighbors
benefit from these new links.
Theorem 2 Let c be sujjîciently close to one and let page 1 has kl new links to pages unrecognized text ..., kl unrecognized
text Assume that the Markov chain induced by the hyperlink matmï P ls lnreduclble. Then, we have the followlng
condltlons:
the PageRank of page 1,'
whepe mij is the mean fipst passage time fpom page 2' to page j ifc : 1.
Proof: First, WG make a change of variable c : 1/(1 + p), with ,0 > O in the formula for Z (C) as follows:
Then, we use the resolvent Laurent series expansion for the Markov chain generator (see e.g., [14, Theorem 8.2.3D
Where H : yr is the ergodic projection and H is the deviation matrix of the Markov chain induced by P. Since WG consider
a finite state Markov chain, the above series has a non-zero radius of convergence. Let us now prove the first statement
ofthe theorem. It is enough to show that Condition 1
guarantees that unrecognized text +1
unrecognized text for all c sufñciently close to one, or equivalently, for all ,0 sufñciently close to zero.