Link Analysis

  Rajendra Akerkar
Vestlandsforsking, Norway
Objectives
       To review common approaches to link analysis

       To calculate the popularity of a site based on link
        analysis

       To model human judgments indirectly




2            R. Akerkar
Outline
        1.   Early Approaches to Link Analysis
        2.   Hubs and Authorities: HITS
        3.   PageRank
        4.   Stability
        5.   Probabilistic Link Analysis
        6.   Limitations of Link Analysis




Early Approaches
    Basic Assumptions
       • Hyperlinks contain information about the human
         judgment of a site
       • The more incoming links to a site, the more it is
         judged important
    Bray 1996
       • The visibility of a site is measured by the number
         of other sites pointing to it
       • The luminosity of a site is measured by the number
         of other sites to which it points
       • Limitation: both measures fail to capture the relative
         importance of different parent (child) sites
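Bray's two measures are plain degree counts, which a short sketch can make concrete. The function name, the edge-list input format and the choice to ignore self-links are assumptions made for illustration; they are not part of the slides.

```python
from collections import defaultdict

def visibility_and_luminosity(edges):
    """Count distinct parents (visibility) and children (luminosity) per site.

    `edges` is a list of (source, target) pairs; this format is an assumption.
    """
    parents = defaultdict(set)   # target -> distinct sites pointing to it
    children = defaultdict(set)  # source -> distinct sites it points to
    sites = set()
    for src, dst in edges:
        sites.update((src, dst))
        if src != dst:           # self-links are ignored (an assumption)
            parents[dst].add(src)
            children[src].add(dst)
    visibility = {s: len(parents[s]) for s in sites}
    luminosity = {s: len(children[s]) for s in sites}
    return visibility, luminosity

# Hypothetical four-site example.
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d")]
visibility, luminosity = visibility_and_luminosity(edges)
```

Here "c" and "d" both have visibility 2, although one of the parents of "d" is itself more visible than either parent of "c". The counts cannot tell those cases apart, which is the limitation noted on the slide.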
Early Approaches
    Mark 1988
    • To calculate the score S of a document at vertex v:
            $S(v) = s(v) + \frac{1}{|ch[v]|} \sum_{w \in ch[v]} S(w)$
                 v: a vertex in the hypertext graph G = (V, E)
                 S(v): the global score
                 s(v): the score if the document is isolated
                 ch[v]: children of the document at vertex v
    • Limitations:
       - Requires G to be a directed acyclic graph (DAG)
       - If v has a single link to w, S(v) > S(w)
       - If v has a long path to w and s(v) < s(w),
            then S(v) > S(w)
        → unreasonable
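The recursion can be evaluated bottom-up on a small DAG to see the limitation in action. This is a minimal sketch: the dictionary-based graph encoding, the function name and the example scores are assumptions, and leaves are given S(v) = s(v) because the average over an empty child set is undefined.

```python
from functools import lru_cache

def global_scores(children, local_score):
    """Evaluate S(v) = s(v) + (1/|ch[v]|) * sum of S(w) over the children w of v.

    Only terminates on a DAG; a cycle would recurse forever.
    """
    @lru_cache(maxsize=None)
    def S(v):
        kids = children.get(v, ())
        if not kids:                      # leaf: S(v) = s(v) (an assumption)
            return local_score[v]
        return local_score[v] + sum(S(w) for w in kids) / len(kids)

    return {v: S(v) for v in local_score}

# Hypothetical chain v -> w -> {x, y}, with s(v) < s(w).
children = {"v": ("w",), "w": ("x", "y"), "x": (), "y": ()}
local_score = {"v": 0.1, "w": 0.5, "x": 0.2, "y": 0.4}
scores = global_scores(children, local_score)
```

In this example S(w) is about 0.8 and S(v) about 0.9, so v outranks w even though its own content scores much lower.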
Early Approaches
    Marchiori (1997)
        • Hyper information should complement textual
          information to obtain the overall information
           S(v) = s(v) + h(v)
              - S(v): overall information
              - s(v): textual information
              - h(v): hyper information
        • $h(v) = \sum_{w \in ch[v]} F^{r(v, w)} S(w)$
              - F: a fading constant, F ∈ (0, 1)
              - r(v, w): the rank of w after sorting the children
                of v by S(w)
        → a remedy for the previous approach (Mark 1988)
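A sketch of the fading sum follows. The slide does not say in which direction the children are sorted or where the ranks start, so the sketch assumes descending order of S(w) with ranks starting at 1, and it assumes an acyclic graph so that the recursion terminates. The function and variable names are hypothetical.

```python
def overall_information(children, textual, fading=0.5):
    """Compute S(v) = s(v) + h(v) with h(v) = sum over children of F**r(v, w) * S(w).

    Assumptions: the graph is acyclic, children are ranked by descending S(w),
    and ranks start at 1.
    """
    cache = {}

    def S(v):
        if v not in cache:
            ranked = sorted((S(w) for w in children.get(v, ())), reverse=True)
            hyper = sum(fading ** rank * s_w
                        for rank, s_w in enumerate(ranked, start=1))
            cache[v] = textual[v] + hyper
        return cache[v]

    return {v: S(v) for v in textual}

# Hypothetical page v with two children.
children = {"v": ["w", "x"], "w": [], "x": []}
textual = {"v": 0.1, "w": 0.8, "x": 0.4}
info = overall_information(children, textual, fading=0.5)
```

With F = 0.5 the hyper information of v is 0.5 · 0.8 + 0.25 · 0.4 = 0.5, so S(v) is about 0.6 and stays below S(w) = 0.8. Each additional child contributes with a smaller factor.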
HITS - Kleinberg’s Algorithm
     • HITS – Hypertext Induced Topic Selection

     • For each vertex v ∈ V in a subgraph of interest:
                 a(v) - the authority of v
                 h(v) - the hubness of v

     • A site is very authoritative if it receives many
       citations. Citations from important sites weigh
       more than citations from less important sites

     • Hubness shows the importance of a site. A good
       hub is a site that links to many authoritative sites

Authority and Hubness

    [Figure: nodes 2, 3 and 4 link to node 1; node 1 links to nodes 5, 6 and 7]

        a(1) = h(2) + h(3) + h(4)    h(1) = a(5) + a(6) + a(7)

Authority and Hubness Convergence

       • Recursive dependency:

              $a(v) \leftarrow \sum_{w \in pa[v]} h(w)$

              $h(v) \leftarrow \sum_{w \in ch[v]} a(w)$

        • Using linear algebra, we can prove:

               a(v) and h(v) converge

HITS Example
  Find a base subgraph:
• Start with a root set R = {1, 2, 3, 4}

• {1, 2, 3, 4} - nodes relevant to
               the topic

• Expand the root set R to include
all the children and a fixed
number of parents of nodes in R

→ A new set S (base subgraph)

HITS Example
     BaseSubgraph(R, d)
          S ← R
          for each v in R
          do S ← S ∪ ch[v]
              P ← pa[v]
              if |P| > d
              then P ← arbitrary subset of P having size d
              S ← S ∪ P
          return S

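The BaseSubgraph procedure translates almost line by line into Python. In this sketch the graph is passed as two dictionaries, `children` and `parents`, and the "arbitrary subset" is drawn with a seeded random generator so that runs are repeatable. These choices, like all names here, are assumptions for illustration.

```python
import random

def base_subgraph(root, children, parents, d, rng=None):
    """Expand root set R with all children and at most d parents of each node."""
    rng = rng or random.Random(0)          # seeded for repeatability (assumption)
    base = set(root)                       # S <- R
    for v in root:
        base |= set(children.get(v, ()))   # S <- S U ch[v]
        pa = sorted(parents.get(v, ()))    # P <- pa[v]
        if len(pa) > d:
            pa = rng.sample(pa, d)         # arbitrary subset of P of size d
        base |= set(pa)                    # S <- S U P
    return base

# Hypothetical graph: node 1 has three parents, but only d = 2 are kept.
root = {1, 2}
children = {1: [5, 6], 2: [6]}
parents = {1: [7, 8, 9], 2: [10]}
S = base_subgraph(root, children, parents, d=2)
```

Capping the number of parents keeps very popular pages, which may have a huge number of in-links, from blowing up the size of the base set.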
HITS Example
      Hubs and authorities: two n-dimensional vectors a and h
              HubsAuthorities(G)
                   1 ← [1,…,1] ∈ R^{|V|}
                   a_0 ← h_0 ← 1
                   t ← 1
                   repeat
                       for each v in V
                       do a_t(v) ← Σ_{w ∈ pa[v]} h_{t-1}(w)
                          h_t(v) ← Σ_{w ∈ ch[v]} a_{t-1}(w)
                       a_t ← a_t / ||a_t||
                       h_t ← h_t / ||h_t||
                       t ← t + 1
                   until ||a_t – a_{t-1}|| + ||h_t – h_{t-1}|| < ε
                   return (a_t, h_t)
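The iteration can be sketched in plain Python as follows. It follows the update rule as written on the slide, with both a_t and h_t computed from the previous iterate, and uses the Euclidean norm. The `max_iter` guard is an addition, since this simultaneous update can oscillate on some graphs instead of settling. The graph encoding, names and example are assumptions.

```python
import math

def hits(children, eps=1e-8, max_iter=1000):
    """Iterate authority and hub weights until the combined change is below eps."""
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)
    parents = {v: [] for v in nodes}
    for v, kids in children.items():
        for w in kids:
            parents[w].append(v)

    def normalize(x):
        norm = math.sqrt(sum(val * val for val in x.values()))
        return {v: val / norm for v, val in x.items()} if norm > 0 else x

    a = {v: 1.0 for v in nodes}            # a_0 = vector of ones
    h = {v: 1.0 for v in nodes}            # h_0 = vector of ones
    for _ in range(max_iter):
        new_a = normalize({v: sum(h[w] for w in parents[v]) for v in nodes})
        new_h = normalize({v: sum(a[w] for w in children.get(v, ()))
                           for v in nodes})
        delta = (math.sqrt(sum((new_a[v] - a[v]) ** 2 for v in nodes))
                 + math.sqrt(sum((new_h[v] - h[v]) ** 2 for v in nodes)))
        a, h = new_a, new_h
        if delta < eps:
            break
    return a, h

# Hypothetical graph: hubs 1, 2, 3 point to authorities 4 and 5.
graph = {1: [4, 5], 2: [4, 5], 3: [4]}
authority, hubness = hits(graph)
```

In this example page 4 receives three links and page 5 two, so 4 ends up with the larger authority weight. Hubs 1 and 2 link to both authorities and score higher than hub 3.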
HITS Example Results
[Figure: authority and hubness weights for nodes 1 to 15]
HITS Improvements
     Bharat and Henzinger (1998)
       • HITS problems
          1) A document can contain many identical links to
             the same document on another host
          2) Links are generated automatically (e.g. messages
             posted on newsgroups)

       • Solutions
          1) Assign weights to identical multiple edges that
             are inversely proportional to their multiplicity
          2) Prune irrelevant nodes or regulate the influence
             of a node with a relevance weight

PageRank
 Introduced by Page et al (1998)
   The weight is assigned by the rank of parents

 Difference with HITS
   HITS assigns hubness and authority weights
   The rank of a page is proportional to its parents’ rank, but inversely
    proportional to its parents’ outdegree

Matrix Notation




                                     Adjacency Matrix

                                     A =

  * http://www.kusatro.kyoto-u.com

Matrix Notation
  r = α B r = M r
    α : eigenvalue
    r : eigenvector of B

       A x = λ x
       (A - λI) x = 0
                              B =

 Finding PageRank
  to find the eigenvector of B with an associated eigenvalue α

Matrix Notation
     PageRank: the eigenvector of B (a column of P) for the largest eigenvalue

       B = P D P^{-1}
           D: diagonal matrix of eigenvalues {λ1, …, λn}
           P: regular matrix that consists of eigenvectors

        PageRank   r1 =
                                      normalized

Matrix Notation




                        • Confirm the result
                         # of inlinks from highly ranked pages
                         hard to explain about 5&2, 6&7

                        • Interesting Topic
                          How do you get your
                                homepage highly ranked?
Markov Chain Notation
      Random surfer model
        Description of a random walk through the Web graph
        The rank of a page is interpreted as the asymptotic probability that a
         surfer is currently browsing that page

                 r_t = M r_{t-1}
                 M: transition matrix for a first-order Markov chain (stochastic)

        Does it converge to some sensible solution (as t → ∞)
        regardless of the initial ranks?
Problem
      “Rank Sink” Problem
        In general, many Web pages have no inlinks or no outlinks
        This results in dangling edges in the graph

       E.g.
         no parent → rank 0
             MT converges to a matrix
              whose last column is all zero

         no children → no solution
             MT converges to zero matrix

Modification
      Surfer will restart browsing by picking a new Web page at
       random
         M = (B + E)
           E : escape matrix
           M : stochastic matrix

      Still a problem?
        It is not guaranteed that M is primitive

        If M is stochastic and primitive, PageRank converges to the
         corresponding stationary distribution of M

PageRank Algorithm




                        * Page et al, 1998
Distribution of the Mixture Model
      The probability distribution that results from combining the
       Markovian random walk distribution and the static rank source
       distribution
             r = εe + (1 - ε)x
                     ε: probability of selecting a non-linked page

             PageRank

 Now, the transition matrix [εH + (1 - ε)M] is primitive and stochastic
 r_t converges to the dominant eigenvector
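The mixture model can be turned into a small power iteration. This sketch makes several assumptions that the slides do not state: the rank source e is uniform over all pages, a page without outlinks is treated as linking to every page, ε defaults to 0.15, and convergence is measured in the L1 norm. Function and variable names are hypothetical.

```python
def pagerank(children, eps=0.15, tol=1e-10, max_iter=1000):
    """Power iteration for r = eps * e + (1 - eps) * x with a uniform source e."""
    nodes = sorted(set(children) | {w for kids in children.values() for w in kids})
    n = len(nodes)
    r = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new_r = {v: eps / n for v in nodes}          # restart at a random page
        for v in nodes:
            kids = children.get(v, ())
            targets = kids if kids else nodes        # dangling page (assumption)
            share = (1 - eps) * r[v] / len(targets)  # rank split over outlinks
            for w in targets:
                new_r[w] += share
        change = sum(abs(new_r[v] - r[v]) for v in nodes)
        r = new_r
        if change < tol:
            break
    return r

# Hypothetical four-page graph; "d" has no parents.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

Page "d" has no inlinks, so its rank comes only from the restart term ε/n. Page "c" collects rank from three parents and comes out on top.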
Stability
      Are link analysis algorithms based on eigenvectors stable,
       in the sense that the results do not change significantly?

      The connectivity of a portion of the graph is changed
       arbitrarily
        How will it affect the results of the algorithms?

Stability of HITS
     Ng et al (2001)
     • A bound on the number of hyperlinks k that can be added to or deleted
     from one page without affecting the authority or hubness weights
     • It is possible to perturb a symmetric matrix by a quantity that grows
     as δ and that produces a constant perturbation of the dominant
     eigenvector

                                   δ: eigengap λ1 – λ2
                                   d: maximum outdegree of G
Stability of PageRank

                                         Ng et al (2001)

               V: the set of vertices touched by the perturbation
 The parameter ε of the mixture model has a stabilization role
 If the set of pages affected by the perturbation have a small rank, the overall
     change will also be small

                                         tighter bound by
                                         Bianchini et al (2001)

              δ(j) >= 2 depends on the edges incident on j
SALSA
      SALSA (Lempel, Moran 2001)
        Probabilistic extension of the HITS algorithm
        Random walk is carried out by following hyperlinks both in the
         forward and in the backward direction
      Two separate random walks
        Hub walk
        Authority walk




Forming a Bipartite Graph in SALSA




Random Walks
      Hub walk
        Follow a Web link from a page u_h to a page w_a (a forward link)
         and then
        Immediately traverse a backlink going from w_a to v_h, where (u,w)
         ∈ E and (v,w) ∈ E
      Authority walk
        Follow a Web link backward from a page w_a to a page u_h (a backward
         link) and then
        Immediately traverse a forward link going from u_h to a page z_a,
         where (u,w) ∈ E and (u,z) ∈ E

Computing Weights
      Hub weight computed from the sum of the product of the
       inverse degree of the in-links and the out-links




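The formula from this slide is missing from the extracted text. The sketch below uses the hub-walk transition probabilities as SALSA is commonly presented: the probability of moving from hub u to hub v sums, over every page w that both link to, the product 1/outdeg(u) · 1/indeg(w). Treat this as an assumption about what the slide showed; the function name and the example graph are hypothetical.

```python
def salsa_hub_transitions(edges):
    """Transition probabilities of the hub walk: forward link, then backlink."""
    out_links, in_links = {}, {}
    for u, w in edges:
        out_links.setdefault(u, set()).add(w)
        in_links.setdefault(w, set()).add(u)
    hubs = sorted(out_links)
    P = {u: {v: 0.0 for v in hubs} for u in hubs}
    for u in hubs:
        for w in out_links[u]:          # forward link u -> w
            for v in in_links[w]:       # backlink from w to a hub v
                P[u][v] += 1.0 / len(out_links[u]) / len(in_links[w])
    return P

# Hypothetical graph: hubs h1, h2, h3 and authorities a1, a2.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
P = salsa_hub_transitions(edges)

# On a connected graph the hub weights are proportional to out-degree.
pi = {"h1": 0.4, "h2": 0.4, "h3": 0.2}
next_pi = {v: sum(pi[u] * P[u][v] for u in P) for v in P}
```

Each row of P sums to 1, and the out-degree-proportional vector `pi` is left unchanged by one step of the walk, so it is the stationary hub distribution for this example.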
Why We Care
      Lempel and Moran (2001) showed theoretically that SALSA
       weights are more robust than HITS weights in the presence of the
       Tightly Knit Community (TKC) Effect.
        This effect occurs when a small collection of pages (related to a given topic)
         is connected so that every hub links to every authority and includes as a special
         case the mutual reinforcement effect
      The pages in a community connected in this way can be ranked
       highly by HITS, higher than pages in a much larger collection
       where only some hubs link to some authorities
      TKC could be exploited by spammers hoping to increase their
       page weight (e.g. link farms)

A Similar Approach
      Rafiei and Mendelzon (2000) and Ng et al. (2001)
       propose similar approaches using reset as in
       PageRank
      Unlike PageRank, in this model the surfer will
       follow a forward link on odd steps but a
       backward link on even steps
      The stability properties of these ranking
       distributions are similar to those of PageRank (Ng
       et al. 2001)

Overcoming TKC
      Similarity downweight sequencing and sequential
      clustering (Roberts and Rosenthal 2003)
       Consider the underlying structure of clusters
       Suggest downweight sequencing to avoid the
        Tightly Knit Community problem
       Results indicate the approach is effective for the few
        tested queries, but it is still untested on a large scale

PHITS and More
      PHITS: Cohn and Chang (2000)
        Only the principal eigenvector is extracted using SALSA, so the
         authority along the remaining eigenvectors is completely
         neglected
           Account for more eigenvectors of the co-citation matrix

      See also Lempel, Moran (2003)




Limits of Link Analysis
      META tags / invisible text
        Search engines relying on meta tags in documents are often
         misled (intentionally) by web developers
      Pay-for-place
        Search engine bias: organizations pay search engines for page
         rank
        Advertisements: organizations pay high ranking pages for
         advertising space
           With a primary effect of increased visibility to end users and a secondary
            effect of increased respectability due to relevance to a high ranking page

Limits of Link Analysis
      Stability
        Adding even a small number of nodes/edges to the graph has a
         significant impact
      Topic drift – similar to TKC
        A top authority may be a hub of pages on a different topic
         resulting in increased rank of the authority page
      Content evolution
        Adding/removing links/content can affect the intuitive
         authority rank of a page, requiring recalculation of page ranks

Further Reading
        R. Akerkar and P. Lingras, Building an Intelligent Web: Theory
         and Practice, Jones & Bartlett, 2008
        http://www.jbpub.com/catalog/9780763741372/

More Related Content

What's hot

OSINT e Ingeniería Social aplicada a las investigaciones
OSINT e Ingeniería Social aplicada a las investigacionesOSINT e Ingeniería Social aplicada a las investigaciones
OSINT e Ingeniería Social aplicada a las investigacionesemilianox
 
Logistic Regression.pptx
Logistic Regression.pptxLogistic Regression.pptx
Logistic Regression.pptxMuskaan194530
 
CMSC 56 | Lecture 15: Closures of Relations
CMSC 56 | Lecture 15: Closures of RelationsCMSC 56 | Lecture 15: Closures of Relations
CMSC 56 | Lecture 15: Closures of Relationsallyn joy calcaben
 
Data Encryption and Decryption using Hill Cipher
Data Encryption and Decryption using Hill CipherData Encryption and Decryption using Hill Cipher
Data Encryption and Decryption using Hill CipherAashirwad Kashyap
 

What's hot (7)

OSINT e Ingeniería Social aplicada a las investigaciones
OSINT e Ingeniería Social aplicada a las investigacionesOSINT e Ingeniería Social aplicada a las investigaciones
OSINT e Ingeniería Social aplicada a las investigaciones
 
Logistic Regression.pptx
Logistic Regression.pptxLogistic Regression.pptx
Logistic Regression.pptx
 
Osint
OsintOsint
Osint
 
CMSC 56 | Lecture 15: Closures of Relations
CMSC 56 | Lecture 15: Closures of RelationsCMSC 56 | Lecture 15: Closures of Relations
CMSC 56 | Lecture 15: Closures of Relations
 
SECURITY SERVICES
SECURITY SERVICESSECURITY SERVICES
SECURITY SERVICES
 
Data Encryption and Decryption using Hill Cipher
Data Encryption and Decryption using Hill CipherData Encryption and Decryption using Hill Cipher
Data Encryption and Decryption using Hill Cipher
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 

More from R A Akerkar

Rajendraakerkar lemoproject
Rajendraakerkar lemoprojectRajendraakerkar lemoproject
Rajendraakerkar lemoprojectR A Akerkar
 
Big Data and Harvesting Data from Social Media
Big Data and Harvesting Data from Social MediaBig Data and Harvesting Data from Social Media
Big Data and Harvesting Data from Social MediaR A Akerkar
 
Can You Really Make Best Use of Big Data?
Can You Really Make Best Use of Big Data?Can You Really Make Best Use of Big Data?
Can You Really Make Best Use of Big Data?R A Akerkar
 
Big data in Business Innovation
Big data in Business Innovation   Big data in Business Innovation
Big data in Business Innovation R A Akerkar
 
What is Big Data ?
What is Big Data ?What is Big Data ?
What is Big Data ?R A Akerkar
 
Connecting and Exploiting Big Data
Connecting and Exploiting Big DataConnecting and Exploiting Big Data
Connecting and Exploiting Big DataR A Akerkar
 
Linked open data
Linked open dataLinked open data
Linked open dataR A Akerkar
 
Semi structure data extraction
Semi structure data extractionSemi structure data extraction
Semi structure data extractionR A Akerkar
 
Big data: analyzing large data sets
Big data: analyzing large data setsBig data: analyzing large data sets
Big data: analyzing large data setsR A Akerkar
 
Description logics
Description logicsDescription logics
Description logicsR A Akerkar
 
artificial intelligence
artificial intelligenceartificial intelligence
artificial intelligenceR A Akerkar
 
Case Based Reasoning
Case Based ReasoningCase Based Reasoning
Case Based ReasoningR A Akerkar
 
Semantic Markup
Semantic Markup Semantic Markup
Semantic Markup R A Akerkar
 
Intelligent natural language system
Intelligent natural language systemIntelligent natural language system
Intelligent natural language systemR A Akerkar
 
Knowledge Organization Systems
Knowledge Organization SystemsKnowledge Organization Systems
Knowledge Organization SystemsR A Akerkar
 
Rational Unified Process for User Interface Design
Rational Unified Process for User Interface DesignRational Unified Process for User Interface Design
Rational Unified Process for User Interface DesignR A Akerkar
 
Unified Modelling Language
Unified Modelling LanguageUnified Modelling Language
Unified Modelling LanguageR A Akerkar
 
Statistical Preliminaries
Statistical PreliminariesStatistical Preliminaries
Statistical PreliminariesR A Akerkar
 

More from R A Akerkar (20)

Rajendraakerkar lemoproject
Rajendraakerkar lemoprojectRajendraakerkar lemoproject
Rajendraakerkar lemoproject
 
Big Data and Harvesting Data from Social Media
Big Data and Harvesting Data from Social MediaBig Data and Harvesting Data from Social Media
Big Data and Harvesting Data from Social Media
 
Can You Really Make Best Use of Big Data?
Can You Really Make Best Use of Big Data?Can You Really Make Best Use of Big Data?
Can You Really Make Best Use of Big Data?
 
Big data in Business Innovation
Big data in Business Innovation   Big data in Business Innovation
Big data in Business Innovation
 
What is Big Data ?
What is Big Data ?What is Big Data ?
What is Big Data ?
 
Connecting and Exploiting Big Data
Connecting and Exploiting Big DataConnecting and Exploiting Big Data
Connecting and Exploiting Big Data
 
Linked open data
Linked open dataLinked open data
Linked open data
 
Semi structure data extraction
Semi structure data extractionSemi structure data extraction
Semi structure data extraction
 
Big data: analyzing large data sets
Big data: analyzing large data setsBig data: analyzing large data sets
Big data: analyzing large data sets
 
Description logics
Description logicsDescription logics
Description logics
 
Data Mining
Data MiningData Mining
Data Mining
 
artificial intelligence
artificial intelligenceartificial intelligence
artificial intelligence
 
Case Based Reasoning
Case Based ReasoningCase Based Reasoning
Case Based Reasoning
 
Semantic Markup
Semantic Markup Semantic Markup
Semantic Markup
 
Intelligent natural language system
Intelligent natural language systemIntelligent natural language system
Intelligent natural language system
 
Data mining
Data miningData mining
Data mining
 
Knowledge Organization Systems
Knowledge Organization SystemsKnowledge Organization Systems
Knowledge Organization Systems
 
Rational Unified Process for User Interface Design
Rational Unified Process for User Interface DesignRational Unified Process for User Interface Design
Rational Unified Process for User Interface Design
 
Unified Modelling Language
Unified Modelling LanguageUnified Modelling Language
Unified Modelling Language
 
Statistical Preliminaries
Statistical PreliminariesStatistical Preliminaries
Statistical Preliminaries
 

Recently uploaded

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 

Recently uploaded (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...

Link analysis

  • 1. Link Analysis Rajendra Akerkar Vestlandsforsking, Norway
  • 2. Objectives: to review common approaches to link analysis; to calculate the popularity of a site based on link analysis; to model human judgments indirectly.
  • 3. Outline: 1. Early Approaches to Link Analysis; 2. Hubs and Authorities: HITS; 3. PageRank; 4. Stability; 5. Probabilistic Link Analysis; 6. Limitations of Link Analysis.
  • 4. Early Approaches. Basic assumptions: hyperlinks contain information about the human judgment of a site; the more incoming links a site has, the more important it is judged to be. Bray (1996): the visibility of a site is measured by the number of other sites pointing to it; the luminosity of a site is measured by the number of other sites to which it points. Limitation: these counts fail to capture the relative importance of different parent (child) sites.
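
Bray's two measures are plain degree counts, which can be sketched as below. This is a minimal illustration, not code from the slides: the function name `visibility_luminosity` and the toy link list are hypothetical, and counting each distinct site once is an assumption.

```python
from collections import defaultdict


def visibility_luminosity(links):
    """Count distinct parents (visibility) and children (luminosity) per site.

    `links` is an iterable of (source, target) pairs; self-links are ignored,
    since both measures are defined over *other* sites.
    """
    parents = defaultdict(set)
    children = defaultdict(set)
    sites = set()
    for source, target in links:
        sites.update((source, target))
        if source != target:
            children[source].add(target)
            parents[target].add(source)
    visibility = {site: len(parents[site]) for site in sites}
    luminosity = {site: len(children[site]) for site in sites}
    return visibility, luminosity


# Hypothetical toy graph: a and b link to c; a and c link to d.
links = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d")]
visibility, luminosity = visibility_luminosity(links)
print(visibility)
print(luminosity)
```

In this toy graph c and d are equally visible (two parents each), which illustrates the stated limitation: the count cannot tell whether the parents themselves are important.
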
  • 5. Early Approaches. Mark (1988): the score S of a document at vertex v is $S(v) = s(v) + \frac{1}{|ch[v]|} \sum_{w \in ch[v]} S(w)$, where v is a vertex in the hypertext graph G = (V, E), S(v) is the global score, s(v) is the score if the document is isolated, and ch[v] is the set of children of the document at vertex v. Limitations: G must be a directed acyclic graph (DAG); if v has a single link to w, then S(v) > S(w); if v has a long path to w and s(v) < s(w), then S(v) > S(w), which is unreasonable.
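
The recursive score can be sketched as follows, assuming the graph is given as a dictionary of children and that it is acyclic. The names `global_scores`, `children` and `local_score`, and the numbers in the example, are illustrative assumptions.

```python
from functools import lru_cache


def global_scores(children, local_score):
    """Compute S(v) = s(v) + (1 / |ch[v]|) * sum of S(w) over children w.

    Assumes the graph is a DAG; on a cyclic graph the recursion never ends.
    """

    @lru_cache(maxsize=None)
    def score(v):
        kids = children.get(v, ())
        if not kids:  # a leaf keeps its isolated score
            return local_score[v]
        return local_score[v] + sum(score(w) for w in kids) / len(kids)

    return {v: score(v) for v in local_score}


# Hypothetical chain a -> b -> c with made-up isolated scores.
children = {"a": ("b",), "b": ("c",)}
local_score = {"a": 1.0, "b": 2.0, "c": 9.0}
S = global_scores(children, local_score)
print(S)
```

On this chain S(a) = 12 exceeds S(c) = 9 even though s(a) < s(c), which is the unreasonable behaviour listed among the limitations.
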
  • 6. Early Approaches. Marchiori (1997): hyper information should complement textual information to obtain the overall information, $S(v) = s(v) + h(v)$, where S(v) is the overall information, s(v) the textual information, and h(v) the hyper information, with $h(v) = \sum_{w \in ch[v]} F^{r(v,w)} S(w)$. F is a fading constant, F ∈ (0, 1), and r(v, w) is the rank of w after sorting the children of v by S(w). This is a remedy for the previous approach (Mark 1988).
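
A small sketch of the fading sum follows. The slide does not give the sort direction or the starting rank, so this sketch assumes children are sorted by decreasing S(w) with ranks starting at 1, and it truncates the recursion at a fixed depth so that cycles terminate. The function name, the `depth` parameter and the example values are hypothetical.

```python
def overall_information(v, children, textual, fading=0.5, depth=2):
    """Return S(v) = s(v) + h(v), with h(v) = sum over w of F**r(v, w) * S(w).

    Assumptions: children are ranked by decreasing S(w), ranks start at 1,
    and links are followed at most `depth` levels deep.
    """
    if depth == 0:
        return textual[v]
    child_scores = sorted(
        (overall_information(w, children, textual, fading, depth - 1)
         for w in children.get(v, ())),
        reverse=True,
    )
    hyper = sum(fading ** rank * score
                for rank, score in enumerate(child_scores, start=1))
    return textual[v] + hyper


# Hypothetical page a linking to b and c.
children = {"a": ["b", "c"]}
textual = {"a": 1.0, "b": 4.0, "c": 2.0}
score_a = overall_information("a", children, textual, fading=0.5)
print(score_a)  # 1.0 + 0.5 * 4.0 + 0.25 * 2.0 = 3.5
```
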
  • 7. HITS: Kleinberg's Algorithm. HITS stands for Hypertext Induced Topic Selection. For each vertex v ∈ V in a subgraph of interest, a(v) is the authority of v and h(v) is the hubness of v. A site is very authoritative if it receives many citations; citations from important sites weigh more than citations from less important sites. Hubness shows the importance of a site. A good hub is a site that links to many authoritative sites.
  • 8. Authority and Hubness. [Figure: nodes 2, 3 and 4 link to node 1; node 1 links to nodes 5, 6 and 7.] a(1) = h(2) + h(3) + h(4); h(1) = a(5) + a(6) + a(7).
  • 9. Authority and Hubness Convergence. Recursive dependency: $a(v) \leftarrow \sum_{w \in pa[v]} h(w)$ and $h(v) \leftarrow \sum_{w \in ch[v]} a(w)$. Using linear algebra, we can prove that a(v) and h(v) converge.
  • 10. HITS Example. Find a base subgraph: start with a root set R = {1, 2, 3, 4}, the nodes relevant to the topic; expand R to include all the children and a fixed number of parents of the nodes in R. The result is a new set S (the base subgraph).
  • 11. HITS Example. BaseSubgraph(R, d):
      S ← R
      for each v in R
        do S ← S ∪ ch[v]
           P ← pa[v]
           if |P| > d
             then P ← arbitrary subset of P having size d
           S ← S ∪ P
      return S
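
The BaseSubgraph procedure translates almost line by line into Python. This is a sketch under assumptions: the graph is passed as two hypothetical dictionaries `children` and `parents`, and the "arbitrary subset" is drawn with a seeded random generator so the run is repeatable.

```python
import random


def base_subgraph(root, children, parents, d, rng=None):
    """Expand root set R with all children and at most d parents per node."""
    rng = rng or random.Random(0)  # seeded only to make the sketch repeatable
    subgraph = set(root)
    for v in root:
        subgraph |= set(children.get(v, ()))
        candidates = list(parents.get(v, ()))
        if len(candidates) > d:
            # "arbitrary subset of P having size d"
            candidates = rng.sample(candidates, d)
        subgraph |= set(candidates)
    return subgraph


# Hypothetical graph: node 1 has three parents, but only d = 2 are kept.
children = {1: [3, 4], 2: [4]}
parents = {1: [5, 6, 7], 2: [8]}
base = base_subgraph({1, 2}, children, parents, d=2)
print(sorted(base))
```
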
  • 12. HITS Example. Hubs and authorities: two n-dimensional vectors a and h. HubsAuthorities(G):
      1 ← [1, …, 1] ∈ R^|V|
      a_0 ← h_0 ← 1
      t ← 1
      repeat
        for each v in V
          do a_t(v) ← Σ_{w ∈ pa[v]} h_{t-1}(w)
             h_t(v) ← Σ_{w ∈ ch[v]} a_{t-1}(w)
        a_t ← a_t / ||a_t||
        h_t ← h_t / ||h_t||
        t ← t + 1
      until ||a_t − a_{t-1}|| + ||h_t − h_{t-1}|| < ε
      return (a_t, h_t)
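
The iteration can be sketched in plain Python as below. As in the pseudocode, both updates read the previous iterate and the vectors are normalized with the Euclidean norm. The function name `hits`, the `max_iter` safeguard and the four-node example graph are assumptions added for illustration.

```python
import math


def hits(nodes, edges, eps=1e-8, max_iter=1000):
    """Return (authority, hubness) dicts for a directed graph."""
    parents = {v: [] for v in nodes}
    children = {v: [] for v in nodes}
    for u, w in edges:
        children[u].append(w)
        parents[w].append(u)

    def normalize(x):
        norm = math.sqrt(sum(val * val for val in x.values())) or 1.0
        return {v: val / norm for v, val in x.items()}

    a = {v: 1.0 for v in nodes}
    h = {v: 1.0 for v in nodes}
    for _ in range(max_iter):  # max_iter is a safeguard, not part of the slide
        new_a = normalize({v: sum(h[w] for w in parents[v]) for v in nodes})
        new_h = normalize({v: sum(a[w] for w in children[v]) for v in nodes})
        delta = (math.sqrt(sum((new_a[v] - a[v]) ** 2 for v in nodes))
                 + math.sqrt(sum((new_h[v] - h[v]) ** 2 for v in nodes)))
        a, h = new_a, new_h
        if delta < eps:
            break
    return a, h


# Hypothetical graph: pages 1 and 2 act as hubs, pages 3 and 4 as authorities.
nodes = [1, 2, 3, 4]
edges = [(1, 3), (1, 4), (2, 3)]
authority, hubness = hits(nodes, edges)
print(authority)
print(hubness)
```

Page 3, cited by both hubs, ends up with the largest authority weight, and page 1, which links to both authorities, with the largest hubness.
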
  • 13. HITS Example Results. [Figure: authority and hubness weights for nodes 1 to 15.]
  • 14. HITS Improvements. Bharat and Henzinger (1998). HITS problems: 1) a document can contain many identical links to the same document on another host; 2) links may be generated automatically (e.g. messages posted on newsgroups). Solutions: 1) assign weights to identical multiple edges that are inversely proportional to their multiplicity; 2) prune irrelevant nodes, or regulate the influence of a node with a relevance weight.
  • 15. PageRank. Introduced by Page et al. (1998). The weight of a page is assigned from the rank of its parents. Difference from HITS: HITS uses hubness and authority weights, whereas the PageRank of a page is proportional to its parents' rank but inversely proportional to its parents' outdegree.
  • 16. Matrix Notation. Adjacency matrix A = [matrix shown as an image in the original slide]. * http://www.kusatro.kyoto-u.com
  • 17. Matrix Notation. r = αBr = Mr, where α is an eigenvalue and r is an eigenvector of B (recall Ax = λx, i.e. (A − λI)x = 0); B = [matrix shown as an image in the original slide]. Finding PageRank amounts to finding the eigenvector of B with an associated eigenvalue α.
  • 18. Matrix Notation. PageRank: the eigenvector in P associated with the maximum eigenvalue, where $B = P D P^{-1}$, D is the diagonal matrix of eigenvalues {λ1, …, λn}, and P is a regular (invertible) matrix whose columns are the eigenvectors. PageRank r1 = normalized [vector shown as an image in the original slide].
  • 19. Matrix Notation. Confirming the result: rank reflects the number of inlinks from highly ranked pages, but the ordering of pages 5 and 2, and of pages 6 and 7, is hard to explain. Interesting topic: how do you get your homepage ranked highly?
  • 20. Markov Chain Notation. Random surfer model: a description of a random walk through the Web graph, interpreted as a transition matrix; the rank of a page is the asymptotic probability that a surfer is currently browsing that page. $r_t = M r_{t-1}$, where M is the transition matrix of a first-order Markov chain (stochastic). Does it converge to some sensible solution (as t → ∞) regardless of the initial ranks?
  • 21. Problem. "Rank sink" problem: in general, many Web pages have no inlinks or no outlinks, which results in dangling edges in the graph. E.g. no parent → rank 0 (M^T converges to a matrix whose last column is all zero); no children → no solution (M^T converges to the zero matrix).
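
Both cases can be seen on a three-page chain by applying the update r_t = M r_{t-1} directly, with each page splitting its rank evenly among its children. The helper `step` and the chain a → b → c are hypothetical and only meant to illustrate the leak.

```python
def step(rank, children):
    """One update r_t = M r_{t-1}: each page shares its rank among its children."""
    new_rank = {v: 0.0 for v in rank}
    for v, r in rank.items():
        out = children.get(v, [])
        for w in out:
            new_rank[w] += r / len(out)
    return new_rank  # rank held by pages without children is simply lost


# Hypothetical chain a -> b -> c: a has no parent, c has no children.
children = {"a": ["b"], "b": ["c"], "c": []}
rank = {v: 1 / 3 for v in "abc"}
history = []
for _ in range(3):
    rank = step(rank, children)
    history.append(dict(rank))
total = sum(rank.values())
print(history)
print(total)
```

Page a drops to rank 0 after one step because nothing links to it, and after three steps the whole rank vector is zero because c passes nothing on.
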
  • 22. Modification. The surfer will restart browsing by picking a new Web page at random: M = (B + E), where E is the escape matrix and M is a stochastic matrix. Still a problem? It is not guaranteed that M is primitive. If M is stochastic and primitive, PageRank converges to the corresponding stationary distribution of M.
  • 23. PageRank Algorithm. [Algorithm shown as an image in the original slide.] * Page et al., 1998
  • 24. Distribution of the Mixture Model. The probability distribution that results from combining the Markovian random walk distribution and the static rank source distribution: r = εe + (1 − ε)x, where ε is the probability of selecting a non-linked page. PageRank: the transition matrix [εH + (1 − ε)M] is now primitive and stochastic, so r_t converges to the dominant eigenvector.
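
A power-iteration sketch of the mixture model follows. It rests on two assumptions that the slide does not spell out: H is taken to be the uniform matrix (every page equally likely after a jump), and a page without children is treated as jumping uniformly so that M is stochastic. The function name, the value ε = 0.15 and the example chain are illustrative.

```python
def pagerank(nodes, children, eps=0.15, tol=1e-10, max_iter=1000):
    """Iterate r_t = [eps * H + (1 - eps) * M] r_{t-1} until it settles."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new_rank = {v: eps / n for v in nodes}  # uniform jump (assumed H)
        for v in nodes:
            out = children.get(v, [])
            if out:
                share = (1 - eps) * rank[v] / len(out)
                for w in out:
                    new_rank[w] += share
            else:
                # Assumption: a page without children jumps uniformly.
                share = (1 - eps) * rank[v] / n
                for w in nodes:
                    new_rank[w] += share
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical chain a -> b -> c, where c has no children.
nodes = ["a", "b", "c"]
children = {"a": ["b"], "b": ["c"]}
ranks = pagerank(nodes, children)
print(ranks)
```

Unlike the plain update r_t = M r_{t-1}, the ranks here keep summing to 1, and on this chain they order the pages as c > b > a.
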
  • 25. Stability. Are link analysis algorithms based on eigenvectors stable, in the sense that the results do not change significantly? Suppose the connectivity of a portion of the graph is changed arbitrarily: how will this affect the results of the algorithms?
  • 26. Stability of HITS. Ng et al. (2001): a bound on the number of hyperlinks k that can be added to or deleted from one page without affecting the authority or hubness weights. It is possible to perturb a symmetric matrix by a quantity that grows as δ and that produces a constant perturbation of the dominant eigenvector. δ: eigengap λ1 − λ2; d: maximum outdegree of G.
  • 27. Stability of PageRank. Ng et al. (2001): V is the set of vertices touched by the perturbation. The parameter ε of the mixture model has a stabilizing role. If the set of pages affected by the perturbation has a small rank, the overall change will also be small. Tighter bound by Bianchini et al. (2001): δ(j) >= 2 depends on the edges incident on j.
  • 28. SALSA. SALSA (Lempel and Moran 2001) is a probabilistic extension of the HITS algorithm. The random walk is carried out by following hyperlinks in both the forward and the backward direction. There are two separate random walks: the hub walk and the authority walk.
  • 29. Forming a Bipartite Graph in SALSA. [Figure shown as an image in the original slide.]
  • 30. Random Walks. Hub walk: follow a Web link from a page u_h to a page w_a (a forward link), and then immediately traverse a backlink going from w_a to v_h, where (u, w) ∈ E and (v, w) ∈ E. Authority walk: follow a backlink from a page w_a to a page u_h (a backward link), and then immediately traverse a forward link going from u_h to a page v_a, where (u, w) ∈ E and (u, v) ∈ E.
  • 31. Computing Weights. The hub weight is computed from the sum of the products of the inverse degrees of the in-links and the out-links.
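
One way to read this description is as the transition matrix of the hub walk: moving from hub u to hub v through a shared authority w contributes 1/outdeg(u) · 1/indeg(w). The sketch below builds that matrix and approximates its stationary distribution by repeated multiplication. The function names and the example graph are hypothetical, and the fixed number of steps is an arbitrary choice.

```python
def hub_walk_matrix(edges):
    """Transition probabilities of the hub walk (forward link, then backlink)."""
    out_links, in_links = {}, {}
    for u, w in edges:
        out_links.setdefault(u, set()).add(w)
        in_links.setdefault(w, set()).add(u)
    hubs = sorted(out_links)
    P = {u: {v: 0.0 for v in hubs} for u in hubs}
    for u in hubs:
        for w in out_links[u]:        # forward step u_h -> w_a
            for v in in_links[w]:     # backward step w_a -> v_h
                P[u][v] += 1.0 / (len(out_links[u]) * len(in_links[w]))
    return hubs, P


def stationary(states, P, steps=200):
    """Approximate the stationary distribution by repeated multiplication."""
    dist = {s: 1.0 / len(states) for s in states}
    for _ in range(steps):
        dist = {t: sum(dist[s] * P[s][t] for s in states) for t in states}
    return dist


# Hypothetical graph: hub 1 links to pages 3 and 4, hub 2 links to page 3.
edges = [(1, 3), (1, 4), (2, 3)]
hubs, P = hub_walk_matrix(edges)
hub_weights = stationary(hubs, P)
print(P)
print(hub_weights)
```

In this small connected example the hub weights come out as 2/3 and 1/3, that is, proportional to the out-degrees of the two hubs.
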
  • 32. Why We Care. Lempel and Moran (2001) showed theoretically that SALSA weights are more robust than HITS weights in the presence of the Tightly Knit Community (TKC) effect. This effect occurs when a small collection of pages (related to a given topic) is connected so that every hub links to every authority; it includes the mutual reinforcement effect as a special case. The pages in a community connected in this way can be ranked highly by HITS, higher than pages in a much larger collection where only some hubs link to some authorities. TKC could be exploited by spammers hoping to increase their page weight (e.g. link farms).
  • 33. A Similar Approach. Rafiei and Mendelzon (2000) and Ng et al. (2001) propose similar approaches using a reset, as in PageRank. Unlike PageRank, in this model the surfer follows a forward link on odd steps but a backward link on even steps. The stability properties of these ranking distributions are similar to those of PageRank (Ng et al. 2001).
  • 34. Overcoming TKC. Similarity downweight sequencing and sequential clustering (Roberts and Rosenthal 2003): consider the underlying structure of clusters; downweight sequencing is suggested to avoid the Tightly Knit Community problem. Results indicate that the approach is effective for the few queries tested, but it is still untested on a large scale.
  • 35. PHITS and More. PHITS: Cohn and Chang (2000). Only the principal eigenvector is extracted using SALSA, so the authority along the remaining eigenvectors is completely neglected; PHITS accounts for more eigenvectors of the co-citation matrix. See also Lempel and Moran (2003).
  • 36. Limits of Link Analysis. META tags and invisible text: search engines relying on meta tags in documents are often misled (intentionally) by web developers. Pay-for-place: search engine bias, where organizations pay search engines for page rank; advertisements, where organizations pay high-ranking pages for advertising space, with a primary effect of increased visibility to end users and a secondary effect of increased respectability due to relevance to a high-ranking page.
  • 37. Limits of Link Analysis. Stability: adding even a small number of nodes or edges to the graph has a significant impact. Topic drift (similar to TKC): a top authority may be a hub of pages on a different topic, resulting in an increased rank of the authority page. Content evolution: adding or removing links or content can affect the intuitive authority rank of a page, requiring recalculation of page ranks.
  • 38. Further Reading. R. Akerkar and P. Lingras, Building an Intelligent Web: Theory and Practice, Jones & Bartlett, 2008. http://www.jbpub.com/catalog/9780763741372/