Relating Web Characteristics
         Ricardo Baeza-Yates
            Carlos Castillo
         Universidad de Chile
Agenda
• Introduction
• Link-based ranking
• Web structure
• Web characteristics
• Web usage
• Web dynamics
• Conclusions

Introduction: Sample
• Web sample: the .CL domain in the year 2000
• 670,000 pages in 7,500 domains
• 15 KB average page size
• Collection obtained from the TodoCL web search engine

Introduction: Emphasis

• Broder et al.: Graph Structure in the Web (2000)
  – Page-based structure built on strongly connected components
  – The Web graph is not a random graph
  – Process: cut & paste model
• Ours is mostly a site-based analysis
  – Aiming to make the Web structure meaningful

Introduction: The Empire

Introduction: One Map

Link ranking: Pagerank
$$\mathrm{Pagerank}(p) = \frac{q}{N} + (1 - q)\sum_{i=1}^{k} \frac{\mathrm{Pagerank}(r_i)}{L(r_i)}$$
• q: probability of a random jump; N: number of pages
• r_1, ..., r_k: pages that point to page p; L(r_i): number of links out of r_i
• Brin & Page, 1998; currently used by Google

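The Pagerank formula can be computed by power iteration. The sketch below is minimal and makes several assumptions that the slides do not state. The link graph is a Python dict mapping each page to the list of pages it points to. The name `pagerank`, the default `q=0.15`, and the iteration limits are illustrative. Pages with no outgoing links spread their score evenly over all pages.

```python
def pagerank(links, q=0.15, iterations=200, tol=1e-12):
    """Power iteration for Pagerank(p) = q/N + (1 - q) * sum_i Pagerank(r_i) / L(r_i).

    links: dict mapping each page to the list of pages it points to.
    Pages without outgoing links spread their score evenly over all pages
    (an assumption; the slide does not say how such pages are handled).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Random-jump term q/N plus the redistributed score of dangling pages.
        new_rank = {p: q / n + (1 - q) * dangling / n for p in pages}
        # Each page r passes (1 - q) * Pagerank(r) / L(r) to every page it links to.
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += (1 - q) * rank[source] / len(targets)
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(graph)
print(sorted(scores, key=scores.get, reverse=True))  # ['c', 'a', 'b']
```

The scores always sum to 1, because the random-jump mass q and the redistributed link mass (1 - q) together account for every page's score.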
Link ranking: Hubs & Authorities
• HITS algorithm (Kleinberg, 1998)
• A good authority is a page pointed to by
  good hubs, so we assume that it has
  good content
• A good hub is a page that points to
  good authorities, so we assume it is a
  good set of links
• Scores are computed from a linear system
  by numerical iteration

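The HITS iteration can be sketched as follows, under these assumptions:

- The graph is a dict of out-link lists.
- Both score vectors are rescaled to unit Euclidean length after each step.
- The process stops after a fixed number of rounds.

The function name `hits` and these choices are illustrative, not taken from the slides.

```python
import math


def hits(links, iterations=50):
    """Iterative HITS on a dict mapping each page to the pages it links to.

    authority(p) = sum of hub scores of pages linking to p
    hub(p)       = sum of authority scores of pages p links to
    Both vectors are rescaled to unit Euclidean length after every update.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: collect hub scores along incoming links.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update: collect authority scores along outgoing links.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


graph = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub, auth = hits(graph)
print(max(auth, key=auth.get))  # a1
```

In this toy graph, `a1` receives links from every hub, so it becomes the top authority. `h1` and `h2` point to both authorities, so they outrank `h3` as hubs.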
Link ranking: Distribution
• <2% with relevant Pagerank
• 9% with relevant hub score
• 2-3% with relevant authority score

Link ranking: Correlation
• Hub score, authority score and Pagerank do not seem to be correlated

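Whether two rankings are correlated can be checked with a rank correlation coefficient. The slides do not say which coefficient was used. The sketch below assumes Spearman's coefficient, computed as Pearson's correlation on average ranks so that ties are handled. The score lists are toy values for illustration, not data from the study.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


pagerank_scores = [0.30, 0.05, 0.20, 0.10]  # toy values
hub_scores = [0.01, 0.40, 0.30, 0.02]       # toy values
print(round(spearman(pagerank_scores, hub_scores), 2))  # -0.8
```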
Link ranking: Sites

• Which measure should be used for sites?
• Average score
  – But good sites can have lots of bad pages
• Maximum score
  – But one good page alone should not make
    a good site
• Sum of the scores of all pages
  – Natural for Pagerank

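The three site-level options on this slide can be sketched as one aggregation function. This is a hypothetical helper, not the authors' code. It assumes a site is identified by the host name of each page URL, taken from `urllib.parse.urlsplit`. The name `site_scores` and the URLs and values in the example are illustrative.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_scores(page_scores, how="sum"):
    """Aggregate page-level scores into site-level scores.

    page_scores: dict mapping page URL -> score (e.g. Pagerank).
    how: "average", "max" or "sum", the three options listed on the slide.
    A site is assumed here to be the host name of the URL.
    """
    by_site = defaultdict(list)
    for url, score in page_scores.items():
        by_site[urlsplit(url).hostname].append(score)
    if how == "average":
        return {site: sum(v) / len(v) for site, v in by_site.items()}
    if how == "max":
        return {site: max(v) for site, v in by_site.items()}
    if how == "sum":
        return {site: sum(v) for site, v in by_site.items()}
    raise ValueError(f"unknown aggregation: {how}")


scores = {
    "http://a.cl/": 0.5, "http://a.cl/x": 0.125,
    "http://b.cl/": 0.25, "http://b.cl/y": 0.125,
}
for how in ("average", "max", "sum"):
    print(how, site_scores(scores, how))
```

When page Pagerank values sum to 1, the per-site sums also sum to 1. This may be why the sum is described as natural for Pagerank.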
Link ranking: Sites Graph
• 90% relevant site-Pagerank
• For sites, it is harder to be a good hub than a good authority

Web Structure: Basis
• The Web graph has structure: components IN, MAIN, OUT and ISLANDS

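A decomposition into these components can be sketched as follows, assuming the usual bow-tie reading:

- **MAIN** is the largest strongly connected component.
- **IN** holds the nodes that can reach MAIN.
- **OUT** holds the nodes reachable from MAIN.
- **ISLANDS** gets every remaining node. This is a simplification: Broder et al. also separate tendrils and tubes.

Nodes can be sites, as in this site-based analysis. The function names and the toy graph are illustrative. Strongly connected components are found with Kosaraju's algorithm.

```python
from collections import defaultdict, deque


def strongly_connected_components(graph):
    """Kosaraju's algorithm, iterative; graph maps node -> list of successors."""
    nodes = set(graph).union(*graph.values())
    reverse = defaultdict(list)
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)
    # Pass 1: depth-first search on the graph, recording finishing order.
    visited, finish = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph.get(start, [])))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, []))))
                    break
            else:  # all successors explored: node is finished
                stack.pop()
                finish.append(node)
    # Pass 2: search the reversed graph in decreasing finishing order.
    assigned, components = set(), []
    for start in reversed(finish):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = [], [start]
        while todo:
            node = todo.pop()
            component.append(node)
            for prev in reverse.get(node, []):
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


def reachable(sources, edges):
    """All nodes reachable from `sources` (breadth-first), sources included."""
    seen, queue = set(sources), deque(sources)
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(graph):
    nodes = set(graph).union(*graph.values())
    main = set(max(strongly_connected_components(graph), key=len))
    reverse = defaultdict(list)
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)
    out = reachable(main, graph) - main    # reachable from MAIN
    inn = reachable(main, reverse) - main  # can reach MAIN
    islands = nodes - main - out - inn     # everything else (simplification)
    return {"MAIN": main, "IN": inn, "OUT": out, "ISLANDS": islands}


web = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "x": ["a"], "y": ["z"]}
for name, members in bow_tie(web).items():
    print(name, sorted(members))
```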
Web Structure: Basis (cont.)
• The MAIN component has structure: MAIN IN, MAIN MAIN, MAIN OUT and
  MAIN NORM, lying between IN and OUT

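The slide names the parts of MAIN but does not define them, so the sketch below rests on assumed definitions:

- **MAIN IN**: MAIN nodes with a direct link from IN.
- **MAIN OUT**: MAIN nodes with a direct link to OUT.
- **MAIN MAIN**: MAIN nodes with both.
- **MAIN NORM**: MAIN nodes with neither.

The helper `split_main` is hypothetical. It takes the graph plus the MAIN, IN and OUT node sets as input.

```python
def split_main(graph, main, inn, out):
    """Split MAIN by direct links from IN and to OUT.

    Assumed definitions (not stated on the slide):
      MAIN IN   - linked directly from IN, no direct link to OUT
      MAIN OUT  - links directly to OUT, not linked directly from IN
      MAIN MAIN - both linked from IN and linking to OUT
      MAIN NORM - neither
    """
    linked_from_in = {v for u in inn for v in graph.get(u, []) if v in main}
    links_to_out = {u for u in main if any(v in out for v in graph.get(u, []))}
    both = linked_from_in & links_to_out
    return {
        "MAIN IN": linked_from_in - both,
        "MAIN OUT": links_to_out - both,
        "MAIN MAIN": both,
        "MAIN NORM": set(main) - linked_from_in - links_to_out,
    }


graph = {"x": ["a", "b"], "a": ["b"], "b": ["c", "d"], "c": ["e", "d"], "e": ["a"], "d": []}
parts = split_main(graph, main={"a", "b", "c", "e"}, inn={"x"}, out={"d"})
print({name: sorted(members) for name, members in parts.items()})
```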
Web Structure: Sketch

Web Structure: Degree

Web Structure: Sizes

Web Structure: Preferences

Web Structure: Preferences
[Figure: panels Real, ODP and TodoCL; component labels OUT, MAIN OUT, MAIN MAIN]

Web Structure: Various

Web Structure: Link Scores

Web Dynamics: Ages
• The kernel of the Web comes from the past: it is made up of older content

Web Dynamics: By Component

Web Dynamics: Pagerank
• Pagerank is biased against newer pages

Web Dynamics: Hubs & Authorities
[Plot: authority score and hub score vs. age (months)]

Conclusions
• Pagerank and HITS scores do not seem to be
  correlated
  – And Pagerank is biased toward older pages
• Site ranking can help build good
  human-selected directories
• Finding good pages is not so simple
• Characterizing Web structure gives
  valuable insight
  – Web Graph Mining is just starting