Getting the Most Out of Social Annotations for Web
                Page Classification
                       DocEng 2009


       Arkaitz Zubiaga, Raquel Martínez, Víctor Fresno

                    NLP & IR Group @ UNED


                   September 16th, 2009
Introduction


Index


1   Introduction

2   Dataset

3   Experiments

4   Conclusions

5   Future Work




Zubiaga, Martínez, Fresno (UNED)   Social Annotations for WPC   September 16th, 2009


What is Web Page Classification?


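      We have a set of documents:

                                      $D = \{d_1, \ldots, d_{|D|}\}$

      And a set of predefined categories:

                                      $C = \{c_1, \ldots, c_{|C|}\}$

      Web page classification is the task of deciding, for each pair

                                      $\langle d_j, c_i \rangle \in D \times C$

      whether document $d_j$ belongs to category $c_i$.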
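The definition above can be illustrated with a minimal Python sketch. The document names, category names, and the `assigned` set are hypothetical toy data, and the sketch assumes each document belongs to exactly one category.

```python
from itertools import product

# Hypothetical toy data: three documents (D) and two categories (C).
documents = ["d1", "d2", "d3"]
categories = ["c1", "c2"]

# Hypothetical gold standard: the (document, category) pairs that hold.
assigned = {("d1", "c1"), ("d2", "c2"), ("d3", "c1")}


def classify(doc: str, cat: str) -> bool:
    """Decide whether document `doc` belongs to category `cat`."""
    return (doc, cat) in assigned


# Enumerate every pair in D x C and record the Boolean decision for each.
decisions = {(d, c): classify(d, c) for d, c in product(documents, categories)}
```

A real classifier would replace the lookup in `classify` with a learned model; the point here is only that the decision is made over the pairs of D × C.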






What are Social Bookmarking Sites? (I)



        Websites that let users save web links and attach metadata to them.
              Example: Delicious (http://delicious.com)


What are Social Bookmarking Sites? (II)






Social Annotations



      Tags: Keywords. E.g., photography, web2.0, images.
      Notes: Free text describing a web page. E.g., "Flickr is a website
      for photo sharing and online photo management."
      Highlights: Relevant parts of a page selected by the user.
      Reviews: Free text with a subjective opinion. E.g., "Interesting
      web page with photos."
      Ratings: Numeric grades. E.g., 1 to 5.
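One way to hold the five annotation types for a single URL is sketched below. The `SocialAnnotations` class, its field names, the Flickr URL, and the per-tag user counts are all hypothetical and chosen only to mirror the examples on this slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SocialAnnotations:
    """Annotations collected for one URL (hypothetical container)."""
    url: str
    tags: Dict[str, int] = field(default_factory=dict)   # tag -> number of users who assigned it
    notes: List[str] = field(default_factory=list)       # objective free-text descriptions
    highlights: List[str] = field(default_factory=list)  # passages selected on the page
    reviews: List[str] = field(default_factory=list)     # subjective free-text opinions
    ratings: List[int] = field(default_factory=list)     # numeric grades, e.g. 1 to 5

    def mean_rating(self) -> Optional[float]:
        """Average rating, or None when nobody has rated the page."""
        return sum(self.ratings) / len(self.ratings) if self.ratings else None


# Illustrative values only; the counts and ratings are made up.
flickr = SocialAnnotations(
    url="http://www.flickr.com",
    tags={"photography": 3, "web2.0": 2, "images": 1},
    notes=["Flickr is a website for photo sharing and online photo management."],
    reviews=["Interesting web page with photos."],
    ratings=[5, 4],
)
```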






Motivation




      Classical web page classification methods rely on the content of web pages.
      Motivation: Could social annotations help improve the results?






Related Work




      Some works (Bao et al., 2007; Heymann et al., 2008) show the
      usefulness of tags for information retrieval.
      Ramage et al. (2009) show that tags can improve clustering tasks.
      Noll and Meinel (2008) study tags and conclude that they
      could be interesting for web page classification tasks.




Dataset


Dataset
       December 2008 - January 2009: monitored Delicious' recent feed for
       URLs annotated by more than 100 users.
              87,096 URLs.
       Looked up their classification in the Open Directory Project (ODP, http://www.dmoz.org).
              12,616 URLs matched.
              17 first-level categories.
              Unbalanced category distribution.
       Annotations retrieved:
              Number of users annotating each URL (Delicious).
              Top 10 list of tags (Delicious).
              Full Tag Activity (FTA) (Delicious).
              Notes (Delicious).
              Reviews (StumbleUpon, http://www.stumbleupon.com).
              Highlights (Diigo, http://diigo.com).
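The selection steps on this slide (popularity filter, directory match, category distribution) could be sketched as follows. The URLs, user counts, and the `directory` mapping are hypothetical toy inputs, not the actual dataset.

```python
from collections import Counter

MIN_USERS = 100  # threshold from the slide: more than 100 annotating users

# Hypothetical input: number of users who bookmarked each URL.
user_counts = {
    "http://a.example": 250,
    "http://b.example": 101,
    "http://c.example": 40,
    "http://d.example": 900,
}

# Hypothetical directory lookup: URL -> first-level category.
directory = {
    "http://a.example": "Computers",
    "http://b.example": "Computers",
    "http://c.example": "Arts",
    "http://e.example": "Science",
}

# Step 1: keep only URLs annotated by more than MIN_USERS users.
popular = {url for url, n in user_counts.items() if n > MIN_USERS}

# Step 2: keep only popular URLs that also appear in the directory.
labeled = {url: directory[url] for url in popular if url in directory}

# Step 3: count URLs per category to inspect how unbalanced the set is.
distribution = Counter(labeled.values())
```

In this toy run only two of the three popular URLs are found in the directory, which mirrors the drop from 87,096 monitored URLs to 12,616 matched ones.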
Experiments


Configuration




        Support Vector Machines (SVM).
              SVMmulticlass (http://svmlight.joachims.org)
        Evaluation: Accuracy.
        Several training sets.
        6 runs for each training set.
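The evaluation protocol (accuracy, several training-set sizes, 6 runs per size) might look like the sketch below. It assumes the runs are random train/test splits, which the slide does not state, and it uses a majority-class stand-in learner rather than an SVM so that the example stays self-contained. All function names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def accuracy(predicted, gold):
    """Fraction of items whose predicted category equals the gold category."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def train_majority(train_labels):
    """Stand-in learner (NOT an SVM): always predicts the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda item: majority


def evaluate(items, labels, train_sizes, runs=6, seed=0):
    """Mean test accuracy per training-set size, averaged over `runs` random splits."""
    rng = random.Random(seed)
    indices = list(range(len(items)))
    results = {}
    for size in train_sizes:
        scores = []
        for _ in range(runs):
            rng.shuffle(indices)  # assumption: each run is a fresh random split
            train, test = indices[:size], indices[size:]
            model = train_majority([labels[i] for i in train])
            predicted = [model(items[i]) for i in test]
            scores.append(accuracy(predicted, [labels[i] for i in test]))
        results[size] = mean(scores)
    return results


# Toy usage: 20 items, unbalanced labels, two training-set sizes.
items = list(range(20))
labels = ["A"] * 15 + ["B"] * 5
results = evaluate(items, labels, train_sizes=[5, 10])
```

Swapping `train_majority` for a real multiclass SVM trainer would leave the rest of the protocol unchanged.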


Classifying with Tags (I)




      Unweighted tags.
      Ranked tags.
      Tag fractions.
      Weighted tags (Top 10).
      Weighted tags (FTA).
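The slide only names the five tag representations. The sketch below shows one plausible reading of each name; the concrete weighting formulas, the function names, and the toy counts are assumptions, not taken from the slides.

```python
from typing import Dict, List, Optional, Tuple


def top_k(tag_counts: Dict[str, int], k: int = 10) -> List[Tuple[str, int]]:
    """The k most-used tags by descending user count (ties broken alphabetically)."""
    return sorted(tag_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def unweighted(tag_counts: Dict[str, int], k: int = 10) -> Dict[str, float]:
    """Assumed reading: binary presence of each top-k tag."""
    return {tag: 1.0 for tag, _ in top_k(tag_counts, k)}


def ranked(tag_counts: Dict[str, int], k: int = 10) -> Dict[str, float]:
    """Assumed reading: weight by rank, 1.0 for the first tag down to 1/k for the k-th."""
    return {tag: (k - i) / k for i, (tag, _) in enumerate(top_k(tag_counts, k))}


def fractions(tag_counts: Dict[str, int], n_users: int, k: int = 10) -> Dict[str, float]:
    """Assumed reading: share of the URL's annotating users who assigned each top-k tag."""
    return {tag: count / n_users for tag, count in top_k(tag_counts, k)}


def weighted(tag_counts: Dict[str, int], k: Optional[int] = None) -> Dict[str, float]:
    """Assumed reading: raw user counts; k=10 for Top 10, k=None for Full Tag Activity."""
    selected = tag_counts.items() if k is None else top_k(tag_counts, k)
    return {tag: float(count) for tag, count in selected}


# Toy example: three tags on a URL bookmarked by 100 users.
counts = {"photography": 80, "web2.0": 50, "images": 20}
n_users = 100
```

Each function returns a sparse feature vector (tag to weight) that a classifier could consume; only the weighting scheme differs between them.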






Classifying with Tags (II)






Classifying with Comments (I)




      Only notes.
      Both notes and reviews.
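The two comment settings differ only in which texts feed the representation. A minimal sketch, assuming a plain bag-of-words term-frequency model (the slide does not say how comments were represented), with a hypothetical tokenizer and the example texts from the Social Annotations slide:

```python
import re
from collections import Counter


def bag_of_words(texts):
    """Term-frequency vector over lowercase alphanumeric tokens from all given texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


notes = ["Flickr is a website for photo sharing and photo online management."]
reviews = ["Interesting web page with photos."]

only_notes = bag_of_words(notes)                 # setting 1: only notes
notes_and_reviews = bag_of_words(notes + reviews)  # setting 2: notes and reviews
```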






Classifying with Comments (II)






Comparison with the Baseline (Content) (I)




      Content.
      Comments.
      Tags.






Comparison with the Baseline (Content) (II)






Combining Classifiers (I)




      Tags + content.
      Tags + comments.
      Comments + content.
      Tags + comments + content.
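The slide lists which classifiers are combined but not how. One common option, shown here purely as an assumption, is to sum the per-category scores of the individual classifiers and pick the category with the highest total. The `combine` function and all score values are hypothetical.

```python
from typing import Dict, List


def combine(score_sets: List[Dict[str, float]]) -> str:
    """Sum each category's scores across classifiers and return the top category."""
    totals: Dict[str, float] = {}
    for scores in score_sets:
        for category, score in scores.items():
            totals[category] = totals.get(category, 0.0) + score
    # Ties are broken alphabetically so the result is deterministic.
    return min(totals, key=lambda c: (-totals[c], c))


# Made-up per-category scores from three single-source classifiers for one page.
tags_scores = {"Arts": 0.9, "Computers": 0.4, "Science": -0.3}
comments_scores = {"Arts": 0.1, "Computers": 0.5, "Science": -0.1}
content_scores = {"Arts": -0.2, "Computers": 0.6, "Science": 0.2}

tags_content = combine([tags_scores, content_scores])
tags_comments = combine([tags_scores, comments_scores])
comments_content = combine([comments_scores, content_scores])
all_three = combine([tags_scores, comments_scores, content_scores])
```

In this toy case the tags classifier alone would choose "Arts", while every combination that includes content chooses "Computers", which shows how a combination can overrule a single source.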






Combining Classifiers (II)




Conclusions


Conclusions



      We analyzed and evaluated the use of social annotations for web page
      classification.
      Some types of annotations are not popular enough.
              Tags and comments are popular.
      Both tags and comments outperform content-based classification.
      Combining the 3 data inputs performs even better.
      We corroborate the conclusions of Noll and Meinel (2008), showing
      quantitatively that social annotations are useful for web page
      classification.




Future Work


Future Work




      Classifying at lower levels of the category hierarchy.
      Filtering tags and comments (misbehavior detection).






Thank You



Achiu, Arigato, Danke, Dhannvaad, Dua Netjer en ek, Efcharisto,
Gracias, Gràcies, Gratia, Grazie, Guishepeli, Hvala, Kiitos,
Köszönöm, Mercé, Merci, Mila esker, Obrigado, Shukran, Tack, Tak,
Takk, Shukriya, Tänan, Tapadh leat, Tesekkür ederim, Thank you, Toda



