Content-Driven
Author Reputation and Text Trust
          for the Wikipedia
              Luca de Alfaro
               UC Santa Cruz

               Joint work with
  Bo Adler, Ian Pye, Caitlin Sadowski (UCSC)


                            Wikimania, August 2007
Author Reputation and Text Trust

Author Reputation:
• Goal: Encourage authors to provide lasting contributions.
Text Trust:
• Goal: provide a measure of the reliability of the text.
• Method: computed from the reputation of the authors
  who create and revise the text.
Reputation: Our guiding principles

• Do not alter the Wikipedia user experience
  – Compute reputation from content evolution, rather
    than user-to-user comments.
• Be welcoming to all users
  – Never publicly display user reputation values.
    Authors know only their own reputation.
• Be objective
  – Rely on content evolution rather than comments.
  – Quantitatively evaluate how well it works.
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose reputation

[Figure: timeline of a Wikipedia article. A edits; B builds on A's edit (A gets a "+"); C reverts to A's version (A gets a "+", B gets a "-").]
Content-driven reputation mitigates
reputation wars
Wars in user-driven reputation:
[Figure: A and B exchange negative ratings (-2, then -3).]

Wars in content-driven reputation:
• B can badmouth A by undoing her work.
• But this is risky: if others then reinstate A's work, it is B's reputation that suffers.
[Figure: B's revert gives A a "-"; if others then reinstate A's work, A gets a "+" and B a "-".]
Validation: Does our reputation have
predictive value?
[Figure: timelines of Article 1, Article 2, Article 3, Article 4, ..., with marks for the edits by user A; E is one of these edits.]

The reputation of author A at the time of an edit E depends on the history before the edit.

The longevity of an edit E depends on the history after the edit.

Can we show a correlation between author reputation and edit longevity?
Building a content-driven
             reputation system
                for Wikipedia



This is a summary; for details see:
B.T. Adler, L. de Alfaro. A Content Driven Reputation
System for the Wikipedia. In Proc. of WWW 2007.
What is a “contribution”?

[Figure: example revisions built from placeholder words ("bla", "yak", "ei", "buy viagra!"), one illustrating text and one illustrating an edit.]

Text: We measure how long the added text survives. Based on text tracking.

Edit: We measure how long the “edit” (reorganization) survives. Based on edit distance.
Text


version 9      bla   bla wuga boink
                5     8   9    6

 version 10    bla   bla   wuga   wuga   wuga boink
                5     8     10     10     9    6


 We label each word with the version where it was
 introduced. This enables us to keep track of how
 long it lives.
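
The labelling step above can be sketched as follows. This is only an illustration: it uses Python's `difflib.SequenceMatcher` as a stand-in for the text tracking described in the paper, and the function name `label_words` is hypothetical. When a word is repeated, the alignment chosen by `difflib` may attach the old label to a different copy than the slide does.

```python
from difflib import SequenceMatcher


def label_words(old_words, old_labels, new_words, new_version):
    """Carry labels over for words matched in the old version.

    Words that have no match are labelled with new_version, the version
    in which they were introduced.
    """
    labels = [new_version] * len(new_words)
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            labels[block.b + offset] = old_labels[block.a + offset]
    return labels


# The example from the slide: version 9 and version 10.
v9_words = "bla bla wuga boink".split()
v9_labels = [5, 8, 9, 6]
v10_words = "bla bla wuga wuga wuga boink".split()
v10_labels = label_words(v9_words, v9_labels, v10_words, 10)
print(list(zip(v10_words, v10_labels)))
```

Two of the three "wuga" words receive label 10, and all other words keep the labels they had in version 9.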
Text: the destiny of a contribution


[Figure: number of words vs. time (versions), showing the amount of new text introduced at a revision and the amount of surviving text in later versions.]

The life of the text introduced at a revision.
Text: Longevity

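[Figure: number of words vs. time (versions). $T_k$ words are introduced at version k; the text surviving at a later version j is approximated by $T_k \cdot \alpha_{text}^{j-k}$.]

• Text longevity: the $\alpha_{text} \in [0,1]$ that yields the best geometric approximation for the amount of residual text.
• Short-lived text: $\alpha_{text} < 0.2$ (at most 20% of the text makes it from one version to the next).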
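
A minimal sketch of how such an $\alpha_{text}$ could be estimated, assuming a least-squares fit found by grid search. The slide does not say how the best geometric approximation is computed, so the fitting criterion, the function names, and the sample counts are assumptions.

```python
def text_longevity(surviving, steps=1000):
    """Estimate alpha_text from surviving word counts.

    surviving[d] is the number of words introduced at version k that are
    still present at version k + d, so surviving[0] is T_k.
    Returns the alpha in [0, 1] minimising the squared error between
    surviving[d] and T_k * alpha**d (assumed fitting criterion).
    """
    t_k = surviving[0]
    if t_k <= 0:
        raise ValueError("the revision must introduce some text")
    best_alpha, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        alpha = i / steps
        err = sum((t - t_k * alpha ** d) ** 2 for d, t in enumerate(surviving))
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha


def is_short_lived_text(surviving):
    """Short-lived text as defined on the slide: alpha_text < 0.2."""
    return text_longevity(surviving) < 0.2


# Hypothetical counts: half of the text survives each revision.
halving = [100, 50, 25, 12.5]
# Hypothetical counts: the text is almost entirely removed at once.
removed = [100, 5, 0, 0]
print(text_longevity(halving), is_short_lived_text(removed))
```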
Text: Reputation update

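[Figure: number of words vs. time (versions). Author $A_k$ introduces $T_k$ words at version k; $T_j$ of them survive at version j, written by author $A_j$.]

As a consequence of edit j, we increase the reputation of $A_k$ by an amount proportional to $T_j$ and to the reputation of $A_j$.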
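
The update rule can be sketched as below. The slide only says that the increase is proportional to $T_j$ and to the reputation of $A_j$; the proportionality constant `scale`, the dictionary layout, and the sample values are assumptions made for illustration.

```python
def update_text_reputation(reputation, author_k, author_j, surviving_words,
                           scale=0.01):
    """Credit author_k for text that survives up to author_j's version.

    The gain is proportional to the surviving word count T_j and to the
    current reputation of author_j. `scale` is a hypothetical constant.
    """
    gain = scale * surviving_words * reputation.get(author_j, 0.0)
    reputation[author_k] = reputation.get(author_k, 0.0) + gain
    return gain


# Hypothetical reputations and word count.
reputation = {"A_k": 1.0, "A_j": 2.0}
gain = update_text_reputation(reputation, "A_k", "A_j", surviving_words=50)
print(gain, reputation["A_k"])
```

With this rule, text preserved by a high-reputation author counts for more than text preserved by a newcomer.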
Measuring surviving text

Version   “Live” text (labels)             “Dead” text (labels)
9         wuga boing bla ble (7 9 6 6)
10        buy viagra now! (10 10 10)       wuga boing bla ble (7 9 6 6)
11        wuga boing bla ble (7 9 6 6)

The text deleted in version 10 is stored as “dead”; the text of version 11 matches it.

We track authorship of deleted text, and we match the text of new versions both with live and with dead text.
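
A rough sketch of matching against both live and dead text, again with `difflib.SequenceMatcher` standing in for the matching algorithm of the paper. The function names and the data layout (lists of `(word, label)` pairs) are assumptions.

```python
from difflib import SequenceMatcher


def _match(new_words, old):
    """Map positions in new_words to labels of matching words in `old`.

    `old` is a list of (word, label) pairs. Returns the mapping and the
    set of positions in `old` that were matched.
    """
    old_words = [word for word, _ in old]
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    labels, used = {}, set()
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            labels[block.b + offset] = old[block.a + offset][1]
            used.add(block.a + offset)
    return labels, used


def next_version(new_words, live, dead, version):
    """Label new_words using live text first, then dead text."""
    labels, used_live = _match(new_words, live)
    # Words not found in the live text are looked up in the dead text.
    rest = [i for i in range(len(new_words)) if i not in labels]
    dead_labels, used_dead = _match([new_words[i] for i in rest], dead)
    for pos, i in enumerate(rest):
        labels[i] = dead_labels.get(pos, version)
    new_live = [(word, labels[i]) for i, word in enumerate(new_words)]
    # Unmatched dead text stays dead; unmatched live text becomes dead.
    new_dead = [d for j, d in enumerate(dead) if j not in used_dead]
    new_dead += [l for j, l in enumerate(live) if j not in used_live]
    return new_live, new_dead


live9 = [("wuga", 7), ("boing", 9), ("bla", 6), ("ble", 6)]
live10, dead10 = next_version("buy viagra now!".split(), live9, [], 10)
live11, dead11 = next_version("wuga boing bla ble".split(), live10, dead10, 11)
print(live11)
```

In this run the words restored in version 11 recover labels 7, 9, 6, 6 instead of being counted as new text from version 11.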
Edit

[Figure: triangle of versions k-1, k (judged), and j (judge), with k < j and edges d(k-1, k), d(k, j), and d(k-1, j).]

We compute the edit distance between versions k-1, k, and j, with k < j (see paper for details on the distance).
Edit: good or bad?

[Figure: two triangles of versions k-1 (the past), k (judged), and j (judge, the future), with edges d(k-1, j) and d(k, j). In the first, k lies closer to j than k-1 does; in the second, k lies farther from j.]

k is good: d(k-1, j) > d(k, j)     “k went towards the future”
k is bad:  d(k-1, j) < d(k, j)     “k went against the future”
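
The comparison can be sketched with a word-level Levenshtein distance. This distance is a stand-in chosen for brevity, not the edit distance of the paper, and the function names and sample revisions are hypothetical.

```python
def word_edit_distance(a, b):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        cur = [i]
        for j, word_b in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                        # delete word_a
                           cur[j - 1] + 1,                     # insert word_b
                           prev[j - 1] + (word_a != word_b)))  # substitute
        prev = cur
    return prev[-1]


def judge_edit(v_prev, v_k, v_j):
    """Judge version k from the point of view of a later version j."""
    d_prev_j = word_edit_distance(v_prev, v_j)
    d_k_j = word_edit_distance(v_k, v_j)
    if d_prev_j > d_k_j:
        return "good"
    if d_prev_j < d_k_j:
        return "bad"
    return "neutral"


base = "bla bla".split()
# An edit that the future keeps and extends.
kept = judge_edit(base, "bla bla wuga".split(), "bla bla wuga boink".split())
# Spam that the future reverts.
spam = judge_edit(base, "buy viagra!".split(), base)
print(kept, spam)
```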
Edit: Longevity
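[Figure: triangle of versions k-1 (the past), k, and j (the future). The “work done” is d(k-1, k); the “progress” is d(k-1, j) - d(k, j).]

Edit longevity: the fraction of change that is in the same direction as the future, that is, “progress” divided by “work done”:

$\alpha_{edit} = \frac{d(k-1, j) - d(k, j)}{d(k-1, k)}$

• $\alpha_{edit} \approx 1$: k is a good edit
• $\alpha_{edit} \approx -1$: k is reverted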
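
As a small worked example, edit longevity can be computed from the three distances as “progress” divided by “work done”. The function name, the clamping to [-1, 1], and the sample distances are assumptions; see the paper for the exact definition.

```python
def edit_longevity(d_prev_k, d_prev_j, d_k_j):
    """Fraction of the change made at version k that goes towards version j.

    d_prev_k = d(k-1, k) is the work done by the edit,
    d_prev_j - d_k_j = d(k-1, j) - d(k, j) is the progress towards j.
    """
    if d_prev_k == 0:
        raise ValueError("version k does not differ from version k-1")
    alpha = (d_prev_j - d_k_j) / d_prev_k
    # For a true metric the triangle inequality already keeps alpha in
    # [-1, 1]; the clamp only guards against distances that are not metrics.
    return max(-1.0, min(1.0, alpha))


# Hypothetical distances: all of the change is kept by version j.
good = edit_longevity(d_prev_k=1, d_prev_j=2, d_k_j=1)
# Hypothetical distances: version j is identical to version k-1 (a revert).
reverted = edit_longevity(d_prev_k=2, d_prev_j=0, d_k_j=2)
print(good, reverted)
```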
Edit: Updating reputation
[Figure: triangle of versions k-1 (the past), k (by author $A_k$), and j (the future, by author $A_j$). Edit longevity compares the “work done”, d(k-1, k), with the “progress”, d(k-1, j) - d(k, j).]

Reputation update: the reputation of $A_k$
• increases if $\alpha_{edit} > 0$,
• decreases if $\alpha_{edit} < 0$.

(see paper for details)
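
A minimal sketch of an update with the sign behaviour stated above. The actual formula is in the paper; here the change is assumed to be proportional to $\alpha_{edit}$ and to the reputation of the judging author $A_j$, with a hypothetical constant `scale` and a floor at zero.

```python
def update_edit_reputation(reputation, author_k, author_j, alpha_edit,
                           scale=1.0):
    """Raise or lower the reputation of author_k according to alpha_edit.

    Assumed rule: the change is scale * alpha_edit * reputation of the
    judge author_j, and reputations never drop below zero.
    """
    delta = scale * alpha_edit * reputation.get(author_j, 0.0)
    reputation[author_k] = max(0.0, reputation.get(author_k, 0.0) + delta)
    return delta


reputation = {"A_k": 5.0, "A_j": 2.0}
update_edit_reputation(reputation, "A_k", "A_j", alpha_edit=1.0)
after_good_edit = reputation["A_k"]
update_edit_reputation(reputation, "A_k", "A_j", alpha_edit=-1.0)
after_revert = reputation["A_k"]
print(after_good_edit, after_revert)
```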
Data Sets

Wikipedia   Until    Pages       Versions
English     Feb 07   1,988,627   40,455,416
French      Feb 07     452,577    5,643,636
Italian     May 07     301,584    3,129,453



The entire Wikipedias, with the whole history, not just a
  sample (we wanted to compute the reputation using all edits
  of each user).
Results: English Wikipedia, in detail

% of edits below a given longevity l, by reputation bin

Bin   %_data   l<0.8   l<0.4   l<0.0   l<-0.4   l<-0.8
  0   16.922   93.11   91.65   89.15    83.76    73.53
  1    1.191   77.24   69.83   65.60    61.11    56.00
  2    1.335   69.53   57.08   49.79    45.71    41.25
  3    1.627   38.00   28.61   20.23    16.16    13.62
  4    2.780   32.84   22.31   13.32     9.57     8.04
  5    4.408   41.70   15.76    5.90     3.80     2.57
  6    6.698   29.40   16.74    7.54     4.35     3.12
  7    8.281   32.04   15.16    5.44     2.25     1.40
  8   12.233   34.06   16.64    6.78     3.79     2.73
  9   44.524   32.55   15.51    5.05     1.88     1.14

Bin: log (1 + reputation). Bins 0 and 1 are low reputation (“low rep”); the column l<-0.8 gives the short-lived edits.
Predictive power of low reputation

Low reputation: lower 20% of range

Short-lived edits: $\alpha_{edit} \le -0.8$ (almost entirely undone)

Short-lived text: $\alpha_{text} \le 0.2$ (less than 20% survives each revision)
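
To make the notion of predictive power concrete, the sketch below derives a precision and a recall for short-lived edits from the English Wikipedia table. It is arithmetic on the table, not a figure reported in the talk, and it assumes that `%_data` is the share of edits falling in each bin and that low reputation means bins 0 and 1.

```python
# (bin, %_data, % of edits with longevity l < -0.8), copied from the table.
ROWS = [
    (0, 16.922, 73.53), (1, 1.191, 56.00), (2, 1.335, 41.25),
    (3, 1.627, 13.62), (4, 2.780, 8.04), (5, 4.408, 2.57),
    (6, 6.698, 3.12), (7, 8.281, 1.40), (8, 12.233, 2.73),
    (9, 44.524, 1.14),
]


def precision_recall(rows, low_bins=(0, 1)):
    """Precision and recall of "low reputation" as a predictor of
    short-lived edits, under the assumptions stated above."""
    flagged = sum(share for b, share, _ in rows if b in low_bins)
    short_lived = sum(share * pct / 100 for _, share, pct in rows)
    hits = sum(share * pct / 100 for b, share, pct in rows if b in low_bins)
    return hits / flagged, hits / short_lived


precision, recall = precision_recall(ROWS)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Precision is the fraction of low-reputation edits that turn out to be short-lived; recall is the fraction of short-lived edits that come from low-reputation authors.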
Text trust

[Figure: author A edits "Yadda yadda wuga wuga bla bla bla bing bong" into "Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong".]

Old text is colored according to the reputation of its original author, and of all subsequent revisors (including A).

New text is colored according to the reputation of A.

• On the English Wikipedia, we should be able to spot untrusted content with over 80% recall and 60% precision!
   – In fact, we do even better than this, as new content is always flagged lower trust (see next).
Demo: http://trust.cse.ucsc.edu/
Text trust: How is “Fogh” spelled?
Text Trust: more examples from the demo
Text Trust: Details

Trust depends on:
• Authorship: Author lends 50% of their reputation to
  the text they create.
   – Thus, even text from high-rep authors is only medium-
     rep when added: high trust is achieved only via multiple
     reviews, never via a single author.
• Revision: When an author of reputation r preserves a
  word of trust t < r, the word increases in trust to
                          t + 0.3(r – t)
• The algorithms still need fine-tuning.
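
The two rules above translate directly into code. The constants 0.5 and 0.3 come from the slide; the function names and the reputation scale used in the example are assumptions.

```python
AUTHOR_SHARE = 0.5   # an author lends 50% of their reputation to new text
REVISION_GAIN = 0.3  # factor in t + 0.3 (r - t)


def new_word_trust(author_reputation):
    """Trust of a word when it is first added."""
    return AUTHOR_SHARE * author_reputation


def revised_word_trust(trust, reviser_reputation):
    """Trust of a word after an author of reputation r preserves it."""
    if trust < reviser_reputation:
        return trust + REVISION_GAIN * (reviser_reputation - trust)
    return trust


# Hypothetical scale: a word added by an author of reputation 8 and then
# preserved by three authors of reputation 8.
trust = new_word_trust(8.0)
history = [trust]
for _ in range(3):
    trust = revised_word_trust(trust, 8.0)
    history.append(trust)
print(history)
```

The trust rises with each revision but never exceeds the reputation of the revising authors, which is why high trust requires multiple reviews.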
From fresh to trusted text
Batch Implementation


[Figure: the Wikipedia servers send periodic XML dumps (to initialize) and an edit feed (to keep updated) to the trust server.]

• No need to affect the main Wikipedia servers
• People can click “check trust” and visit the trust server.
• Good for experimenting with new ideas
• Necessary to color the past (come up to speed).
On-Line Implementation

Process edits as they arrive:
• Benefit: real-time colorization of text
• Need to integrate the code in MediaWiki
• Time to process an edit: < 1s (not much longer than
  parsing it).
• Storage required: proportional to the size of the last
  revision (not to the total history size!)
• Can be easily used for other Wikis
My questions:
• Feedback?
• Do you like it?
• Should we try to set up a “trust server” with
  an edit feed from the Wikipedia?
• Try the demo:

       http://trust.cse.ucsc.edu/

Your questions?

 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

UC Santa Cruz Author Provides Content-Driven Reputation Model for Wikipedia

  • 1. Content-Driven Author Reputation and Text Trust for the Wikipedia Luca de Alfaro UC Santa Cruz Joint work with Bo Adler, Ian Pye, Caitlin Sadowski (UCSC) Wikimania, August 2007
  • 3. Author Reputation and Text Trust Author Reputation: • Goal: Encourage authors to provide lasting contributions. Text Trust: • Goal: provide a measure of the reliability of the text. • Method: computed from the reputation of the authors who create and revise the text.
  • 4. Reputation: Our guiding principles • Do not alter the Wikipedia user experience – Compute reputation from content evolution, rather than user-to-user comments. • Be welcoming to all users – Never publicly display user reputation values. Authors know only their own reputation. • Be objective – Rely on content evolution rather than comments. – Quantitatively evaluate how well it works.
  • 9. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation Diagram (a Wikipedia article over time): A edits; B builds on A’s edit (+ for A); C reverts to A’s version (+ for A, - for B).
  • 13. Content-driven reputation mitigates reputation wars Wars in user-driven reputation: A and B exchange negative ratings (-2, -3). Wars in content-driven reputation: • B can badmouth A by undoing her work (- for A). • But this is risky: if others then re-instate A’s work, it is B’s reputation that suffers (+ for A, - for B).
  • 14. Validation: Does our reputation have predictive value? Diagram: timelines of Article 1, Article 2, Article 3, Article 4, ..., with marks for the edits by user A.
  • 15. Validation: Does our reputation have predictive value? Diagram: the same article timelines, with one edit E marked. The reputation of author A at the time of an edit E depends on the history before the edit. The longevity of an edit E depends on the history after the edit. Can we show a correlation between author reputation and edit longevity?
  • 16. Building a content-driven reputation system for Wikipedia This is a summary; for details see: B.T. Adler, L. de Alfaro. A Content Driven Reputation System for the Wikipedia. In Proc. of WWW 2007.
  • 17. What is a “contribution”? • Text: we measure how long the added text survives. Based on text tracking. • Edit: we measure how long the “edit” (reorganization) survives. Based on edit distance. (Both are illustrated on the slide with filler-text examples.)
  • 18. Text
      version 9:  bla(5) bla(8) wuga(9) boink(6)
      version 10: bla(5) bla(8) wuga(10) wuga(10) wuga(9) boink(6)
      We label each word with the version where it was introduced. This enables us to keep track of how long it lives.
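The word labeling on slide 18 can be sketched as follows. This is a minimal illustration, not the authors' text-tracking algorithm (the WWW 2007 paper has the details): it assumes a plain word-level diff via Python's `difflib.SequenceMatcher`, and the function name `label_words` is hypothetical. With repeated words, as in the slide's example, a diff may pick a different copy of "wuga" as the surviving one, so the example below uses an unambiguous insertion instead.

```python
from difflib import SequenceMatcher


def label_words(old_words, old_labels, new_words, new_version):
    """Carry word labels from one version to the next.

    Words the diff reports as unchanged keep the version number where
    they were introduced; every other word is labeled `new_version`.
    """
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    new_labels = [new_version] * len(new_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            new_labels[j1:j2] = old_labels[i1:i2]
    return new_labels


# Version 9 with its labels, then a version 10 that inserts two new words.
v9 = "bla bla wuga boink".split()
v9_labels = [5, 8, 9, 6]
v10 = "bla bla yak yak wuga boink".split()
v10_labels = label_words(v9, v9_labels, v10, new_version=10)
print(list(zip(v10, v10_labels)))
```

Counting how many words still carry a given label in each later version gives the amount of surviving text for that revision.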
  • 19. Text: the destiny of a contribution Plot (x axis: time in versions; y axis: number of words) marking the amount of new text and the amount of surviving text. The life of the text introduced at a revision.
  • 20. Text: Longevity Plot (x axis: time in versions, with k and j marked; y axis: number of words): the amount of text $T_k$ at version $k$ and the curve $T_k \cdot \alpha_{text}^{\,j-k}$. • Text longevity: the $\alpha_{text} \in [0,1]$ that yields the best geometrical approximation for the amount of residual text. • Short-lived text: $\alpha_{text} < 0.2$ (at most 20% of the text makes it from one version to the next).
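One way to picture the "best geometrical approximation" on slide 20 is a least-squares fit of a geometric decay to the surviving word counts. The sketch below is an assumption for illustration only: the slide does not say which fitting criterion is used, and the function name `fit_text_longevity` and the grid search are hypothetical choices.

```python
def fit_text_longevity(surviving, steps=1000):
    """Return the alpha in [0, 1] that best fits T_k * alpha**(j - k).

    `surviving[d]` is the number of words from revision k still present
    d versions later, so `surviving[0]` is T_k. The fit minimizes the
    squared error over a uniform grid of candidate alpha values.
    """
    t_k = surviving[0]
    if t_k <= 0:
        raise ValueError("the revision must introduce some text")
    best_alpha, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        alpha = i / steps
        err = sum((t - t_k * alpha ** d) ** 2 for d, t in enumerate(surviving))
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha


# Half of the text survives each revision.
halving = fit_text_longevity([100, 50, 25, 12.5])
# Text that is almost entirely removed right away.
vandalism = fit_text_longevity([100, 10, 1])
print(halving, vandalism, vandalism < 0.2)
```

Under the slide's threshold, the second series would count as short-lived text.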
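  • 21. Text: Reputation update Plot (x axis: time in versions, with k and j marked; y axis: number of words): $T_k$ is the text introduced at version $k$ by author $A_k$, and $T_j$ is the part of it surviving at version $j$, edited by author $A_j$. As a consequence of edit $j$, we increase the reputation of $A_k$ by an amount proportional to $T_j$ and to the reputation of $A_j$.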
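The update rule on slide 21 can be written down as a small sketch. Only the proportionality to $T_j$ and to the reputation of $A_j$ comes from the slide; the scaling constant `scale`, the dictionary representation, and the function name `update_text_reputation` are assumptions made for illustration.

```python
def update_text_reputation(reputation, author_k, author_j, surviving_words, scale=0.01):
    """Reward author_k for text that survives the edit made by author_j.

    The increment is proportional to the surviving text T_j and to the
    reputation of the author of edit j. `scale` is a hypothetical
    tuning constant, not a value from the talk.
    """
    increment = scale * surviving_words * reputation[author_j]
    reputation[author_k] += increment
    return increment


reputation = {"A_k": 1.0, "A_j": 4.0}
gain = update_text_reputation(reputation, "A_k", "A_j", surviving_words=10)
print(gain, reputation["A_k"])
```

Because the increment scales with the judging author's reputation, text preserved by well-regarded authors counts for more than text preserved by newcomers.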
  • 22. Measuring surviving text
      Version 9:  live text: wuga(7) boing(9) bla(6) ble(6)
      Version 10: live text: buy(10) viagra(10) now!(10); the replaced text wuga(7) boing(9) bla(6) ble(6) is stored as “dead”
      Version 11: live text: wuga(7) boing(9) bla(6) ble(6), which matches the dead text
      We track authorship of deleted text, and we match the text of new versions both with live and with dead text.
  • 23. Edit Diagram: versions k-1, k (judged) and j (judge), connected by the distances d(k-1, k), d(k, j) and d(k-1, j). We compute the edit distance between versions k-1, k, and j, with k < j (see paper for details on the distance)
  • 24. Edit: good or bad? Diagram: versions k-1 and k (judged) lie in the past, version j (judge) in the future. • k is good: d(k-1, j) > d(k, j) (“k went towards the future”) • k is bad: d(k-1, j) < d(k, j) (“k went against the future”)
  • 25. Edit: Longevity Diagram: versions k-1 and k in the past, version j in the future; “work done” is d(k-1, k) and “progress” is d(k-1, j) - d(k, j). Edit longevity: $\alpha_{edit} = \frac{d(k-1, j) - d(k, j)}{d(k-1, k)}$, the fraction of change that is in the same direction of the future. • $\alpha_{edit} \simeq 1$: k is a good edit • $\alpha_{edit} \simeq -1$: k is reverted
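The edit longevity on slide 25 is the ratio of "progress", d(k-1, j) - d(k, j), to "work done", d(k-1, k). The sketch below computes it under stated assumptions: a word-level Levenshtein distance stands in for the edit distance of the paper (which is defined differently), the function names are hypothetical, and returning 0 for an edit that changes nothing is an arbitrary convention.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance (a stand-in for the paper's distance)."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, 1):
        cur = [i]
        for j, word_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                        # delete word_a
                           cur[j - 1] + 1,                     # insert word_b
                           prev[j - 1] + (word_a != word_b)))  # substitute
        prev = cur
    return prev[-1]


def edit_longevity(before, judged, judge):
    """Progress towards version j divided by the work done in edit k."""
    work = edit_distance(before, judged)
    if work == 0:
        return 0.0  # assumption: an edit that changes nothing is neutral
    progress = edit_distance(before, judge) - edit_distance(judged, judge)
    return progress / work


# Edit k is kept and built upon by version j.
good = edit_longevity("a b".split(), "a b c".split(), "a b c d".split())
# Edit k is undone: version j equals version k-1.
reverted = edit_longevity("a b c".split(), "a x c".split(), "a b c".split())
print(good, reverted)
```

Since Levenshtein distance satisfies the triangle inequality, the ratio always stays within [-1, 1] for this stand-in distance.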
  • 26. Edit: Updating reputation Diagram: as on the previous slide, with “work done” d(k-1, k) and “progress” d(k-1, j) - d(k, j); version k is by author $A_k$ and version j by author $A_j$. Reputation update: the reputation of $A_k$ • increases if $\alpha_{edit} > 0$, • decreases if $\alpha_{edit} < 0$. (see paper for details)
  • 27. Data Sets • English (till Feb 07): 1,988,627 pages, 40,455,416 versions • French (till Feb 07): 452,577 pages, 5,643,636 versions • Italian (till May 07): 301,584 pages, 3,129,453 versions The entire Wikipedias, with the whole history, not just a sample (we wanted to compute the reputation using all edits of each user).
  • 28. Results: English Wikipedia, in detail
      % of edits below a given longevity l; rows are bins of log (1 + reputation)
      Bin  %_data  l<0.8  l<0.4  l<0.0  l<-0.4  l<-0.8
      0    16.922  93.11  91.65  89.15  83.76   73.53
      1     1.191  77.24  69.83  65.60  61.11   56.00
      2     1.335  69.53  57.08  49.79  45.71   41.25
      3     1.627  38.00  28.61  20.23  16.16   13.62
      4     2.780  32.84  22.31  13.32   9.57    8.04
      5     4.408  41.70  15.76   5.90   3.80    2.57
      6     6.698  29.40  16.74   7.54   4.35    3.12
      7     8.281  32.04  15.16   5.44   2.25    1.40
      8    12.233  34.06  16.64   6.78   3.79    2.73
      9    44.524  32.55  15.51   5.05   1.88    1.14
  • 29. Results: English Wikipedia, in detail Same table as slide 28, with bins 0 and 1 marked “low rep” and the label “Short-Lived” added.
  • 30. Predictive power of low reputation Low-reputation: Lower 20% of range • Short-lived edits: $\alpha_{edit} \le -0.8$ (almost entirely undone) • Short-lived text: $\alpha_{text} \le 0.2$ (less than 20% survives each revision)
  • 31. Text trust Example: “Yadda yadda wuga wuga bla bla bla bing bong” becomes, after an edit by A, “Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong”. Old text is colored according to the reputation of its original author, and of all subsequent revisors (including A). New text is colored according to the reputation of A.
  • 32. Text trust (same example as slide 31) • On the English Wikipedia, we should be able to spot untrusted content with over 80% recall and 60% precision! – In fact, we do even better than this, as new content is always flagged as lower trust (see next).
  • 34. Text trust: How is “Fogh” spelled?
  • 35. Text Trust: more examples from the demo
  • 36. Text Trust: Details Trust depends on: • Authorship: Author lends 50% of their reputation to the text they create. – Thus, even text from high-rep authors is only medium-rep when added: high trust is achieved only via multiple reviews, never via a single author. • Revision: When an author of reputation r preserves a word of trust t < r, the word increases in trust to $t + 0.3\,(r - t)$ • The algorithms still need fine-tuning.
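The two trust rules on slide 36 can be traced with a short sketch. The constants 0.5 and 0.3 come from the slide; the function names are hypothetical, and leaving trust unchanged when the reviser's reputation is not higher than the word's trust is an assumption, since the slide only covers the case t < r.

```python
AUTHOR_SHARE = 0.5   # new text gets 50% of its author's reputation (slide 36)
REVISION_GAIN = 0.3  # a preserving revision closes 30% of the gap (slide 36)


def trust_of_new_word(author_reputation):
    """Trust of a word when it is first added."""
    return AUTHOR_SHARE * author_reputation


def trust_after_revision(word_trust, reviser_reputation):
    """Trust of a word after an author revises the page and preserves it."""
    if word_trust < reviser_reputation:
        return word_trust + REVISION_GAIN * (reviser_reputation - word_trust)
    return word_trust  # assumption: no change when t >= r


# A word added by an author of reputation 10, then preserved by three
# further revisions from authors who also have reputation 10.
history = [trust_of_new_word(10.0)]
for _ in range(3):
    history.append(trust_after_revision(history[-1], 10.0))
print(history)
```

The trace shows the point made on the slide: trust starts at a medium value and approaches the reviewers' reputation only after several revisions, never through a single author.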
  • 37. From fresh to trusted text
  • 42. Batch Implementation Diagram: the Wikipedia servers feed a separate trust server through periodic XML dumps (to initialize) and an edit feed (to keep updated). • No need to affect the main Wikipedia servers • People can click “check trust” and visit the trust server. • Good for experimenting with new ideas • Necessary to color the past (come up to speed).
  • 43. On-Line Implementation Process edits as they arrive: • Benefit: real-time colorization of text • Need to integrate the code in MediaWiki • Time to process an edit: < 1s (not much longer than parsing it). • Storage required: proportional to the size of the last revision (not to the total history size!) • Can be easily used for other Wikis
  • 44. My questions: • Feedback? • Do you like it? • Should we try to set up a “trust server” with an edit feed from the Wikipedia? • Try the demo: http://trust.cse.ucsc.edu/ Your questions?