STiki: An Anti-Vandalism Tool for Wikipedia using Spatio-Temporal Analysis of Revision Metadata
A.G. West, S. Kannan, and I. Lee
Wikimania '10, July 10, 2010

STiki = Huggle++
STiki = Huggle, but:
  • CENTRALIZED: STiki is always scoring edits, in bot-like fashion.
  • QUEUING: STiki uses 15+ ML features to set presentation order (not a static rule set).

Outline/Summary
  • Vandalism detection methodology [6]
      • Wikipedia revision metadata (not the article or diff text) can be used to detect vandalism
      • ML over simple features and aggregate reputation values for articles, editors, and spatial groups thereof
  • The STiki software tool
      • Straightforward application of the above technique
      • Demonstration of the tool and functionality
      • Alternative uses for the open-source code

Metadata
Wikipedia provides metadata via dumps/API.

Labeling Vandalism
ROLLBACK is used to label edits as vandalism:
  • Only true rollback, no software-based ones
  • Edit summaries used to locate rollbacks (Native, Huggle, Twinkle, ClueBot)
  • Bad ones = {OFFENDING EDITS (OEs)}, others = {UNLABELED}
Why rollback?
  • Automated (vs. manual)
  • High-confidence
  • Per case (vs. definition)
Why do edits need labels?
  • (2) Building block of reputation building
[Figure: Prevalence/Source of Rollbacks]

Simple Features*
  • Spatial props: appropriate wherever a size, distance, or membership function can be defined
* Discussion of simple features abbreviated to concentrate on aggregate ones

Edit Time, Day-of-Week
  • Use IP geolocation data to determine the origin time zone, then adjust the UTC timestamp
  • Vandalism is most prevalent during working hours and the work week: kids are in school(?)
  • Fun fact: vandalism is almost twice as prevalent on a Tuesday as on a Sunday
[Figures: local time-of-day when edits made; local day-of-week when edits made]

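The timestamp adjustment described on this slide can be sketched as follows. This is a minimal illustration, not STiki's code: the function name `local_time_features` is hypothetical, the UTC offset is assumed to have already been obtained from an IP geolocation lookup, and the timestamp is assumed to be in the ISO 8601 form returned by the MediaWiki API.

```python
from datetime import datetime, timedelta, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def local_time_features(utc_timestamp, utc_offset_hours):
    """Return (local hour-of-day, local day-of-week) for one edit.

    utc_timestamp: string such as "2010-07-10T02:30:00Z" (UTC).
    utc_offset_hours: editor's offset from UTC; assumed to come from a
    geolocation lookup that is outside the scope of this sketch.
    """
    # Parse the timestamp and mark it explicitly as UTC.
    utc_dt = datetime.strptime(utc_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    utc_dt = utc_dt.replace(tzinfo=timezone.utc)
    # Shift into the editor's (fixed-offset) local time zone.
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_dt = utc_dt.astimezone(local_tz)
    # weekday() is 0 for Monday through 6 for Sunday.
    return local_dt.hour, DAY_NAMES[local_dt.weekday()]
```

An edit made at 02:30 UTC on a Saturday by an editor five hours behind UTC is therefore treated as a Friday-evening edit, which is the distinction the two features depend on.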
Time-Since (TS)…
  • High-edit pages most often vandalized
  • ≈2% of pages have 5+ OEs, yet these pages have 52% of all edits
  • Other work [3] has shown these are also the articles most visited

  • "Registration": timestamp of first edit made by user
  • Sybil attack to abuse benefits?
  • Huge contributors, but rarely vandalize

PreSTA Algorithm
PreSTA [5]: model for spatio-temporal (ST) reputation
CORE IDEA: No entity-specific data? Examine spatially adjacent entities (homophily).

$$\mathrm{Rep}(\mathit{group}) = \frac{\sum \mathrm{time\_decay}(TS_{\mathit{vandalism}})}{\mathrm{size}(\mathit{group})}$$

where $TS_{\mathit{vandalism}}$ are the timestamps (TS) of vandalism incidents by group members.
  • Grouping functions (spatial) define memberships
  • Observations of misbehavior form feedback, and observations are decayed (temporal)
[Figure: Higher-Order Reputation. Alice (A) belongs to the groups Polish (POL) and Europeans (EUR), giving rep(A), rep(POL), and rep(EUR).]

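The reputation formula above can be sketched in a few lines. The slides do not say which decay function PreSTA uses, so the exponential half-life decay and the constant `HALF_LIFE_SECONDS` below are assumptions made purely for illustration; the function names are likewise hypothetical. Under this formulation, a larger value means more recent misbehavior per group member.

```python
HALF_LIFE_SECONDS = 7 * 24 * 3600  # assumed half-life; not given in the slides

def time_decay(incident_ts, now, half_life=HALF_LIFE_SECONDS):
    """Weight of one vandalism incident: 1.0 when fresh, halving each half-life."""
    age = max(0.0, now - incident_ts)  # guard against future timestamps
    return 0.5 ** (age / half_life)

def reputation(incident_timestamps, group_size, now):
    """Rep(group) = sum of decayed incident weights / size(group)."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    total = sum(time_decay(ts, now) for ts in incident_timestamps)
    return total / group_size
```

A one-member group reproduces the single-entity case (for example rep(A) for Alice), while passing the incidents of all Polish or all European editors along with the group size gives the higher-order values.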
Article Reputation
  • Intuitively, some topics are controversial and likely targets for vandalism (or temporally so).
  • 85% of OEs have non-zero rep (just 45% of random edits)
[Figure: CDF of Article Reputation]
[Table: Articles w/most OEs]

Category Reputation
  • Wiki provides categories/memberships; use only topical ones.
  • 97% of OEs have non-zero reputation (85% in article case)
[Table: Categories with most OEs]
[Figure: Example of Category Rep. Calculation. The article Abraham Lincoln belongs to Category: President (other members include Barack Obama, G.W. Bush, …) and Category: Lawyer (…). Each category has a reputation, and the feature value is the MAXIMUM(?) of these.]

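The category example can be read as the sketch below: compute a reputation for each topical category an article belongs to, then take the maximum as the feature value. The slide itself marks the maximum with a question mark, so treat that choice as tentative; the function name and the reputation numbers are hypothetical.

```python
def category_feature(article_categories, category_rep):
    """Feature value for an article: the highest reputation among its categories.

    article_categories: names of the topical categories the article is in.
    category_rep: mapping from category name to its reputation value.
    Categories with no recorded reputation count as 0.0.
    """
    reps = [category_rep.get(name, 0.0) for name in article_categories]
    return max(reps, default=0.0)

# Hypothetical reputations, for illustration only.
example_reps = {"Presidents": 0.8, "Lawyers": 0.3}
lincoln_value = category_feature(["Presidents", "Lawyers"], example_reps)
```

Taking the maximum means a single heavily vandalized category is enough to flag the article, regardless of how many quiet categories it also belongs to.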
Editor Reputation
  • Straightforward use of the rep() function, one-editor groups
[Figure: CDF of Editor Reputation]
  • Mediocre performance. Meaningful correlation with other features, however.

Off-line Performance
  • Use as an intelligent routing (IR) tool
  • Recall: % of total OEs classified correctly
  • Precision: % of edits classified OE that are vandalism
  • 50% @ 50%

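The two metrics defined on this slide can be computed as in the following sketch. The function name and the toy data are illustrative only; `labels` marks which edits really are OEs and `predictions` marks which edits the classifier flagged.

```python
def precision_recall(labels, predictions):
    """Return (precision, recall) for boolean labels and predictions.

    precision: share of edits flagged as OE that really are OEs.
    recall: share of all true OEs that were flagged.
    """
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for actual, flagged in pairs if actual and flagged)
    false_pos = sum(1 for actual, flagged in pairs if not actual and flagged)
    false_neg = sum(1 for actual, flagged in pairs if actual and not flagged)
    flagged_total = true_pos + false_pos
    actual_total = true_pos + false_neg
    precision = true_pos / flagged_total if flagged_total else 0.0
    recall = true_pos / actual_total if actual_total else 0.0
    return precision, recall

# Toy data: two real OEs, one of which is caught, plus one false alarm.
toy_labels = [True, True, False, False]
toy_predictions = [True, False, True, False]
toy_result = precision_recall(toy_labels, toy_predictions)
```

On this toy data, half of the flagged edits are OEs and half of the OEs are flagged, which is the kind of operating point the "50% @ 50%" figure describes.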
STiki
STiki [4]: a real-time, on-Wikipedia implementation of the technique

Popped: GUI client shows likely vandalism first

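Showing likely vandalism first amounts to a priority queue keyed on the vandalism score. The sketch below illustrates that idea with Python's standard `heapq` module; the class `EditQueue` and its methods are hypothetical and do not describe how STiki's own queue is implemented.

```python
import heapq

class EditQueue:
    """Priority queue that hands out the highest-scoring edit first."""

    def __init__(self):
        self._heap = []

    def push(self, rev_id, score):
        # heapq is a min-heap, so store the negated score to pop the largest first.
        heapq.heappush(self._heap, (-score, rev_id))

    def pop(self):
        """Remove and return (rev_id, score) for the most suspicious edit."""
        neg_score, rev_id = heapq.heappop(self._heap)
        return rev_id, -neg_score

    def __len__(self):
        return len(self._heap)

# Hypothetical revision IDs and scores.
queue = EditQueue()
queue.push(101, 0.2)
queue.push(102, 0.9)
queue.push(103, 0.5)
first = queue.pop()
```

Because scoring happens continuously on the server, a client only has to pop from the queue; edits deep in the queue are the ones least likely to be vandalism.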
STiki Performance
Competition inhibits maximal performance
  • Metric: hit-rate (% of edits displayed that are vandalism)
  • Offline analysis shows it could be 50%+
  • Competing (often autonomous) tools make it ≈10%
STiki successes and use cases
  • Has reverted over 5,000 instances of vandalism
  • May be more appropriate in less patrolled installations
      • Any of Wikipedia's foreign-language editions
  • Embedded vandalism: that which escapes initial detection
      • Median age of a STiki revert is 4.25 hours, 200× that of RBs
      • Further, the average STiki revert had 210 views during its active duration

Alternative Uses
All code is available [4] and open source (Java)
  • Backend (server-side) re-use
      • Large portion of MediaWiki API implemented (bots)
      • Trivial to add new features (including NLP ones)
  • Frontend (client-side) re-use
      • Useful whenever edits require human inspection
      • Offline inspection tool for corpus building
  • Data re-use
      • Incorporate vandalism score into more robust tools
      • Willing to provide data to other researchers

Crowd-sourcing
  • Shared queue := Pending changes trial
  • Abuse of "pass" by an edit-hoarding user
  • Do 'reviewers' need to be reviewed? Where does it stop?
      • Multi-layer verification checks to find anomalies
      • Could reviewer reputations also be created?
  • Threshold for queue access? Registered? Auto-confirmed? Or more?
  • Catch-22: use vs. perceived success
      • More users = more vandalism found. But deep in the queue, vandalism is unlikely = user abandonment.

References
[1] S. Hao, N.A. Syed, N. Feamster, A.G. Gray, and S. Krasser. Detecting spammers with SNARE: Spatiotemporal network-level automated reputation engine. In 18th USENIX Security Symposium, 2009.
[2] M. Potthast, B. Stein, and R. Gerling. Automatic vandalism detection in Wikipedia. In Advances in Information Retrieval, 2008.
[3] R. Priedhorsky, J. Chen, S.K. Lam, K. Panciera, L. Terveen, and J. Riedl. Creating, destroying, and restoring value in Wikipedia. In GROUP '07, 2007.
[4] A.G. West. STiki: A vandalism detection tool for Wikipedia. http://en.wikipedia.org/wiki/Wikipedia:STiki. Software, 2010.
[5] A.G. West, A.J. Aviv, J. Chang, and I. Lee. Mitigating spam using spatio-temporal reputation. Technical report UPENN-MS-CIS-10-04, Feb. 2010.
[6] A.G. West, S. Kannan, and I. Lee. Detecting Wikipedia Vandalism via Spatio-Temporal Analysis of Revision Metadata. In EUROSEC '10, April 2010.