SlideShare a Scribd company logo
1 of 17
Collective Spammer Detection in Evolving Multi-Relational
Social Networks
(To appear at KDD’15)
Shobeir Fakhraei*
University of Maryland
*Research performed while on internship at if(we) (Tagged Inc.) through Graphlab program
In collaboration with:
James Foulds (UCSC, Currently UCSD)
Madhusudana Shashanka (if(we) Inc., Currently Niara Inc.)
Lise Getoor (UCSC)
Spam in Social Networks
• Recent study by Nexgate in 2013:
• Spam grew by more than 300% in half a year
• 1 in 200 of social messages are spam
• 5% of all social apps are spammy
• Social networks:
• Spammers have more ways to interact with users
e.g, Comments on photos, send message, and send not verbal signals
• They can split content in several messages and forms
• More available info about users on their profiles!
2
Our Approach
• Lets look at the graph:
• Multi-relational social
network
Links = actions at time t
(e.g. profile view, message,
or poke)
• Time-stamped Links
3
t(1)
t(3)
t(2)
t(4)
t(5)
t(6)
t(7)t(8)
t(9)
t(10)
t(11)
Tagged.com
• Tagged.com is a social network founded in 2004
which focuses on connecting people through different
social features and games.
• It has over 300 million registered members
• Data sample for experiments (on a laptop):
• 5.6 Million users (%3.9 Labeled Spammers)
• 912 Million Links
4
Graph Structure
• Using Graphlab Create
• Graph Analytics Toolkit to Generate Features:
• In each relation graph we compute:
PageRank, Total Degree, In Degree, Out Degree, k-Core, Graph Coloring,
Connected Components, Triangle Count
• Gradient Boosted Trees:
• Classification (Spam or Legitimate)
5
Are you interested?
Meet Me Play Pets Message Wink Report AbuseFriend Request
Pagerank,
K-core,
Graph coloring,
Triangle count,
Connected components,
In/out degree
Graph Structure
6
Experiments AU-PR AU-ROC
1 Relation,
k Feature type
0.187 ± 0.004 0.803 ±0.001
n Relations,
1 Feature Type
0.285 ± 0.002 0.809 ± 0.001
n Relations,
k Features types
0.328 ± 0.003 0.817 ± 0.001
Sequence of Actions
• Sequential k-gram Features:
Short sequence segment of k consecutive actions,
to capture the order of events
7
User1 Actions:
Message, Profile_view, Message, Friend_Request, ….
Sequence of Actions
• Mixture of Markov Models (MMM):
Also called chain-augmented or tree-augmented
naive Bayes model to capture longer sequences
8
Sequence of Actions
9
Experiments AU-PR AU-ROC
Bigram Features 0.471 ± 0.004 0.859 ± 0.001
MMM 0.246 ± 0.009 0.821 ± 0.003
Results
10
Precision-Recall ROC
We can classify 70% of the spammer that need manual labeling with about 90% accuracy
Deployment and Example Runtimes
• We can:
• Run the model on short intervals, with new snapshots of the
network
• Update some of the features with events.
• Example runtimes with Graphlab CreateTM on a Macbook Pro:
• 5.6 million nodes and 350 million links
• PageRank: 6.25 minutes
• Triangle counting: 17.98 minutes
• k-core: 14.3 minutes
11
Refining the abuse reporting systems
• Abuse report systems are very noisy.
• People have different standards.
• Spammers report random people to increase noise.
• Personal gain in social games.
• Goal is to clean up the system using
• Reporters previous history
• Looking at all the reports collectively
12
Probabilistic Soft Logic (PSL)
• Probabilistic Soft Logic (PSL) Declarative language
based on first-order logic to express collective
probabilistic inference problems.
• Example:
13
Results of Classification Using Reports
14
Experiments AU-PR AU-ROC
Reports Only 0.674 ± 0.008 0.611 ± 0.007
Reports & Credibility 0.869 ± 0.006 0.862 ± 0.004
Reports & Credibility
& Collective Reasoning
0.884 ± 0.005 0.873 ± 0.004
Conclusions
• Strong signals in multi-relational time-stamped graph
• Graph Structure:
• Multi-rationality is more important than extracting more features from one relations
• Sequence:
• Bigrams features of actions capture most of the patterns
• Reports:
• Incorporating credibility of reporting users and collective reasoning significantly improves the
signal
• Fast experiments iterations with Graphlab CreateTM facilitates finding these
pattern
15
Acknowledgements
• Co-authors of the paper:
James Foulds (UCSC)
Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)
Lise Getoor (UCSC)
• If(we) Inc. (Formerly Tagged Inc.):
Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill
• Dato (Formerly Graphlab):
Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng
• The paper is available online, and code and part of the
data will be released soon:
http://www.cs.umd.edu/~shobeir
16
Acknowledgements
• Co-authors of the paper:
James Foulds (UCSC)
Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)
Lise Getoor (UCSC)
• If(we) Inc. (Formerly Tagged Inc.):
Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill
• Dato (Formerly Graphlab):
Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng
• The paper is available online, and code and part of the
data will be released soon:
http://www.cs.umd.edu/~shobeir
17

More Related Content

Viewers also liked

Journal of Nutritional Health and Food Engineering-02-00046
Journal of Nutritional Health and Food Engineering-02-00046Journal of Nutritional Health and Food Engineering-02-00046
Journal of Nutritional Health and Food Engineering-02-00046
Patricia Funk
 
HCSC Presentation JAN 2015
HCSC Presentation JAN 2015HCSC Presentation JAN 2015
HCSC Presentation JAN 2015
Daniel Dickinson
 

Viewers also liked (20)

Nemo museum
Nemo museumNemo museum
Nemo museum
 
Gregory Crewdson
Gregory CrewdsonGregory Crewdson
Gregory Crewdson
 
Introduction to asp.net
Introduction to asp.netIntroduction to asp.net
Introduction to asp.net
 
CV Jannie
CV JannieCV Jannie
CV Jannie
 
Nash Community College BDF Program Presentation - Local Economic Outlook Lunc...
Nash Community College BDF Program Presentation - Local Economic Outlook Lunc...Nash Community College BDF Program Presentation - Local Economic Outlook Lunc...
Nash Community College BDF Program Presentation - Local Economic Outlook Lunc...
 
Business Envirment of England
Business Envirment of EnglandBusiness Envirment of England
Business Envirment of England
 
Driving directions
Driving directionsDriving directions
Driving directions
 
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...
 
New Capabilities in the PyData Ecosystem
New Capabilities in the PyData EcosystemNew Capabilities in the PyData Ecosystem
New Capabilities in the PyData Ecosystem
 
Coca Cola Beverage, Khurda
Coca Cola Beverage, KhurdaCoca Cola Beverage, Khurda
Coca Cola Beverage, Khurda
 
Рік безкарності: громадський аналіз розслідування справ Євромайдану
Рік безкарності: громадський аналіз розслідування справ ЄвромайдануРік безкарності: громадський аналіз розслідування справ Євромайдану
Рік безкарності: громадський аналіз розслідування справ Євромайдану
 
Journal of Nutritional Health and Food Engineering-02-00046
Journal of Nutritional Health and Food Engineering-02-00046Journal of Nutritional Health and Food Engineering-02-00046
Journal of Nutritional Health and Food Engineering-02-00046
 
Vehicle safety features
Vehicle safety featuresVehicle safety features
Vehicle safety features
 
Driving in the dark
Driving in the darkDriving in the dark
Driving in the dark
 
Excerise
ExceriseExcerise
Excerise
 
Nemo museum
Nemo museumNemo museum
Nemo museum
 
HCSC Presentation JAN 2015
HCSC Presentation JAN 2015HCSC Presentation JAN 2015
HCSC Presentation JAN 2015
 
Manufacturing Analytics at Scale
Manufacturing Analytics at ScaleManufacturing Analytics at Scale
Manufacturing Analytics at Scale
 
ASAM 2014 Year in Review
ASAM 2014 Year in ReviewASAM 2014 Year in Review
ASAM 2014 Year in Review
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
 

Similar to Collective Spammer Detection in Evolving Multi-Relational Social Networks

TruSIS: Trust Accross Social Network
TruSIS: Trust Accross Social NetworkTruSIS: Trust Accross Social Network
TruSIS: Trust Accross Social Network
Lora Aroyo
 
Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...
Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...
Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...
Sc Huang
 

Similar to Collective Spammer Detection in Evolving Multi-Relational Social Networks (20)

Social Network Analysis with Spark
Social Network Analysis with SparkSocial Network Analysis with Spark
Social Network Analysis with Spark
 
Improve ML Predictions using Graph Analytics (today!)
Improve ML Predictions using Graph Analytics (today!)Improve ML Predictions using Graph Analytics (today!)
Improve ML Predictions using Graph Analytics (today!)
 
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
 
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
 
TruSIS: Trust Accross Social Network
TruSIS: Trust Accross Social NetworkTruSIS: Trust Accross Social Network
TruSIS: Trust Accross Social Network
 
Improving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph AlgorithmsImproving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph Algorithms
 
Improve ml predictions using graph algorithms (webinar july 23_19).pptx
Improve ml predictions using graph algorithms (webinar july 23_19).pptxImprove ml predictions using graph algorithms (webinar july 23_19).pptx
Improve ml predictions using graph algorithms (webinar july 23_19).pptx
 
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...
 
The circles of relations modifiers modeller from smartphone photo gallery
The circles of relations modifiers modeller from smartphone photo galleryThe circles of relations modifiers modeller from smartphone photo gallery
The circles of relations modifiers modeller from smartphone photo gallery
 
Poster
PosterPoster
Poster
 
useR 2014 jskim
useR 2014 jskimuseR 2014 jskim
useR 2014 jskim
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
Survey on Location Based Recommendation System Using POI
Survey on Location Based Recommendation System Using POISurvey on Location Based Recommendation System Using POI
Survey on Location Based Recommendation System Using POI
 
[CS570] Machine Learning Team Project (I know what items really are)
[CS570] Machine Learning Team Project (I know what items really are)[CS570] Machine Learning Team Project (I know what items really are)
[CS570] Machine Learning Team Project (I know what items really are)
 
Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...
Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...
Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Soci...
 
18.02.05_IAAI2018_Mobille Network Failure Event Detection and Forecasting wit...
18.02.05_IAAI2018_Mobille Network Failure Event Detection and Forecasting wit...18.02.05_IAAI2018_Mobille Network Failure Event Detection and Forecasting wit...
18.02.05_IAAI2018_Mobille Network Failure Event Detection and Forecasting wit...
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detection
 
Service rating prediction by exploring social mobile users’ geographical loca...
Service rating prediction by exploring social mobile users’ geographical loca...Service rating prediction by exploring social mobile users’ geographical loca...
Service rating prediction by exploring social mobile users’ geographical loca...
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detection
 
Activity Monitoring Using Wearable Sensors and Smart Phone
Activity Monitoring Using Wearable Sensors and Smart PhoneActivity Monitoring Using Wearable Sensors and Smart Phone
Activity Monitoring Using Wearable Sensors and Smart Phone
 

More from Turi, Inc.

Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge DatasetsScaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Turi, Inc.
 
Machine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive ServicesMachine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive Services
Turi, Inc.
 

More from Turi, Inc. (20)

Webinar - Analyzing Video
Webinar - Analyzing VideoWebinar - Analyzing Video
Webinar - Analyzing Video
 
Webinar - Patient Readmission Risk
Webinar - Patient Readmission RiskWebinar - Patient Readmission Risk
Webinar - Patient Readmission Risk
 
Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)
 
Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)
 
Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)
 
Webinar - Fraud Detection - Palombo (20160428)
Webinar - Fraud Detection - Palombo (20160428)Webinar - Fraud Detection - Palombo (20160428)
Webinar - Fraud Detection - Palombo (20160428)
 
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge DatasetsScaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
 
Pattern Mining: Extracting Value from Log Data
Pattern Mining: Extracting Value from Log DataPattern Mining: Extracting Value from Log Data
Pattern Mining: Extracting Value from Log Data
 
Intelligent Applications with Machine Learning Toolkits
Intelligent Applications with Machine Learning ToolkitsIntelligent Applications with Machine Learning Toolkits
Intelligent Applications with Machine Learning Toolkits
 
Text Analysis with Machine Learning
Text Analysis with Machine LearningText Analysis with Machine Learning
Text Analysis with Machine Learning
 
Machine Learning with GraphLab Create
Machine Learning with GraphLab CreateMachine Learning with GraphLab Create
Machine Learning with GraphLab Create
 
Machine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive ServicesMachine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive Services
 
Machine Learning in 2016: Live Q&A with Carlos Guestrin
Machine Learning in 2016: Live Q&A with Carlos GuestrinMachine Learning in 2016: Live Q&A with Carlos Guestrin
Machine Learning in 2016: Live Q&A with Carlos Guestrin
 
Scalable data structures for data science
Scalable data structures for data scienceScalable data structures for data science
Scalable data structures for data science
 
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
 
Introduction to Recommender Systems
Introduction to Recommender SystemsIntroduction to Recommender Systems
Introduction to Recommender Systems
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
Overview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringOverview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature Engineering
 
SFrame
SFrameSFrame
SFrame
 
Building Personalized Data Products with Dato
Building Personalized Data Products with DatoBuilding Personalized Data Products with Dato
Building Personalized Data Products with Dato
 

Recently uploaded

Recently uploaded (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 

Collective Spammer Detection in Evolving Multi-Relational Social Networks

  • 1. Collective Spammer Detection in Evolving Multi-Relational Social Networks (To appear at KDD’15) Shobeir Fakhraei* University of Maryland *Research performed while on internship at if(we) (Tagged Inc.) through Graphlab program In collaboration with: James Foulds (UCSC, Currently UCSD) Madhusudana Shashanka (if(we) Inc., Currently Niara Inc.) Lise Getoor (UCSC)
  • 2. Spam in Social Networks • Recent study by Nexgate in 2013: • Spam grew by more than 300% in half a year • 1 in 200 of social messages are spam • 5% of all social apps are spammy • Social networks: • Spammers have more ways to interact with users e.g, Comments on photos, send message, and send not verbal signals • They can split content in several messages and forms • More available info about users on their profiles! 2
  • 3. Our Approach • Lets look at the graph: • Multi-relational social network Links = actions at time t (e.g. profile view, message, or poke) • Time-stamped Links 3 t(1) t(3) t(2) t(4) t(5) t(6) t(7)t(8) t(9) t(10) t(11)
  • 4. Tagged.com • Tagged.com is a social network founded in 2004 which focuses on connecting people through different social features and games. • It has over 300 million registered members • Data sample for experiments (on a laptop): • 5.6 Million users (%3.9 Labeled Spammers) • 912 Million Links 4
  • 5. Graph Structure • Using Graphlab Create • Graph Analytics Toolkit to Generate Features: • In each relation graph we compute: PageRank, Total Degree, In Degree, Out Degree, k-Core, Graph Coloring, Connected Components, Triangle Count • Gradient Boosted Trees: • Classification (Spam or Legitimate) 5 Are you interested? Meet Me Play Pets Message Wink Report AbuseFriend Request Pagerank, K-core, Graph coloring, Triangle count, Connected components, In/out degree
  • 6. Graph Structure 6 Experiments AU-PR AU-ROC 1 Relation, k Feature type 0.187 ± 0.004 0.803 ±0.001 n Relations, 1 Feature Type 0.285 ± 0.002 0.809 ± 0.001 n Relations, k Features types 0.328 ± 0.003 0.817 ± 0.001
  • 7. Sequence of Actions • Sequential k-gram Features: Short sequence segment of k consecutive actions, to capture the order of events 7 User1 Actions: Message, Profile_view, Message, Friend_Request, ….
  • 8. Sequence of Actions • Mixture of Markov Models (MMM): Also called chain-augmented or tree-augmented naive Bayes model to capture longer sequences 8
  • 9. Sequence of Actions 9 Experiments AU-PR AU-ROC Bigram Features 0.471 ± 0.004 0.859 ± 0.001 MMM 0.246 ± 0.009 0.821 ± 0.003
  • 10. Results 10 Precision-Recall ROC We can classify 70% of the spammer that need manual labeling with about 90% accuracy
  • 11. Deployment and Example Runtimes • We can: • Run the model on short intervals, with new snapshots of the network • Update some of the features with events. • Example runtimes with Graphlab CreateTM on a Macbook Pro: • 5.6 million nodes and 350 million links • PageRank: 6.25 minutes • Triangle counting: 17.98 minutes • k-core: 14.3 minutes 11
  • 12. Refining the abuse reporting systems • Abuse report systems are very noisy. • People have different standards. • Spammers report random people to increase noise. • Personal gain in social games. • Goal is to clean up the system using • Reporters previous history • Looking at all the reports collectively 12
  • 13. Probabilistic Soft Logic (PSL) • Probabilistic Soft Logic (PSL) Declarative language based on first-order logic to express collective probabilistic inference problems. • Example: 13
  • 14. Results of Classification Using Reports 14 Experiments AU-PR AU-ROC Reports Only 0.674 ± 0.008 0.611 ± 0.007 Reports & Credibility 0.869 ± 0.006 0.862 ± 0.004 Reports & Credibility & Collective Reasoning 0.884 ± 0.005 0.873 ± 0.004
  • 15. Conclusions • Strong signals in multi-relational time-stamped graph • Graph Structure: • Multi-rationality is more important than extracting more features from one relations • Sequence: • Bigrams features of actions capture most of the patterns • Reports: • Incorporating credibility of reporting users and collective reasoning significantly improves the signal • Fast experiments iterations with Graphlab CreateTM facilitates finding these pattern 15
  • 16. Acknowledgements • Co-authors of the paper: James Foulds (UCSC) Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-) Lise Getoor (UCSC) • If(we) Inc. (Formerly Tagged Inc.): Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill • Dato (Formerly Graphlab): Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng • The paper is available online, and code and part of the data will be released soon: http://www.cs.umd.edu/~shobeir 16
  • 17. Acknowledgements • Co-authors of the paper: James Foulds (UCSC) Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-) Lise Getoor (UCSC) • If(we) Inc. (Formerly Tagged Inc.): Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill • Dato (Formerly Graphlab): Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng • The paper is available online, and code and part of the data will be released soon: http://www.cs.umd.edu/~shobeir 17

Editor's Notes

  1. Hi, Shobeir, UMD Spam at Social Net. Work done @ ifwe through graphlab internship Paper will apear at KDD this year Work with collaborators.