Collective Spammer Detection in Evolving Multi-Relational Social Networks
1. Collective Spammer Detection in Evolving Multi-Relational Social Networks
(To appear at KDD’15)
Shobeir Fakhraei*
University of Maryland
*Research performed while on an internship at if(we) (Tagged Inc.) through the GraphLab program
In collaboration with:
James Foulds (UCSC, Currently UCSD)
Madhusudana Shashanka (if(we) Inc., Currently Niara Inc.)
Lise Getoor (UCSC)
2. Spam in Social Networks
• Recent study by Nexgate in 2013:
• Spam grew by more than 300% in half a year
• 1 in 200 social messages is spam
• 5% of all social apps are spammy
• Social networks:
• Spammers have more ways to interact with users
e.g., commenting on photos, sending messages, and sending non-verbal signals
• They can split content across several messages and forms
• More information about users is available on their profiles
3. Our Approach
• Let's look at the graph:
• Multi-relational social network
Links = actions at time t (e.g., profile view, message, or poke)
• Time-stamped links
[Figure: example graph whose links are labeled with timestamps t(1) through t(11)]
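A minimal sketch of how such time-stamped, multi-relational links might be represented. The `Link` type, the helper names, and the toy data are assumptions for illustration, not the data format used at Tagged.

```python
from collections import defaultdict
from typing import NamedTuple


class Link(NamedTuple):
    """One directed, time-stamped action between two users (hypothetical schema)."""
    src: int
    dst: int
    relation: str  # e.g. "profile_view", "message", "poke"
    t: int         # timestamp of the action


def split_by_relation(links):
    """Group links into one edge list per relation type."""
    graphs = defaultdict(list)
    for link in links:
        graphs[link.relation].append((link.src, link.dst, link.t))
    return dict(graphs)


def action_sequences(links):
    """Return each user's outgoing actions, ordered by timestamp."""
    seqs = defaultdict(list)
    for link in sorted(links, key=lambda l: l.t):
        seqs[link.src].append(link.relation)
    return dict(seqs)


# Toy data, invented for this example.
links = [
    Link(1, 2, "profile_view", 1),
    Link(1, 2, "message", 2),
    Link(3, 1, "poke", 3),
    Link(1, 3, "message", 4),
]
```

Splitting by relation gives one graph per action type, while sorting by timestamp gives one action sequence per user; both views come from the same link list.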
4. Tagged.com
• Tagged.com is a social network founded in 2004 that focuses on connecting people through various social features and games.
• It has over 300 million registered members.
• Data sample for experiments (on a laptop):
• 5.6 million users (3.9% labeled spammers)
• 912 million links
5. Graph Structure
• Using GraphLab Create
• Graph Analytics Toolkit to generate features:
• In each relation graph we compute:
PageRank, total degree, in-degree, out-degree, k-core, graph coloring, connected components, triangle count
• Gradient boosted trees:
• Classification (spam or legitimate)
[Figure: one graph per relation ("Are you interested?" / Meet Me, Play Pets, Message, Wink, Report Abuse, Friend Request), each yielding PageRank, k-core, graph coloring, triangle count, connected components, and in/out degree features]
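The idea of computing the same features on every relation graph and concatenating them per user can be sketched in plain Python. This is an illustration only: it covers just the degree features, it does not use the GraphLab Create API, and the function names and toy edge lists are assumptions.

```python
from collections import Counter


def degree_features(edges):
    """In-, out-, and total degree for every node of one relation graph."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    nodes = set(out_deg) | set(in_deg)
    return {n: (in_deg[n], out_deg[n], in_deg[n] + out_deg[n]) for n in nodes}


def multi_relational_features(graphs, users):
    """Concatenate per-relation features into one vector per user.

    Relations are visited in sorted order so every user gets the same layout;
    users absent from a relation get zeros for that relation.
    """
    relations = sorted(graphs)
    per_relation = {r: degree_features(graphs[r]) for r in relations}
    return {
        u: [x for r in relations for x in per_relation[r].get(u, (0, 0, 0))]
        for u in users
    }


# Toy relation graphs, invented for this example: (source, destination) pairs.
graphs = {
    "message": [(1, 2), (1, 3), (2, 1)],
    "wink": [(3, 1)],
}
features = multi_relational_features(graphs, users=[1, 2, 3])
```

Each user ends up with one block of features per relation, which is the form a classifier such as gradient boosted trees would consume.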
6. Graph Structure
Experiments                     AU-PR            AU-ROC
1 relation, k feature types     0.187 ± 0.004    0.803 ± 0.001
n relations, 1 feature type     0.285 ± 0.002    0.809 ± 0.001
n relations, k feature types    0.328 ± 0.003    0.817 ± 0.001
7. Sequence of Actions
• Sequential k-gram features:
Short segments of k consecutive actions, which capture the order of events
User1 actions:
Message, Profile_view, Message, Friend_Request, …
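A minimal sketch of k-gram extraction from one user's action sequence, using the four actions listed above. The function name `kgram_counts` is an assumption made for this example.

```python
from collections import Counter


def kgram_counts(actions, k=2):
    """Count every run of k consecutive actions in one user's sequence."""
    return Counter(tuple(actions[i:i + k]) for i in range(len(actions) - k + 1))


user1 = ["Message", "Profile_view", "Message", "Friend_Request"]
bigrams = kgram_counts(user1, k=2)
```

With k = 2, the sequence yields the bigrams (Message, Profile_view), (Profile_view, Message), and (Message, Friend_Request), each counted once; the counts can then serve as features.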
8. Sequence of Actions
• Mixture of Markov Models (MMM):
Also called a chain-augmented or tree-augmented naive Bayes model;
it captures longer sequences
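One way to read this is as one first-order Markov chain per class, with a sequence scored by the log-odds between the two chains. The sketch below follows that reading; the add-one smoothing, the class and function names, and the toy training sequences are all assumptions, not details taken from the talk.

```python
import math
from collections import defaultdict


class MarkovChain:
    """First-order Markov chain over action types with add-one smoothing."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.start = defaultdict(int)
        self.trans = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            if not seq:
                continue
            self.start[seq[0]] += 1
            for prev, cur in zip(seq, seq[1:]):
                self.trans[prev][cur] += 1
        return self

    def log_likelihood(self, seq):
        if not seq:
            return 0.0
        n = len(self.actions)
        total_start = sum(self.start.values())
        ll = math.log((self.start[seq[0]] + 1) / (total_start + n))
        for prev, cur in zip(seq, seq[1:]):
            row = self.trans[prev]
            ll += math.log((row[cur] + 1) / (sum(row.values()) + n))
        return ll


def log_odds_spam(seq, spam_chain, legit_chain, prior_spam=0.5):
    """Log-odds that a sequence came from the spammer chain."""
    prior = math.log(prior_spam) - math.log(1.0 - prior_spam)
    return prior + spam_chain.log_likelihood(seq) - legit_chain.log_likelihood(seq)


# Toy training data, invented for this example.
ACTIONS = ["Message", "Profile_view", "Friend_Request"]
spam_chain = MarkovChain(ACTIONS).fit([
    ["Message", "Message", "Message"],
    ["Friend_Request", "Message", "Message"],
])
legit_chain = MarkovChain(ACTIONS).fit([
    ["Profile_view", "Message", "Profile_view"],
    ["Profile_view", "Friend_Request", "Message"],
])
```

A positive value of `log_odds_spam` means the sequence is better explained by the spammer chain; the value itself could also be passed on to a downstream classifier as a feature.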
11. Deployment and Example Runtimes
• We can:
• Run the model at short intervals on new snapshots of the network
• Update some of the features as events arrive
• Example runtimes with GraphLab Create™ on a MacBook Pro:
• 5.6 million nodes and 350 million links
• PageRank: 6.25 minutes
• Triangle counting: 17.98 minutes
• k-core: 14.3 minutes
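Updating features from events, rather than recomputing them on a full snapshot, is straightforward for simple counts. The sketch below keeps per-relation degree counts current as events stream in; the `DegreeTracker` class is hypothetical, and features such as PageRank or k-core would still need a periodic recomputation.

```python
from collections import defaultdict


class DegreeTracker:
    """Keeps per-relation in/out link counts current as events arrive.

    Counts every event, so repeated actions between the same pair of users
    are counted more than once.
    """

    def __init__(self):
        self.out_deg = defaultdict(int)
        self.in_deg = defaultdict(int)

    def observe(self, src, dst, relation):
        """Record one new action from src to dst."""
        self.out_deg[(src, relation)] += 1
        self.in_deg[(dst, relation)] += 1

    def features(self, user, relation):
        """Return (in-degree, out-degree) for a user in one relation."""
        return (
            self.in_deg.get((user, relation), 0),
            self.out_deg.get((user, relation), 0),
        )


tracker = DegreeTracker()
tracker.observe(1, 2, "message")
tracker.observe(1, 3, "message")
tracker.observe(3, 1, "wink")
```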
12. Refining the abuse reporting systems
• Abuse report systems are very noisy.
• People have different standards.
• Spammers report random people to increase noise.
• Some users report others for personal gain in social games.
• The goal is to clean up the system using:
• Reporters' previous history
• All the reports, considered collectively
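To make the role of a reporter's history concrete, here is a deliberately simple illustration: each report is weighted by the smoothed fraction of that reporter's past reports that were confirmed. This is not the collective model used in the talk, and the function names, the smoothing, and the toy data are assumptions.

```python
def reporter_credibility(history):
    """Smoothed fraction of each reporter's past reports that were confirmed.

    history maps reporter -> (confirmed_reports, total_reports).
    """
    return {
        reporter: (confirmed + 1) / (total + 2)
        for reporter, (confirmed, total) in history.items()
    }


def weighted_report_scores(reports, credibility, default=0.5):
    """Sum reporter credibility over all reports filed against each user."""
    scores = {}
    for reporter, target in reports:
        scores[target] = scores.get(target, 0.0) + credibility.get(reporter, default)
    return scores


# Toy data, invented for this example.
history = {"alice": (9, 10), "bob": (0, 10)}
reports = [("alice", "u1"), ("bob", "u2"), ("bob", "u2")]
credibility = reporter_credibility(history)
scores = weighted_report_scores(reports, credibility)
```

In this toy case a single report from a historically reliable reporter outweighs two reports from an unreliable one, which is the kind of noise reduction the slide is aiming at.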
13. Probabilistic Soft Logic (PSL)
• Probabilistic Soft Logic (PSL): a declarative language
based on first-order logic for expressing collective
probabilistic inference problems.
• Example:
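As a hypothetical illustration (not the rule shown in the talk), consider a rule of the form Credible(R) ∧ Reported(R, U) → Spammer(U). PSL relaxes truth values to the interval [0, 1] and uses the Łukasiewicz relaxation of the logical operators, so a rule's penalty is its distance to satisfaction. The sketch below computes that distance for assumed truth values; the predicate names and the numbers are invented.

```python
def lukasiewicz_and(a, b):
    """Lukasiewicz relaxation of conjunction for truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body, head):
    """How far the rule body -> head is from being satisfied (0 means satisfied)."""
    return max(0.0, body - head)


# Assumed soft truth values for one grounding of the hypothetical rule.
credible = 0.9   # Credible(R)
reported = 1.0   # Reported(R, U)
spammer = 0.3    # Spammer(U)

body = lukasiewicz_and(credible, reported)
penalty = distance_to_satisfaction(body, spammer)
```

Inference then searches for truth values of the unobserved atoms that minimize the weighted sum of such penalties over all rule groundings, which is what makes the reasoning collective.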
15. Conclusions
• Strong signals in the multi-relational, time-stamped graph
• Graph structure:
• Multi-relationality is more important than extracting more features from one relation
• Sequence:
• Bigram features of actions capture most of the patterns
• Reports:
• Incorporating the credibility of reporting users and collective reasoning significantly improves the signal
• Fast experiment iterations with GraphLab Create™ facilitate finding these patterns
16. Acknowledgements
• Co-authors of the paper:
James Foulds (UCSC)
Madhusudana Shashanka (if(we) Inc., currently Niara Inc.)
Lise Getoor (UCSC)
• If(we) Inc. (Formerly Tagged Inc.):
Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill
• Dato (formerly GraphLab):
Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng
• The paper is available online, and code and part of the data will be released soon:
http://www.cs.umd.edu/~shobeir
Editor's Notes
Hi,
Shobeir, UMD
Spam in social networks.
Work done at if(we) through the GraphLab internship.
Paper will appear at KDD this year.
Work with collaborators.