Finding True Love on
        the Internet
With Matthew Rothenberg and Stewart Butterfield
Just kidding. This is much less exciting.
Fighting Spam at Flickr




         Your Spam. Our Balls.
 Mikhail Panchenko and Simon Batistoni
Spammers

• Numerous
• Diverse
• Inventive
• Ubiquitous - if there’s a textbox with an
  implied recipient, they will spam it.
A simpler time
• Sending spam is an incredibly complicated
  scheme these days
• Highly distributed botnets of unsuspecting,
  heterogeneous machines
• The result of a long, long arms race
• That means that combatting it is
  complicated as well
Skynet is here

• Bots/scripts are able to sign up for accounts
  (including solving the captcha), log into
  Flickr, upload photos, set their buddy icon,
  and start sending spam.
• You can also buy these accounts in bulk...
The Harsh Truth


 Someone whose time is really cheap is
constantly working to send spam through
               your site
Photo from http://icanhascheezburger.com/2008/01/22/funny-pictures-sisyphus-cat-tries-again/
"The struggle itself...is enough to fill a man's
 heart. One must imagine Sisyphus happy."
                           Albert Camus, The Myth of Sisyphus
... but there’s hope; we’ll get to that later
Social Sites as Gateways
• User-generated content
 • “User” is a broad category that includes
    “spammer asshole”
• Email notifications for said content
• Relationship based, trust inducing
 • Mom gets excited any time she gets an
    email from Flickr
What Trust Means
• Something familiar that a user is used to
  opening
• Increases the likelihood that a user will
  open the email and perform whatever it is
  that you want them to
  • Piggybacking on the research and work
    done by the site itself!
More on Trust

• Very easy to lose - other services will
  blackhole mail coming from your domains
• Users stop coming
• Very hard to regain - the burden of proof
  ends up entirely on you
The Answer is Simple


Don’t let users generate content!
The Economics of Spam
  ( an excuse to pretend to use my degree )
The Demand: sites want exposure,
sometimes at any cost
The Supply: trusted message gateways


      the broken part - someone
      else is selling your gateway
Econ 101
Econ Continued

  The more well-known your site gets, the
higher the demand for your message delivery
   mechanism - more likely a recipient will
          actually open the message
ANOTHER GRAPH!
In a Perfect World
Some Numbers
"Spamalytics: An Empirical Analysis of Spam
Marketing Conversion"
C. Kanich, C. Kreibich, K. Levchenko, B.
Enright, G. Voelker, V. Paxson, and S. Savage.
15th ACM Conference on Computer and
Communications Security (CCS), 27-31
October 2008, Alexandria, VA.
http://www.icsi.berkeley.edu/pubs/networking/2008-ccs-spamalytics.pdf
Some Numbers


• 0.0000081% overall conversion rate
• 28 conversions for every 347,590,389
  emails attempted
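The two figures above are the same measurement stated two ways. A quick check of the arithmetic, as a minimal sketch (the variable names are illustrative, not from the paper):

```python
# Figures quoted on the slide.
conversions = 28
emails_attempted = 347_590_389

# Conversion rate as a fraction, then as a percentage.
rate = conversions / emails_attempted
rate_percent = rate * 100

# How many attempted emails it takes, on average, to get one conversion.
emails_per_conversion = emails_attempted / conversions

print(f"conversion rate: {rate_percent:.7f}%")
print(f"about one conversion per {emails_per_conversion:,.0f} emails attempted")
```

Put differently, a spammer has to attempt roughly twelve million emails to get a single conversion, which is why cheap, trusted delivery matters so much to them.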
Where we fit
• Only ~25% of the attempted emails sent
  were actually accepted by the mail server
  ( first step in the funnel )
• Using a social site as a gateway almost
  guarantees a higher number
  • A whole lot of effort goes into making
    sure notifications get delivered
Put some $$ on it


• $3.5 million of revenue in a year
• 5% increase in delivery rate = $175,000/yr
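The $175,000 figure follows if revenue scales in direct proportion to delivered mail and the 5% is read as a relative increase. Both of those are assumptions made for this sketch, not claims from the paper:

```python
# Revenue figure quoted on the slide, in whole dollars.
annual_revenue = 3_500_000

# Assumed: a 5% relative increase in delivered mail yields 5% more revenue.
delivery_gain_percent = 5

extra_revenue = annual_revenue * delivery_gain_percent // 100

print(f"extra revenue per year: ${extra_revenue:,}")
```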
They figured this out
Back to Trust
• This can’t be ignored
• Remember, once you lose that trust, it’s a
  long way back up
• As you lose your trustworthiness as a
  message gateway, the spammers go away
• ... but so do the users
Fighting Back
Traditional Prevention
• Captchas
• Mass Signup detection using IPs
• Rate Limiting
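As a rough illustration of the last two items, here is a minimal sliding-window limiter keyed on something like a signup IP or a sender ID. The class name, parameters, and limits are all hypothetical; this is not how Flickr implements it:

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_events per key within the last window_seconds."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        events = self._events[key]
        # Forget events that have fallen out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_events:
            return False
        events.append(now)
        return True


# Example: at most 2 signups per IP per 60 seconds (made-up limits).
limiter = SlidingWindowLimiter(max_events=2, window_seconds=60)
results = [limiter.allow("198.51.100.7", now=t) for t in (0, 1, 2, 61)]
print(results)
```

The same structure works for messages per sender; only the key and the limits change.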
these are mostly good things, and it certainly
         doesn’t hurt to have them




                ... however ...
A Confession
I almost always have to type a captcha code twice

 Bots consistently evolve to solve incrementally
              complex variations

             Draw your conclusions
Photo from http://www.flickr.com/photos/azkid2dc
The Tension
• Want to be able to allow users to send
  messages and generally enjoy themselves
• Don’t want to make it too easy to send
  spam
 • Traditional prevention techniques like
    captchas result in epic degradation of UX
    and ultimately end up ineffective
Traditional Response
•   User reports
•   Manual account removal
•   Manual message cleanup
    •   except you can’t clean up the email once it’s
        sent
•   Manually adding patterns to a list of things to
    filter
•   Engineers running mass deletion/cleanup scripts
Photo from http://www.flickr.com/photos/mekin/
What a Waste

• Responding to incidents this way is a huge
  drain on resources and morale
• That’s time your team could be spending
  on projects, features, being happy...
The Alternative
A holistic, comprehensive approach



 ( aka “take this shit seriously” )
Make Time
• Product teams might be reluctant to put
  spam on the roadmap and dedicate
  resources to it
• ... until you miss a bunch of deadlines
  because you’re too busy cleaning up spam
• ... and your notifications aren’t being
  delivered because you’re blacklisted
Develop a Strategy

• A spam attack is no different than a typical
  DoS or outage - you need a plan
• Figure out what data you need and whether
  or not you already have it
• Figure out ways to consolidate and
  automate the work
Build your Tools

• Make things reusable
 • a user should look the same in all tools
 • tools that show lists of users should
    reuse the same logic for batch ops
• Leave a consistent trail
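One way to read the bullets above: every tool calls the same action function, batch operations are just a loop over it, and every call writes to the same log. A minimal sketch, where `AuditLog`, `delete_account`, and the in-memory `accounts` dict are all hypothetical stand-ins:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditLog:
    """Hypothetical append-only trail shared by every tool."""
    entries: list = field(default_factory=list)

    def record(self, actor, action, user_id, reason):
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "user_id": user_id,
            "reason": reason,
        })


def delete_account(user_id, actor, reason, accounts, log):
    """The single code path for removing an account, whoever calls it."""
    existed = user_id in accounts
    if existed:
        del accounts[user_id]
    log.record(actor, "delete_account" if existed else "delete_account_noop",
               user_id, reason)
    return existed


def delete_accounts(user_ids, actor, reason, accounts, log):
    """Batch operation: reuses the single-account logic, so the trail matches."""
    return [uid for uid in user_ids
            if delete_account(uid, actor, reason, accounts, log)]


accounts = {1: "alice", 2: "bob", 3: "carol"}
log = AuditLog()
deleted = delete_accounts([1, 3, 9], "staff", "spam wave", accounts, log)
print(deleted, len(log.entries))
```

Because the batch path goes through the single-account function, the log looks the same whether one account or a thousand were removed.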
Look at the Big Picture

• Your tools should be very well integrated
 • your user report tools should pop
    suspected accounts into review tools
  • deleting accounts and messages should
    automatically close user report cases
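The second integration point could look something like this. It is a sketch only; the report fields and the in-memory structures are invented for illustration:

```python
def delete_account_and_close_reports(user_id, accounts, reports):
    """Remove an account and close every open report filed against it."""
    accounts.discard(user_id)
    closed = []
    for report in reports:
        if report["reported_user"] == user_id and report["status"] == "open":
            report["status"] = "closed"
            report["resolution"] = "account_deleted"
            closed.append(report["id"])
    return closed


accounts = {"u1", "u2"}
reports = [
    {"id": 1, "reported_user": "u1", "status": "open"},
    {"id": 2, "reported_user": "u2", "status": "open"},
    {"id": 3, "reported_user": "u1", "status": "open"},
]
closed = delete_account_and_close_reports("u1", accounts, reports)
print(closed)
```

Nobody has to visit the report queue afterwards to tidy up cases that the deletion already resolved.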
The goal is to be able to have one person look at
a single tool, make decisions, and go back to sleep
Photo from http://www.flickr.com/photos/dreamcicle
... but we can get close!
Work Smart

• Spam is limited to going from one user on
  your site to another user on your site
• That forces certain behavior patterns -
  know what those are for your site
Work Smart, continued
•   If you have some obstacles at signup time
    (captcha, mass signup detection), you can pretty
    much expect two things:
    •   a slow trickle of signups (to get around
        signup-time mass signup checks)
    •   a sudden surge of messages
        •   Constant “under the radar” trickle doesn’t make
            sense - if you delete the accounts after a few
            user reports, they don’t get their payload sent
Work Smart, continued
You know a LOT about your users by default

• The signup - when, where
• Relationships are key
• You can see what’s happening globally
 • patterns are important
• The message contents are less helpful, and
  really, less important
Examine What You Send
• Separate the act of sending a message from
  the actual delivery
 • Obviously doesn’t work with all content
• Queue up messages at some reasonable
  interval instead of sending them instantly
• Examine what’s in the queue before sending
  it out
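A minimal sketch of the idea above: messages go into an outbox with a delay, and a check runs over whatever is due before anything is delivered. The `DelayedOutbox` class, the five-minute delay, and the `is_suspicious` callback are assumptions for illustration:

```python
import heapq
import itertools


class DelayedOutbox:
    """Hold outgoing notifications for delay_seconds before delivery."""

    def __init__(self, delay_seconds):
        self.delay_seconds = delay_seconds
        self._heap = []  # (due_time, sequence, message)
        self._sequence = itertools.count()  # tie-breaker; messages are never compared

    def enqueue(self, message, now):
        due = now + self.delay_seconds
        heapq.heappush(self._heap, (due, next(self._sequence), message))

    def pending(self):
        """Everything still queued, oldest first, for inspection."""
        return [message for _, _, message in sorted(self._heap)]

    def release_due(self, now, is_suspicious):
        """Split due messages into those to deliver and those to hold back."""
        delivered, held = [], []
        while self._heap and self._heap[0][0] <= now:
            _, _, message = heapq.heappop(self._heap)
            (held if is_suspicious(message) else delivered).append(message)
        return delivered, held


outbox = DelayedOutbox(delay_seconds=300)
outbox.enqueue({"id": 1, "sender": "alice"}, now=0)
outbox.enqueue({"id": 2, "sender": "bulk-bot"}, now=10)

too_early = outbox.release_due(now=100, is_suspicious=lambda m: False)
delivered, held = outbox.release_due(
    now=400, is_suspicious=lambda m: m["sender"] == "bulk-bot")
print(too_early, delivered, held)
```

In practice the `is_suspicious` check would look at the whole queue rather than one message at a time.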
Clustering is your friend
• Cluster the messages in the queue using as
  many characteristics as possible
• Doing this will make most spam look really
  obvious
• Fairly straightforward to implement ( don’t
  need a massive cluster or Hadoop, at least
  initially )
Clustering Scores
• (I’m sure there’s a more scientific term for this)
• The size of the cluster a particular message
  belongs to as a percentage of the total number
  of messages
• Example: if you have 200 messages and a
  message falls into a cluster of 10, that message’s
  cluster score for that particular characteristic is
  5 (10/200 = .05 = 5%)
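The score defined above can be computed with a plain counter. A minimal sketch, assuming messages are dicts and a characteristic is any function that maps a message to a hashable value (both are choices made for this example):

```python
from collections import Counter


def cluster_scores(messages, characteristic):
    """Score each message by the size of its cluster, as a percentage
    of all messages, for one characteristic."""
    keys = [characteristic(message) for message in messages]
    cluster_sizes = Counter(keys)
    total = len(messages)
    return [100 * cluster_sizes[key] / total for key in keys]


# The slide's example: 200 messages, 10 of which share one signup IP.
messages = [{"signup_ip": "198.51.100.7"} for _ in range(10)]
messages += [{"signup_ip": f"192.0.2.{i}"} for i in range(190)]

scores = cluster_scores(messages, lambda m: m["signup_ip"])
print(scores[0], scores[10])
```

Running it once per characteristic (signup date, signup IP, and so on) gives each message one score per characteristic.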
Example
             Signup Date Score   Signup IP Score
Message 1            5                  3
Message 2            4                  8
Message 3            6                 12
Message 4            7                  4
Message 5            6                 10
Message 6           20                 19
Message 7           20                 20
Message 8           20                 19
Message 9           20                 19
Message 10          20                 20
JACKPOT



 Photo from http://www.flickr.com/photos/aresauburnphotos/
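Using the numbers from the example table, the suspicious messages are the ones that score high on every characteristic at once. A sketch with an arbitrary threshold of 15, which is an assumption here and would need tuning per site:

```python
# (signup date score, signup IP score) per message, copied from the table.
TABLE = {
    1: (5, 3), 2: (4, 8), 3: (6, 12), 4: (7, 4), 5: (6, 10),
    6: (20, 19), 7: (20, 20), 8: (20, 19), 9: (20, 19), 10: (20, 20),
}


def flag_suspects(scores, threshold):
    """Return message numbers whose every cluster score meets the threshold."""
    return [message for message, row in sorted(scores.items())
            if all(score >= threshold for score in row)]


suspects = flag_suspects(TABLE, threshold=15)
print(suspects)
```

Requiring every characteristic to agree is a deliberately conservative choice for the sketch; what to do with the flagged messages is the open question the next slide raises.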
The Tough Questions

• What do you do with this information?
• Just how much can you automate?
• We’re still looking for that balance
Further Reading

• http://www.icsi.berkeley.edu/pubs/networking/2008-ccs-spamalytics.pdf
• http://www.slideshare.net/hadoopusergroup/mail-antispam


Editor’s notes

  1. I am living proof that you can work at a photo site, own a really nice camera, and still take really crappy photos. Simon couldn’t make it because he got married this past weekend and is off on his honeymoon.
  2. Fighting spam can be very depressing.
  3. Whenever you “optimize” an email, you’re optimizing it for the spammers as well. Mom example: she is not great with computers and only uses Flickr when I send something along, so she is likely to assume that any mail from Flickr is from me.
  4. Other sites’ spam detection: it all looks like “flickr.com” to them!
  5. Good, you’re popular. But that also means more spam.
  6. You know all that work you did to make sure your emails get delivered? The spammers thank you.
  7. That is JUST the Storm botnet.
  8. Story about Spamhaus.
  9. An important point: once the spam leaves your site, it damages your site’s reputation on other sites trying to combat spam, namely email providers.
  10. The amount of work involved in dealing with the aftermath of a large-scale spam attack when operating this way is insane. Engineers, support staff, ops: everyone is just doing manual, tedious work, such as deleting accounts and going through user reports.
  11. Thank Simon for fighting the good fight.
  12. (I started out as a tools guy.) Your tools should be very clear, easy to use, and allow for easy batch operations.
  13. How long can you delay sending a message? In most cases, quite a bit of time. Things like comments have to show up immediately, but you can delay the email notification.