Spam Filter
      -Apeksha Agarwal
      -Kashika Srivatava
What is spam?
• Spam is the use of electronic messaging systems to send
  unsolicited bulk messages, especially advertising, indiscriminately.

11/6/2012
Types of Spam
• Email Spam (the best known, and the topic for today)
• Comment Spam (probably why we have CAPTCHAs)
• Instant Messaging Spam (e.g., in Yahoo! Messenger, unknown
  contacts sending weird URLs)
• Junk Fax (your machine prints hundreds of spam messages and
  you can't delete them; thankfully now a horror of the past)
• Unsolicited Text Messages (the offers make me think I am the
  luckiest girl alive)
• Social Networking Spam (sent by a friend who clicked on a
  similar message sent by their friend)
Geographical Origins of Spam
 The origin or source of spam refers to the geographical location
 of the computer from which the spam is sent; it is neither the
 country where the spammer resides nor the country that hosts the
 spamvertised site.

 Interesting fact:
 As much as 80% of spam received by Internet users in North
 America and Europe can be traced to fewer than 200 spammers.
Spam Topics in Q3 2012
Other Fast Facts
• Spam accounts for an estimated 14.5 billion messages globally per
  day, about 45% of all email.

• A 2004 survey estimated that lost productivity due to spam costs
  Internet users in the United States $21.58 billion annually.

• Some users switched from Yahoo Mail to Gmail because of its better
  spam filter.

• Spam fills up mailbox storage and pushes users to look for more
  free space; generous free storage was another technique Gmail used
  to lure users.
Current Work: Bayesian Model
 • Based on the document filtering concept: Bayes' theorem gives the
   probability that a message is spam, given that it contains a word:

   $$\Pr(S|W) = \frac{\Pr(W|S)\,\Pr(S)}{\Pr(W|S)\,\Pr(S) + \Pr(W|H)\,\Pr(H)}$$

 where
Pr(S|W) is the probability that a message is spam, knowing that the word "replica" is in it;
Pr(S) is the overall probability that any given message is spam;
Pr(W|S) is the probability that the word "replica" appears in spam messages;
Pr(H) is the overall probability that any given message is not spam (is "ham");
Pr(W|H) is the probability that the word "replica" appears in ham messages.
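As a rough illustration of how these quantities fit together, the sketch below estimates Pr(W|S), Pr(W|H) and Pr(S) as plain frequencies from a tiny labelled corpus and then applies Bayes' theorem. The toy messages, the whitespace tokenizer, and the helper names `contains`, `estimate` and `spamicity` are all hypothetical; real filters use much larger corpora and some smoothing for rare words.

```python
def contains(message: str, word: str) -> bool:
    # Hypothetical tokenizer: lowercase the message and split on whitespace.
    return word in message.lower().split()


def estimate(spam: list[str], ham: list[str], word: str) -> tuple[float, float, float]:
    """Return (Pr(W|S), Pr(W|H), Pr(S)) as simple frequency estimates."""
    p_w_s = sum(contains(m, word) for m in spam) / len(spam)
    p_w_h = sum(contains(m, word) for m in ham) / len(ham)
    p_s = len(spam) / (len(spam) + len(ham))
    return p_w_s, p_w_h, p_s


def spamicity(p_w_s: float, p_w_h: float, p_s: float) -> float:
    """Pr(S|W) by Bayes' theorem, with Pr(H) = 1 - Pr(S)."""
    p_h = 1.0 - p_s
    evidence = p_w_s * p_s + p_w_h * p_h
    if evidence == 0.0:
        # The word never appeared in training, so it carries no evidence:
        # fall back to the prior Pr(S).
        return p_s
    return p_w_s * p_s / evidence


# Hypothetical labelled training messages.
spam = ["cheap replica watches", "replica bags on sale", "win a free prize"]
ham = ["meeting at noon", "the replica statue is in the museum", "lunch tomorrow"]

print(spamicity(*estimate(spam, ham, "replica")))
```

Here "replica" occurs in two of three spam messages and one of three ham messages, and the priors are equal, so the word alone pushes the estimate to Pr(S|W) = 2/3.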

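Combining Words:

 $$p = \frac{p_1 p_2 \cdots p_N}{p_1 p_2 \cdots p_N + (1 - p_1)(1 - p_2) \cdots (1 - p_N)}$$

 p: is the probability that the suspect message is spam;
 p_1: is the probability that it is spam knowing it contains a first word (for example "replica");
 p_2, ..., p_N: are the corresponding probabilities for the second through Nth words.
 (This combination assumes the words occur independently of one another, the "naive" in naive Bayes.)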

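Problem:
Bayesian poisoning: spammers pad their messages with random innocuous
("hammy") words so that the combined probability p drops below the
spam threshold.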
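A minimal sketch of the word-combination step, and of why poisoning works, assuming the standard naive Bayes rule p = Π p_i / (Π p_i + Π (1 - p_i)). The rule is evaluated in the log domain (algebraically identical, but it avoids underflow on long messages). Clamping each p_i to [0.01, 0.99] is an assumption borrowed from common practice, and all per-word probabilities below are hypothetical.

```python
import math


def combine(probs: list[float], eps: float = 0.01) -> float:
    """Combine per-word spam probabilities p_1..p_N into the message probability p.

    p = prod(p_i) / (prod(p_i) + prod(1 - p_i)) = 1 / (1 + e**eta),
    where eta = sum(ln(1 - p_i) - ln(p_i)).
    """
    eta = 0.0
    for p in probs:
        q = min(max(p, eps), 1.0 - eps)  # assumption: keep q away from 0 and 1
        eta += math.log(1.0 - q) - math.log(q)
    # Evaluate 1 / (1 + e**eta) so that exp() never overflows.
    if eta >= 0:
        z = math.exp(-eta)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(eta))


# Hypothetical per-word spam probabilities for "cheap replica watches".
spam_words = [0.95, 0.90, 0.92]
# Hypothetical innocuous words a spammer appends; each looks mildly like ham.
padding = [0.20] * 10

before = combine(spam_words)
after = combine(spam_words + padding)
print(f"p before padding: {before:.4f}, after padding: {after:.4f}")
```

Three strongly spammy words give p close to 1, but because every word counts as independent evidence, ten weakly hammy words are enough to drag p close to 0.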
Other Models (Machine Learning Based)
•   Neural Networks
•   Graphical Models
•   Logistic Regression
•   Support Vector Machines (SVMs)
•   All make fewer assumptions than naive Bayes, which treats words
    as independent
•   They capture relationships between words, implicitly or
    explicitly, at the expense of more complexity
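To make one of these models concrete, here is a minimal from-scratch sketch of logistic regression on binary bag-of-words features, trained by batch gradient descent on the log loss. The toy messages, learning rate and epoch count are hypothetical choices for illustration, not values from any production filter.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def featurize(text: str, vocab: list[str]) -> list[float]:
    # Binary bag of words: 1.0 if the vocabulary word occurs in the message.
    words = set(text.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]


def train(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 500):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss for one example is (prediction - label) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w: list[float], b: float, x: list[float]) -> float:
    # Estimated probability that the message is spam.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy training set (1 = spam, 0 = ham).
messages = [
    ("cheap replica watches", 1),
    ("win a free prize now", 1),
    ("cheap pills free shipping", 1),
    ("meeting moved to noon", 0),
    ("see you at lunch", 0),
    ("project report attached", 0),
]
vocab = sorted({w for text, _ in messages for w in text.lower().split()})
X = [featurize(text, vocab) for text, _ in messages]
y = [label for _, label in messages]
w, b = train(X, y)

for text in ["free replica watches", "lunch meeting at noon"]:
    print(f"{text!r}: Pr(spam) = {predict(w, b, featurize(text, vocab)):.3f}")
```

Unlike the naive Bayes combination, all weights are fitted jointly, so two words that always appear together share the credit instead of each being counted as independent evidence.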
MSR: Challenge Response System
• Idea of Cynthia Dwork (now at Microsoft Research, Silicon
  Valley) and Moni Naor (at the Weizmann Institute of Science
  in Israel)
• First determine whether a message is ham or spam, then take action
• Aim: catch even false positives (legitimate mail wrongly
  classified as spam)
• Idea: increase the recall of ham messages
• Send the sender a challenge, a small puzzle, which a genuine
  sender will answer
• Spammers sending in bulk do not have time to answer every challenge
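Dwork and Naor's proposal was to make the sender perform a moderately hard computation before mail is accepted. The sketch below uses a hashcash-style SHA-256 puzzle as a stand-in: it is a swapped-in construction, not the specific function from their work, and the challenge format and the 12-bit difficulty are hypothetical. Solving costs about 2**difficulty hashes on average, while checking an answer costs one hash, which is exactly the asymmetry that hurts bulk senders.

```python
import hashlib
import secrets
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, difficulty: int) -> int:
    """Sender side: brute-force a nonce (about 2**difficulty hashes on average)."""
    for nonce in count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Receiver side: checking an answer costs a single hash."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty


# Hypothetical challenge: sender, recipient and a random token chosen by the
# recipient's mail server, so answers cannot be precomputed or reused.
challenge = f"alice@example.com|bob@example.com|{secrets.token_hex(8)}"
DIFFICULTY = 12  # hypothetical; a real system would tune this cost

nonce = solve(challenge, DIFFICULTY)
print(nonce, verify(challenge, nonce, DIFFICULTY))
```

A genuine sender pays the cost once per conversation; a spammer would have to pay it for every one of millions of recipients.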
My idea: Collaborative intelligence

• Classify each message as spam or ham using the previous techniques
• Then warn the user about probable spam among mails classified as
  ham, based on the responses of other readers
• Suppose a mail is sent to 50 people and is classified as ham
• Check the rate at which the other recipients mark it as spam
• When a new recipient opens it, show it in the inbox but flag it
  as probable spam, with some confidence
• The user is pre-warned of possible spam in their inbox
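One possible way to turn "the rate at which other recipients mark it as spam, with some confidence" into code is sketched below. Using the lower end of the Wilson score interval as the confidence measure is an assumption, not something the slides specify, and `ReaderFeedback`, `inbox_warning` and the 0.3 threshold are hypothetical names and values.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReaderFeedback:
    """Hypothetical per-message counters kept by the mail server."""
    opened: int = 0        # recipients who have opened the message so far
    marked_spam: int = 0   # how many of them clicked "Report spam"


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a proportion (z = 1.96 ~ 95%)."""
    if trials == 0:
        return 0.0
    phat = successes / trials
    denom = 1.0 + z * z / trials
    centre = phat + z * z / (2 * trials)
    margin = z * math.sqrt(phat * (1 - phat) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom


def inbox_warning(fb: ReaderFeedback, threshold: float = 0.3) -> Optional[str]:
    """Warning text for a ham-classified message, or None if no warning is needed."""
    rate = wilson_lower_bound(fb.marked_spam, fb.opened)
    if rate >= threshold:
        return (f"In your inbox, but probably spam: estimated report rate "
                f"is at least {rate:.0%} (95% confidence)")
    return None


# A message sent to 50 people and classified as ham; 20 have opened it so far.
print(inbox_warning(ReaderFeedback(opened=20, marked_spam=12)))
print(inbox_warning(ReaderFeedback(opened=20, marked_spam=1)))
```

Using the lower bound rather than the raw rate means a single early report (1 of 1 readers) is not enough to flag the message for every other recipient, while a sustained pattern of reports is.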
References

• Commtouch: Internet Threats Trend Report, October 2012
    (http://www.commtouch.com/download/2389)

• Symantec: Internet Security Report
    (http://www.symantec.com/content/en/us/enterprise/other_resources/b-istr_main_report_2011_21239364.en-us.pdf)

• Cisco: Security Report
    (http://www.cisco.com/en/US/prod/collateral/vpndevc/security_annual_report_2011.pdf)

• Wikipedia: Email spam
    (http://en.wikipedia.org/wiki/Email_spam)

• destinationCRM
    (http://www.destinationcrm.com/Articles/Editorial/Magazine-Features/Avoid-the-Spam-Folder-84272.aspx)

• techsupportalert.com
    (http://www.techsupportalert.com/content/how-why-switch-yahoo-mail-gmail.htm)

• Spamhaus
    (http://www.spamhaus.org/statistics/countries/)

• MSR
    (http://research.microsoft.com/en-us/um/people/joshuago/significance-spam_edited2-times.pdf)
