CSMSS
Chh. Shahu College of Engineering, Aurangabad
Seminar On
“Email & SMS Spam Detection”
Guided By:
Dr. S.V. Khidse
Presented By:
Kunal Kalamkar (3271)
Department of Computer Science and Engineering
2021-22
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING, CSMSS, CSCOE, AURANGABAD
Contents
• Introduction
• Technologies
• Libraries
• Machine Learning
• Data set
• Problem definition
• Description of dataset
• Methodology
• Algorithms
• Conclusion
• References
Introduction
In today’s globalized world, email is a primary means of communication, used for
personal, business, corporate, and government correspondence. With the rapid increase in
email usage, there has also been an increase in spam email. Spam, also known as junk
email, consists of nearly identical messages sent to numerous recipients. Apart from being
annoying, spam can also pose a security threat to computer systems. It is estimated that
spam cost businesses on the order of $100 billion in 2007. In this project, we use text
mining to filter spam automatically so that email can be used effectively. We try to
identify patterns using data-mining classification algorithms that let us classify
emails as HAM or SPAM.
Technologies
Technologies used:-
1. Python
2. HTML
3. CSS
4. JavaScript
Python:- Python is an interpreted, object-oriented, high-level programming language
with dynamic semantics, developed by Guido van Rossum.
Libraries:-
1. NumPy
2. Pandas
3. scikit-learn (sklearn)
4. NLTK
Libraries
• NumPy:- NumPy is a library for the Python programming language that adds support for
large, multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions that operate on these arrays. NumPy also forms the foundation
of the machine learning stack.
• Pandas:- Pandas is a Python library widely used in machine learning for data cleaning
and analysis. It provides features for exploring, cleaning, transforming, and
visualizing data.
• NLTK:- NLTK is intended to support research and teaching in NLP and closely related areas,
including empirical linguistics, cognitive science, artificial intelligence, information retrieval,
and machine learning.
• Matplotlib:- Matplotlib is a low-level Python library used for data visualization.
It is easy to use and emulates MATLAB-like graphs and visualizations. The library is built
on top of NumPy arrays and offers several plot types, such as line charts, bar charts, and histograms.
Machine Learning
Arthur Samuel, an early American leader in the field of computer gaming and artificial
intelligence, coined the term “machine learning” in 1959 while at IBM. He defined machine
learning as “the field of study that gives computers the ability to learn without being explicitly
programmed”.
• Machine learning is programming computers to optimize a performance criterion using
example data or past experience.
• The field of study known as machine learning is concerned with the question of how to
construct computer programs that automatically improve with experience.
Data set
A machine learning dataset is a collection of data that is used to train a model. A dataset
acts as a set of examples that teaches the machine learning algorithm how to make predictions.
A dataset can be defined as “a collection of data that is treated as a single unit by a
computer”. This means that a dataset contains many separate pieces of data, yet it can be used
as a whole to train an algorithm with the goal of finding predictable patterns in it.
How is the data used for training?
Training data varies depending on whether you are using supervised or unsupervised
learning. Unsupervised learning uses unlabeled data: models are tasked with finding
patterns (similarities and deviations) in the data to make inferences and reach conclusions.
With supervised learning, on the other hand, humans must tag, label, or annotate the data
according to their criteria in order to train the model to reach the desired conclusion (output).
In labeled data, the desired output for each example is predetermined.
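As a minimal sketch of what labeled data looks like for supervised learning, the snippet below holds a handful of messages tagged as "spam" or "ham" and splits them into training and test sets. The messages, the `split_labeled` helper, and the 75/25 split ratio are illustrative assumptions and are not taken from the project.

```python
import random

# Hypothetical labeled examples: each message is tagged by a human as "spam" or "ham".
labeled = [
    ("Win a free prize now", "spam"),
    ("Are we still meeting at noon?", "ham"),
    ("Claim your cash reward today", "spam"),
    ("Please send me the report", "ham"),
    ("Free entry in a weekly draw", "spam"),
    ("See you at dinner tonight", "ham"),
    ("Urgent: your account has won", "spam"),
    ("Can you call me later?", "ham"),
]


def split_labeled(rows, train_fraction=0.75, seed=0):
    """Shuffle labeled rows and split them into training and held-out test sets."""
    rows = list(rows)                  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


train, test = split_labeled(labeled)
print(len(train), "training examples,", len(test), "test examples")
```

The model only ever sees the labels of the training rows; the held-out rows are used afterwards to check how well the learned patterns generalize.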
Problem Definition
• Short Message Service (SMS) and email have grown into a multi-billion-dollar commercial
industry.
• SMS spam is still not as common as email spam.
• SMS spam is growing, and in 2012 up to 30% of text messages in parts of Asia
were spam.
Description of Dataset
Spam percentage in the dataset ≈ 12.63 %
Ham percentage in the dataset ≈ 87.37 %
The dataset consists of 5,574 text messages from the UCI Machine Learning Repository.
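The class shares above can be computed along the following lines. This is a minimal sketch that assumes the messages are stored one per line in a tab-separated "label, message" layout; the four sample rows and the helper names `parse_messages` and `class_percentages` are hypothetical.

```python
from collections import Counter

# Hypothetical sample in an assumed tab-separated "label<TAB>message" layout.
SAMPLE = (
    "ham\tOk, see you at five\n"
    "spam\tWINNER! Claim your free prize now\n"
    "ham\tDid you finish the assignment?\n"
    "ham\tCall me when you get home\n"
)


def parse_messages(raw):
    """Turn raw tab-separated text into a list of (label, message) pairs."""
    rows = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        label, message = line.split("\t", 1)
        rows.append((label, message))
    return rows


def class_percentages(rows):
    """Return the share of each label as a percentage of all rows."""
    counts = Counter(label for label, _ in rows)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


rows = parse_messages(SAMPLE)
shares = class_percentages(rows)
print(f"Spam percentage in the sample = {shares['spam']:.2f} %")
print(f"Ham percentage in the sample = {shares['ham']:.2f} %")
```

Because the two classes are so unevenly represented, plain accuracy can look good even for a classifier that rarely predicts spam, so the class shares are worth checking before training.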
Methodology
Algorithms
• Different algorithms used for email spam detection:-
1. Deep learning
2. Naive Bayes
3. Support Vector Machines
4. K-Nearest Neighbour
5. Rough Sets
6. Random Forests
7. Multinomial Naive Bayes
Classification of Algorithms (Naïve Bayes)
The naïve Bayes (NB) algorithm is applied to the final extracted features. Its speed and
simplicity, along with its high accuracy, make it a desirable classifier for spam detection
problems. Applying naïve Bayes with a multinomial event model to the dataset and using
10-fold cross-validation gives the results in Table 1.
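To make the multinomial event model concrete, here is a dependency-free sketch of a naïve Bayes classifier over word counts, together with a simple k-fold evaluation. It is not the project's implementation: the whitespace tokenizer, the add-one (Laplace) smoothing, the interleaved fold assignment, the toy messages, and the choice of k = 4 (rather than the 10 folds mentioned above) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it on whitespace."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of messages per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior plus the log likelihood of each known word given the class
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def k_fold_accuracy(texts, labels, k):
    """Average accuracy over k folds; fold i holds out every k-th item starting at i."""
    scores = []
    for i in range(k):
        train_idx = [j for j in range(len(texts)) if j % k != i]
        test_idx = [j for j in range(len(texts)) if j % k == i]
        model = MultinomialNB().fit([texts[j] for j in train_idx],
                                    [labels[j] for j in train_idx])
        hits = sum(model.predict(texts[j]) == labels[j] for j in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / k


# Hypothetical toy data, only to exercise the code.
texts = [
    "win free prize now", "free cash prize claim",
    "claim your free reward", "win cash now",
    "see you at lunch", "are you coming home",
    "call me after lunch", "see you at home",
]
labels = ["spam"] * 4 + ["ham"] * 4

model = MultinomialNB().fit(texts, labels)
print(model.predict("free prize"))
print(model.predict("see you at home"))
print("4-fold accuracy on the toy data:", k_fold_accuracy(texts, labels, k=4))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing keeps a word that was seen in only one class from driving the other class's probability to zero.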
Word Cloud for Spam/Ham words
[Figure: word clouds of the top spam words and the top ham words]
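The word frequencies behind such a word cloud can be counted as in the sketch below. The toy messages, the small stop-word set, and the `top_words` helper are hypothetical and serve only to show the counting step; drawing the cloud itself is not covered.

```python
from collections import Counter

# Hypothetical toy messages as (label, text) pairs.
messages = [
    ("spam", "free prize claim your free prize"),
    ("spam", "free entry to win a prize"),
    ("ham", "see you at lunch"),
    ("ham", "lunch is ready call me"),
    ("ham", "running late for lunch"),
]

# Small illustrative stop-word set; a real project would use a fuller list.
STOP_WORDS = {"a", "at", "the", "to", "you", "your", "for", "me", "is"}


def top_words(rows, label, n=3):
    """Return the n most frequent non-stop-words among messages with this label."""
    counts = Counter()
    for row_label, text in rows:
        if row_label == label:
            counts.update(w for w in text.lower().split() if w not in STOP_WORDS)
    return counts.most_common(n)


print("Top spam words:", top_words(messages, "spam"))
print("Top ham words:", top_words(messages, "ham"))
```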
Which email/SMS is generally longer ?
Here, we calculate the average word count for ham emails and spam emails
separately and compare them to see which are generally longer.
Average word count for ham emails: 4516.00 words
Average word count for spam emails: 653.00 words
So, it can be concluded that ham emails are generally longer than spam emails.
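A per-class average word count can be computed as sketched below. The messages and the `average_word_count` helper are hypothetical, words are simply split on whitespace, and the toy numbers say nothing about the project's results.

```python
# Hypothetical toy messages as (label, text) pairs.
messages = [
    ("ham", "Are you coming to the seminar tomorrow morning"),
    ("ham", "I will call you after the lecture ends"),
    ("spam", "Win cash now"),
    ("spam", "Free prize claim today"),
]


def average_word_count(rows, label):
    """Mean number of whitespace-separated words in messages with this label."""
    lengths = [len(text.split()) for row_label, text in rows if row_label == label]
    return sum(lengths) / len(lengths) if lengths else 0.0


ham_avg = average_word_count(messages, "ham")
spam_avg = average_word_count(messages, "spam")
print(f"Average word count for ham messages: {ham_avg:.2f} words")
print(f"Average word count for spam messages: {spam_avg:.2f} words")
```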
[Figure: comparison of the spam average and the ham average]
Home of application
Detecting spam messages
Detecting ham messages/email
Conclusion
Spam is a major problem in today's world. Spam messages are among the most unwanted
messages that end users receive in daily life. Spam emails are often nothing more than
advertisements for a company or carriers for viruses and similar threats, and they arrive
in excessive volume. Hackers can also use spam emails to gain access to our systems.
References
• https://www.isroset.org/journal/IJSRCSE/full_paper_view.php?paper_id=444
• https://www.freecodecamp.org/news/send-emailsusing-code-4fcea9df63f/
• https://www.scribd.com/doc/61315817/IntraMailing-System
• https://www.scribd.com/doc/19518895/Researchon-Mail-System-Project-report-for-Bachelor-inComputer-Rajendra-Man-Banepali
• https://nevonprojects.com/email-client-project/
• https://www.freeprojectz.com/python-djangoproject/mailing-system
• https://www.academia.edu/36614854/Spam_Filtering_Using_ML_Algorithms
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING, CSMSS, CSCOE, AURANGABAD 20
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING, CSMSS, CSCOE, AURANGABAD 21

Más contenido relacionado

La actualidad más candente

Leaky Bucket & Tocken Bucket - Traffic shaping
Leaky Bucket & Tocken Bucket - Traffic shapingLeaky Bucket & Tocken Bucket - Traffic shaping
Leaky Bucket & Tocken Bucket - Traffic shapingVimal Dewangan
 
Intro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationIntro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationAnkit Gupta
 
Python libraries for data science
Python libraries for data sciencePython libraries for data science
Python libraries for data sciencenilashri2
 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural networkKIRAN R
 
final-spam-e-mail-detection-180125111231.pptx
final-spam-e-mail-detection-180125111231.pptxfinal-spam-e-mail-detection-180125111231.pptx
final-spam-e-mail-detection-180125111231.pptxinfotowards
 
Presentation on Sentiment Analysis
Presentation on Sentiment AnalysisPresentation on Sentiment Analysis
Presentation on Sentiment AnalysisRebecca Williams
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Marina Santini
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using mlPravin Katiyar
 
Fake Image Identification
Fake Image IdentificationFake Image Identification
Fake Image IdentificationVenkat Projects
 
Machine learning seminar ppt
Machine learning seminar pptMachine learning seminar ppt
Machine learning seminar pptRAHUL DANGWAL
 
Credit card fraud detection through machine learning
Credit card fraud detection through machine learningCredit card fraud detection through machine learning
Credit card fraud detection through machine learningdataalcott
 
Minor project Report for "Quiz Application"
Minor project Report for "Quiz Application"Minor project Report for "Quiz Application"
Minor project Report for "Quiz Application"Harsh Verma
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Yuriy Guts
 
Simple Mail Transfer Protocol
Simple Mail Transfer ProtocolSimple Mail Transfer Protocol
Simple Mail Transfer ProtocolUjjayanta Bhaumik
 
Applications of Machine Learning
Applications of Machine LearningApplications of Machine Learning
Applications of Machine LearningHayim Makabee
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsKush Kulshrestha
 

La actualidad más candente (20)

Leaky Bucket & Tocken Bucket - Traffic shaping
Leaky Bucket & Tocken Bucket - Traffic shapingLeaky Bucket & Tocken Bucket - Traffic shaping
Leaky Bucket & Tocken Bucket - Traffic shaping
 
Machine Can Think
Machine Can ThinkMachine Can Think
Machine Can Think
 
Intro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationIntro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning Presentation
 
Python libraries for data science
Python libraries for data sciencePython libraries for data science
Python libraries for data science
 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural network
 
final-spam-e-mail-detection-180125111231.pptx
final-spam-e-mail-detection-180125111231.pptxfinal-spam-e-mail-detection-180125111231.pptx
final-spam-e-mail-detection-180125111231.pptx
 
Presentation on Sentiment Analysis
Presentation on Sentiment AnalysisPresentation on Sentiment Analysis
Presentation on Sentiment Analysis
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using ml
 
Fake Image Identification
Fake Image IdentificationFake Image Identification
Fake Image Identification
 
Machine learning seminar ppt
Machine learning seminar pptMachine learning seminar ppt
Machine learning seminar ppt
 
Credit card fraud detection through machine learning
Credit card fraud detection through machine learningCredit card fraud detection through machine learning
Credit card fraud detection through machine learning
 
Minor project Report for "Quiz Application"
Minor project Report for "Quiz Application"Minor project Report for "Quiz Application"
Minor project Report for "Quiz Application"
 
Text summarization
Text summarizationText summarization
Text summarization
 
Machine learning
Machine learningMachine learning
Machine learning
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Simple Mail Transfer Protocol
Simple Mail Transfer ProtocolSimple Mail Transfer Protocol
Simple Mail Transfer Protocol
 
Applications of Machine Learning
Applications of Machine LearningApplications of Machine Learning
Applications of Machine Learning
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
 

Similar a Spam email detection using machine learning PPT.pptx

Classification with R
Classification with RClassification with R
Classification with RNajima Begum
 
Titles with Abstracts_2023-2024_Data Mining.pdf
Titles with Abstracts_2023-2024_Data Mining.pdfTitles with Abstracts_2023-2024_Data Mining.pdf
Titles with Abstracts_2023-2024_Data Mining.pdfinfo751436
 
YASH DATA SCIENCE SEMINAR.pptx
YASH DATA SCIENCE SEMINAR.pptxYASH DATA SCIENCE SEMINAR.pptx
YASH DATA SCIENCE SEMINAR.pptxYashShiva3
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesiaemedu
 
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...IJNSA Journal
 
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...IJNSA Journal
 
Lect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdfLect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdfHassanElalfy4
 
Anomalous symmetry succession for seek out
Anomalous symmetry succession for seek outAnomalous symmetry succession for seek out
Anomalous symmetry succession for seek outiaemedu
 
Email Spam Detection Using Machine Learning
Email Spam Detection Using Machine LearningEmail Spam Detection Using Machine Learning
Email Spam Detection Using Machine LearningIRJET Journal
 
Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...ijsrd.com
 
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailText Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailijsrd.com
 
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM cscpconf
 
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM csandit
 
IRJET - Fake News Detection using Machine Learning
IRJET -  	  Fake News Detection using Machine LearningIRJET -  	  Fake News Detection using Machine Learning
IRJET - Fake News Detection using Machine LearningIRJET Journal
 
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHMEMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHMIRJET Journal
 
Obfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithmObfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithmjournalBEEI
 
Obfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithmObfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithmjournalBEEI
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Editor IJCATR
 

Similar a Spam email detection using machine learning PPT.pptx (20)

spam_msg_detection.pdf
spam_msg_detection.pdfspam_msg_detection.pdf
spam_msg_detection.pdf
 
Classification with R
Classification with RClassification with R
Classification with R
 
Titles with Abstracts_2023-2024_Data Mining.pdf
Titles with Abstracts_2023-2024_Data Mining.pdfTitles with Abstracts_2023-2024_Data Mining.pdf
Titles with Abstracts_2023-2024_Data Mining.pdf
 
YASH DATA SCIENCE SEMINAR.pptx
YASH DATA SCIENCE SEMINAR.pptxYASH DATA SCIENCE SEMINAR.pptx
YASH DATA SCIENCE SEMINAR.pptx
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniques
 
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
 
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
 
Lect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdfLect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdf
 
Anomalous symmetry succession for seek out
Anomalous symmetry succession for seek outAnomalous symmetry succession for seek out
Anomalous symmetry succession for seek out
 
Email Spam Detection Using Machine Learning
Email Spam Detection Using Machine LearningEmail Spam Detection Using Machine Learning
Email Spam Detection Using Machine Learning
 
Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...
 
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailText Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
 
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
 
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
 
Eckovation Machine Learning
Eckovation Machine LearningEckovation Machine Learning
Eckovation Machine Learning
 
IRJET - Fake News Detection using Machine Learning
IRJET -  	  Fake News Detection using Machine LearningIRJET -  	  Fake News Detection using Machine Learning
IRJET - Fake News Detection using Machine Learning
 
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHMEMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
 
Obfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithmObfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithm
 
Obfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithmObfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithm
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
 

Último

Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 

Último (20)

Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 

Spam email detection using machine learning PPT.pptx

  • 1. CSMSS Chh. Shahu College of Engineering, Aurangabad. Seminar on “Email & SMS Spam Detection”. Guided by: Dr. S.V. Khidse. Presented by: Kunal Kalamkar (3271). Department of Computer Science and Engineering, 2021-22. DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING, CSMSS, CSCOE, AURANGABAD
  • 2. Contents: Introduction; Technologies; Libraries; Machine Learning; Data set; Problem definition; Description of dataset; Methodology; Algorithms; Conclusion; References
  • 3. Introduction. In today’s globalized world, email is a primary means of communication, used for personal, business, corporate, and government purposes. With the rapid increase in email usage, spam has increased as well. Spam, also known as junk email, consists of nearly identical messages sent to numerous recipients. Apart from being annoying, spam can also pose a security threat to computer systems. It is estimated that spam cost businesses on the order of $100 billion in 2007. In this project, we use text mining to filter spam automatically so that email can be used effectively. We try to identify patterns using data-mining classification algorithms that enable us to classify emails as HAM or SPAM.
  • 4. Technologies. Technologies used: 1. Python 2. HTML 3. CSS 4. JavaScript. Python: Python is an interpreted, object-oriented, high-level programming language with dynamic semantics, developed by Guido van Rossum. Libraries: 1. NumPy 2. Pandas 3. Scikit-learn (sklearn) 4. NLTK
  • 5. Libraries. NumPy: NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. NumPy also forms the foundation of the machine learning stack. Pandas: Pandas is a tool used in machine learning for data cleaning and analysis. It has features for exploring, cleaning, transforming, and visualizing data. NLTK: NLTK is intended to support research and teaching in NLP and closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning.
  • 6. Continued… Matplotlib: Matplotlib is a low-level Python library used for data visualization. It is easy to use and emulates MATLAB-like graphs and visualizations. The library is built on top of NumPy arrays and provides several plot types, such as line charts, bar charts, and histograms.
  • 7. Machine Learning. Arthur Samuel, an early American leader in the field of computer gaming and artificial intelligence, coined the term “Machine Learning” in 1959 while at IBM. He defined machine learning as “the field of study that gives computers the ability to learn without being explicitly programmed”. Machine learning is programming computers to optimize a performance criterion using example data or past experience. The field is concerned with the question of how to construct computer programs that automatically improve with experience.
  • 8. Data set. A machine learning dataset is a collection of data used to train a model. It acts as a set of examples that teach the machine learning algorithm how to make predictions. A dataset can be defined as “a collection of data that is treated as a single unit by a computer”. This means that a dataset contains many separate pieces of data but can be used as a whole to train an algorithm, with the goal of finding predictable patterns in it. How is the data used for training? AI training data varies depending on whether you are using supervised or unsupervised learning. Unsupervised learning uses unlabeled data: models are tasked with finding patterns (or similarities and deviations) in the data to make inferences and reach conclusions. With supervised learning, on the other hand, humans must tag, label, or annotate the data according to their criteria in order to train the model to reach the desired conclusion (output). In labeled data, the desired outputs are predetermined.
  • 9. Problem Definition. Short Message Service (SMS) and email messaging have grown into a multi-billion-dollar commercial industry. SMS spam is still not as common as email spam. SMS spam is growing, however: in 2012, up to 30% of text messages in parts of Asia were spam.
  • 10. Description of Dataset. Spam email percentage in the dataset = 12.63268156424581 %. Ham email percentage in the dataset = 87.37731843575419 %. The dataset consists of 5574 text messages from the UCI Machine Learning Repository.
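The class percentages on this slide can be reproduced with a few lines of Python. The sketch below is illustrative only: the `labels` list is a hypothetical toy stand-in for the label column of the real dataset, and `class_percentages` is a helper name introduced here, not something from the slides.

```python
from collections import Counter

# Hypothetical toy labels standing in for the dataset's label column.
labels = ["ham", "ham", "spam", "ham", "ham", "ham", "spam", "ham"]

def class_percentages(labels):
    """Return a dict mapping each label to its share of the messages, in percent."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * n / total for label, n in counts.items()}

percentages = class_percentages(labels)
for label, pct in sorted(percentages.items()):
    print(f"{label}: {pct:.2f} %")
```

Percentages computed this way add up to 100 by construction, which makes the sum a quick sanity check on any reported class split.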
  • 11. Methodology
  • 12. Algorithms. Different algorithms used for email spam detection: 1. Deep learning 2. Naive Bayes 3. Support Vector Machines 4. K-Nearest Neighbour 5. Rough Sets 6. Random Forests 7. Multinomial Naive Bayes
  • 13. Classification of Algorithms (Naïve Bayes). The naïve Bayes (NB) algorithm is applied to the final extracted features. Its speed and simplicity, along with its high accuracy, make it a desirable classifier for spam detection problems. Applying naïve Bayes with the multinomial event model to the dataset, using 10-fold cross-validation, gives the results in Table 1.
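To make the multinomial event model and cross-validation concrete, here is a minimal from-scratch sketch using only the Python standard library. It is not the project's actual code (the slides list sklearn, which would normally be used instead). The toy messages, the whitespace tokenizer, the add-one smoothing, and the function names `train_nb`, `predict_nb`, and `k_fold_accuracy` are all assumptions made for illustration, and the toy run uses 4 folds rather than 10 because it has only 8 messages.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()

def train_nb(messages, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in zip(messages, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}

def predict_nb(model, text):
    """Return the label with the highest log prior plus log likelihood."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for token in tokenize(text):
            if token not in model["vocab"]:
                continue  # skip words never seen during training
            score += math.log((counts[token] + alpha)
                              / (total_words + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def k_fold_accuracy(messages, labels, k):
    """Mean accuracy over k folds; fold i holds out items i, i+k, i+2k, ..."""
    accuracies = []
    for i in range(k):
        test_idx = set(range(i, len(messages), k))
        train_x = [m for j, m in enumerate(messages) if j not in test_idx]
        train_y = [y for j, y in enumerate(labels) if j not in test_idx]
        model = train_nb(train_x, train_y)
        correct = sum(predict_nb(model, messages[j]) == labels[j]
                      for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Hypothetical toy data; the real project uses the SMS dataset described earlier.
messages = [
    "win a free prize now", "free cash offer click now",
    "claim your free prize today", "urgent offer win cash",
    "are we meeting for lunch today", "see you at the office tomorrow",
    "can you call me later", "lunch at noon works for me",
]
labels = ["spam"] * 4 + ["ham"] * 4

model = train_nb(messages, labels)
print(predict_nb(model, "free prize offer"))
cv_accuracy = k_fold_accuracy(messages, labels, k=4)
print(f"4-fold accuracy on toy data: {cv_accuracy:.2f}")
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that was unseen in one class from driving that class's probability to zero.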
  • 14. Word Cloud for Spam/Ham Words (figure panels: Top Spam Words, Top Ham Words)
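A word cloud is driven by per-class word frequencies. The sketch below shows one way such "top words" lists might be computed; it does not reproduce the slide's figures. The toy messages, the tiny `STOP_WORDS` set (a real project would more likely use NLTK's stop-word list), and the helper name `top_words` are all hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"a", "the", "to", "you", "for", "at", "is", "we", "your"}

def top_words(messages, n=3):
    """Return the n most frequent non-stop-words across the given messages."""
    counts = Counter()
    for text in messages:
        counts.update(w for w in text.lower().split() if w not in STOP_WORDS)
    return counts.most_common(n)

# Hypothetical toy messages, not taken from the real dataset.
spam_messages = ["win a free prize", "free cash for you", "claim your free prize"]
ham_messages = ["lunch at noon", "see you at lunch", "call me after lunch"]

top_spam = top_words(spam_messages)
top_ham = top_words(ham_messages)
print("Top spam words:", top_spam)
print("Top ham words:", top_ham)
```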
  • 15. Which email/SMS is generally longer? Here, we calculated the average word count for ham emails and spam emails separately and then compared the two to determine which are generally longer. Average word count for ham emails: 4516.00 words. Average word count for spam emails: 653.000 words. So it can be concluded that ham emails are generally longer than spam emails. (Figure panels: Spam Avg., Ham Avg.)
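The comparison described on this slide amounts to averaging the number of words per message within each class. A minimal sketch follows, assuming whitespace tokenization and hypothetical toy messages; `average_word_count` is a name introduced here for illustration.

```python
from statistics import mean

def average_word_count(messages):
    """Mean number of whitespace-separated words per message."""
    return mean([len(text.split()) for text in messages])

# Hypothetical toy messages; which class comes out longer depends on the data.
ham_messages = ["see you at lunch", "call me"]
spam_messages = ["win a free prize now", "free cash"]

avg_ham = average_word_count(ham_messages)
avg_spam = average_word_count(spam_messages)
print(f"Average word count for ham: {avg_ham:.2f}")
print(f"Average word count for spam: {avg_spam:.2f}")
```

Note that an average computed this way is a per-message figure, so it is bounded by the longest message in the class.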
  • 16. Home of application
  • 17. Detecting spam messages
  • 18. Detecting ham messages/email
  • 19. Conclusion. Spam is a major problem in today's world. Spam messages are among the most unwanted messages that end users receive in their daily lives. Spam emails are often nothing more than advertisements for a company or carriers for viruses, and their volume can be overwhelming. Hackers can also use these spam emails to gain access to our systems.