DETECTING THE PRESENCE OF CYBERBULLYING USING COMPUTER SOFTWARE
Ashish Arora
Department of Computer and Electrical Engineering and Computer Science
Florida Atlantic University
Mentor: Dr. Taghi M. Khoshgoftaar
WHAT IS CYBERBULLYING?
Cyberbullying is the use of electronic media or communication channels to bully a person, typically by sending messages of an intimidating or threatening nature.
The technology is used to intentionally hurt or embarrass another person.
It involves the use of information and communication technologies to support
COMMENTS INVOLVING NEGATIVITY AND PROFANITY
Diagram: cyberbullying comments branch into profanity and negativity, covering sexuality, race/culture, intelligence, and physical attributes.
ISSUES RELATED TO CYBERBULLYING
Classifying a conversation as normal chat/text or as falling under bullying attributes.
Cyberbullying is one of the most mentally damaging problems on the internet.
It has a catastrophic impact on self-esteem and personal lives, especially those of students.
The data needs to be categorized properly before any approach to stopping cyberbullying activity is applied.
WHAT IS A SUITABLE SOLUTION?
Machine learning method: online patrol crawler
Sentiment analysis
Software to detect cyberbullying content
MACHINE LEARNING METHOD – ONLINE PATROL CRAWLER
This method is designed to curb malicious online entries, especially on informal school websites.
It uses a machine learning method known as Support Vector Machine (SVM) to detect inappropriate entries.
The software is designed to detect cyberbullying cases automatically.
The data for classification is taken from informal school websites.
These informal school websites contain slanderous information about teachers and students.
PREVIOUS APPROACH
1. Detection of cyberbullying activity
2. Saving the URL of the website
3. Printing out websites containing cyberbullying entries
4. Sending a deletion request for the suspicious entry to the website admin or internet provider
5. Informing the police or legal affairs bureau
6. Confirming the deletion of the entry containing cyberbullying activity
MACHINE LEARNING APPROACH
The machine learning module consists of two phases: a training phase and a test phase.
TRAINING PHASE STEPS
Crawling school websites
Manually detecting cyberbullying entries
Extracting vulgar words and adding them to a lexicon
Estimating word similarity with Levenshtein distance
Training with the Support Vector Machine algorithm
TEST PHASE STEPS
Crawling school websites
Detecting cyberbullying entries with the SVM model
Part-of-speech analysis of the detected harmful entries
Estimating word similarity with Levenshtein distance
Marking and visualizing harmful entries
ESTIMATION OF WORD SIMILARITY – LEVENSHTEIN DISTANCE
Suspicious entries were gathered manually to form a lexicon of vulgar words distinctive of cyberbullying entries.
Users often change the spelling of words and write in a non-normalized way; e.g. ‘See you’ is written as ‘CU’ in chats or forums.
Levenshtein distance is used to calculate the similarity of words used in chat.
The Levenshtein distance between two strings is the minimum number of operations required to transform one string into the other, where the only available operations are deletion, insertion, or substitution of a single character.
For example, the Levenshtein distance between "kitten" and
"sitting" is 3, since the following three edits change one into the
other, and there is no way to do it with fewer than three edits:
kitten → sitten (substitution of "s" for "k")
sitten → sittin (substitution of "i" for "e")
sittin → sitting (insertion of "g" at the end).
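A minimal Python sketch of the distance computation, using the standard dynamic-programming recurrence, plus a lookup that maps a misspelled chat word to the closest lexicon entry. The function names, the example lexicon, and the one-edit threshold are illustrative assumptions, not values from the original system.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    # prev[j] = distance between the prefix of a processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def match_lexicon(word: str, lexicon: set[str], max_distance: int = 1) -> str | None:
    """Return the closest lexicon entry within max_distance edits, else None.
    The lexicon and threshold are hypothetical illustration values."""
    best, best_dist = None, max_distance + 1
    for entry in sorted(lexicon):  # sorted so ties resolve deterministically
        d = levenshtein(word.lower(), entry)
        if d < best_dist:
            best, best_dist = entry, d
    return best
```

With a one-edit threshold, `match_lexicon("l0ser", {"idiot", "loser"})` returns `"loser"`, while an unrelated word such as `"hello"` returns `None`.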
SUPPORT VECTOR MACHINE METHOD OF CLASSIFICATION
SVM is a supervised machine learning method used for classifying data.
Given a set of training samples divided into two categories, A and B, the SVM training algorithm builds a model that predicts whether a test sample belongs to category A or B. Here it is used to classify entries as harmful (cyberbullying) or non-harmful.
Samples are represented as points in space (vectors).
SVM constructs a hyperplane in this space with the largest distance to the nearest training data points.
The larger the margin, the lower the generalization error of the classifier.
Figure: training samples divided into two categories, represented as points in space.
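As an illustration of the maximum-margin idea, the following sketch trains a linear soft-margin SVM by full-batch subgradient descent on the hinge loss. This is an assumed stand-in for exposition only: SVMlight, the package used in the evaluation, solves the problem with a different (decomposition-based) optimizer, and the regularization strength, step size, epoch count, and toy data are arbitrary choices.

```python
from __future__ import annotations


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X: list[list[float]], y: list[int],
                     lam: float = 0.01, epochs: int = 200, lr: float = 0.1):
    """Soft-margin linear SVM fitted by full-batch subgradient descent on
        lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b)).
    Labels must be +1 (category A, e.g. harmful) or -1 (category B)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for epoch in range(1, epochs + 1):
        step = lr / epoch ** 0.5              # slowly decaying step size
        grad_w = [lam * wk for wk in w]       # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:     # inside the margin or misclassified
                for k in range(d):
                    grad_w[k] -= yi * xi[k] / n
                grad_b -= yi / n
        w = [wk - step * gk for wk, gk in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def predict(w: list[float], b: float, x: list[float]) -> int:
    """+1 or -1 depending on which side of the hyperplane w . x + b = 0 x lies."""
    return 1 if dot(w, x) + b >= 0 else -1
```

The margin width is 2 / ||w||, so the regularization term is what pushes the hyperplane toward the largest margin, while the hinge term penalizes points inside it. In the cyberbullying setting each vector would hold features extracted from an entry (for example, vulgar-word counts); that feature design is an assumption here.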
EVALUATION OF THE SVM MODEL
Data needs to be prepared for training the SVM model.
For training data, 966 entries were gathered during manual online patrol, of which human annotators classified 750 as harmful and 216 as non-harmful.
These entries were fed to SVMlight, a software package implementing the SVM algorithm.
The result is reported as an F-score, which combines precision and recall.
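The slides do not say which F-measure was used. Assuming the common balanced F1 (the harmonic mean of precision and recall), it can be computed as below; plugging in the reported 79.9% precision and 98.3% recall gives roughly 0.88. That figure is derived arithmetic, not a number stated in the source.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing was found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(round(f_score(0.799, 0.983), 4))  # F1 for the reported precision/recall
```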
METHODOLOGY
Training data set: 966 entries (750 harmful, 216 not harmful)
Pre-processing and feature extraction
SVM training with SVMlight (a software package for building SVM models)
10-fold cross-validation
Evaluation on the test data set
Result of the SVM model: 79.9% precision and 98.3% recall
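Ten-fold cross-validation splits the 966 entries into ten folds, trains on nine and tests on the remaining one, and rotates so each fold is held out once. A minimal sketch of the index bookkeeping follows. The shuffling seed is an assumption (shuffling matters if harmful and non-harmful entries are stored in separate blocks), and a stratified split that keeps the 750/216 ratio in every fold would be a reasonable refinement.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Indices are shuffled once with a fixed seed; the first n_samples % k folds
    receive one extra sample so that every sample is tested exactly once."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = order[start:start + size]
        train_idx = order[:start] + order[start + size:]
        yield train_idx, test_idx
        start += size
```

Typically a model is trained on each `train_idx`, scored on the matching `test_idx`, and the precision and recall are averaged over the ten runs.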
RANKING THE WORDS
Apart from classifying cyberbullying entries, there is a need to determine how harmful a given entry is. The harmfulness of an entry is calculated using the T-score.
To calculate the harmfulness of a whole entry, the T-scores of all vulgar words in it are summed. The more frequently a word occurs in a sentence, the higher its T-score.
The more frequently occurring vulgar words an entry contains, the higher it ranks in harmfulness.
T-score = a/b
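The slides leave the per-word T-score formula unspecified (the `a/b` above is not defined further), so the sketch below takes per-word scores as given and shows only the aggregation step: summing the scores of every vulgar-word occurrence in an entry and ranking entries by that sum. The score values and the whitespace tokenization are hypothetical.

```python
from __future__ import annotations


def entry_harmfulness(tokens: list[str], t_scores: dict[str, float]) -> float:
    """Sum the T-scores of every vulgar-word occurrence in the entry.
    Repeated words count each time; words without a score contribute 0."""
    return sum(t_scores.get(tok.lower(), 0.0) for tok in tokens)


def rank_entries(entries: list[str], t_scores: dict[str, float]) -> list[str]:
    """Return entries ordered from most to least harmful (naive split on spaces)."""
    return sorted(entries,
                  key=lambda e: entry_harmfulness(e.split(), t_scores),
                  reverse=True)
```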
DISCUSSION
The SVM model used to distinguish between harmful and non-harmful information achieved 79.9% precision and 98.3% recall.
The approach is less accurate when it comes to the lexicon of vulgar words: words matched by Levenshtein distance sometimes give inaccurate results.
New vulgar words appear frequently, so a way is needed to extract new harmful words from the internet automatically.
DETECTING CYBERBULLYING ON A SOCIAL NETWORKING SITE – TWITTER
A sentiment classifier, built with a machine learning algorithm, classifies tweets into negative and positive categories.
The aim is to detect bullying instances in social networks and increase their visibility.
Twitter is used as the source of data.
PREVIOUS APPROACH
A machine learning algorithm for classifying the sentiment of Twitter messages.
The previous approach classified tweets as positive or negative based on specific emoticons found in the messages.
In this approach, commonly used abusive words are used for labelling instead of emoticons.
Graph visualizations, both dynamic and static, illustrate the clustering of bullies over a period of time.
PROPOSED APPROACH
The software application is intended to accurately classify Twitter messages as negative or positive with respect to some commonly used terms.
It focuses mainly on gender bullying, using four words with different polarity.
Amazon’s Mechanical Turk was used to confirm their “bullying” polarity.
Once the polarity of the words is confirmed, the data is processed to extract relevant information, such as the username of the person who posted the negative tweet (the potential bully) and the username of the person mentioned in the tweet.
The outcome of the monitoring process is several social graphs, categorized into bully and victim social graphs.
The purpose of these graphs is to visualize all detected bullying instances, find clusters of bullies, and show hidden connections between victims over a period of time.
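A minimal sketch of turning classified tweets into a bully-to-victim graph and finding clusters of bullies connected through shared victims. It assumes each negative tweet is available as an `(author, text)` pair and that victims are identified by `@username` mentions; the original system's data format is not described, and visualization (done with Gephi in the original work) is out of scope here.

```python
import re
from collections import defaultdict

MENTION = re.compile(r"@(\w+)")  # assumed mention syntax: @ followed by word chars


def build_bully_victim_graph(negative_tweets):
    """negative_tweets: iterable of (author, text) pairs already classified
    as negative. Returns a mapping bully -> set of mentioned users (victims)."""
    graph = defaultdict(set)
    for author, text in negative_tweets:
        for victim in MENTION.findall(text):
            if victim != author:
                graph[author].add(victim)
    return graph


def bully_clusters(graph):
    """Connected components of the undirected version of the graph, i.e.
    groups of bullies and victims linked through shared targets."""
    adj = defaultdict(set)
    for bully, victims in graph.items():
        for v in victims:
            adj[bully].add(v)
            adj[v].add(bully)
    seen, clusters = set(), []
    for node in sorted(adj):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:  # iterative depth-first search
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adj[cur] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters
```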
TECHNOLOGY USED
LingPipe – a toolkit for processing text using computational linguistics; it implements the naïve Bayes algorithm.
Tweet Extractor – extracts tweets from Twitter continuously.
Gephi – open-source graph visualization and manipulation software.
Amazon’s Mechanical Turk – a crowdsourcing marketplace that coordinates the use of human intelligence to perform tasks that computers are unable to do.
DATA COLLECTION AND PRE-PROCESSING
Around 5000 tweets were collected from different sources.
A bag-of-words model is used: every word in a sentence is taken as a feature, and the whole sentence is represented by an unordered collection of words.
Sources of the 5000 tweets: previously collected data from Stanford students and previously collected data from university professors; Mechanical Turk was used to validate the polarity of the tweets.
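A bag-of-words representation can be sketched in a few lines: tokenize, then count, discarding word order. The lower-casing and the simple regular-expression tokenizer are assumptions for illustration; LingPipe ships its own tokenizers, which this does not reproduce.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case the text, split it into word tokens, and count them.
    Word order is discarded: only the multiset of words remains."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))
```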
APPROACH
A framework was built on top of the LingPipe toolkit for processing text using computational linguistics.
The framework uses LingPipe’s naïve Bayes machine learning classifier as a baseline.
The framework treats the classifier and the feature extractor as one component.
As part of data collection and pre-processing, Twitter was searched for tweets containing the words of interest (negative words).
Diagram: Framework (LingPipe naïve Bayes classifier + Tweet Extractor), which extracts tweets.
DATA COLLECTION
Diagram: an open-source library and the Streaming API crawl the Twitter timeline; tweets containing the words of interest are used as training data.
For training data, messages that contained the words “Gay,” “Homo,” “Dike,” and “Queer” were collected using the in-house tweet extractor.
The test data was collected at random by streaming public tweets from Twitter’s public timeline.
To train the classifier, a training data set and a test data set were created.
Of the 5000 tweets, approximately 3/4 were negative and 1/4 positive.
460 tweets were manually labelled as negative, and 500 tweets were labelled as positive by Amazon’s Mechanical Turk service.
The labelled data was validated by selecting a random sample of the collected data and using Amazon’s Mechanical Turk to confirm its sentiment.
Survey used:
Opinion                                  Polarity value
Negative with bullying intentions        B
Negative without bullying intentions     A
Positive or good content                 P
Neutral                                  N
CLASSIFICATION – NAÏVE BAYES CLASSIFIER
The focus of this approach is to find the polarity of tweets.
Each word in a tweet is considered a unique variable in the naïve Bayes model.
Goal: the probability that a word belongs to the positive or the negative class.
Process: collecting the data set for training → pre-processing the data set → training data → training the model → sentiment detection (positive, negative)
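The classification step can be illustrated with a from-scratch multinomial naïve Bayes classifier over bag-of-words tokens, with add-one (Laplace) smoothing. This is a hedged Python sketch of the technique, not LingPipe's Java implementation or API; the class name and the smoothing choice are assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words tokens with add-one smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # number of documents per class
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.vocab = set()

    def train(self, docs):
        """docs: iterable of (tokens, label) pairs."""
        for tokens, label in docs:
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def log_posterior(self, tokens, label):
        """log P(label) + sum of log P(word | label), up to a shared constant."""
        total_docs = sum(self.class_docs.values())
        score = math.log(self.class_docs[label] / total_docs)  # log prior
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are skipped
                score += math.log((counts[tok] + 1) / denom)
        return score

    def classify(self, tokens):
        """Return the class with the highest posterior score."""
        return max(self.class_docs, key=lambda c: self.log_posterior(tokens, c))
```

Usage: call `nb.train([(tokens, label), ...])` with token lists paired with labels such as `"positive"` or `"negative"`, then `nb.classify(tokens)` on a new tweet's tokens.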
RESULTS
Amazon’s Mechanical Turk classified unlabelled data, which was used to verify and validate the newly labelled data provided by the machine learning algorithm.
Results (training: 500 tweets)
                   Positive   Negative   Accuracy
Naïve Bayes        65.7%      72.9%      67.3%
Amazon’s MTurk     65.2%      74.0%      67.1%
CONCLUSION
This approach leverages the power of sentiment analysis.
The classifier was close to 70% accurate.
The result was not as good as expected, due to restrictions on accessing unlimited content from Twitter.
CYBERBULLYING BLOCKER APPLICATION FOR ANDROID
New types of internet-connected devices, such as smartphones and tablets, have further exacerbated the problem of cyberbullying.
An Android application automatically detects possibly harmful content in a text.
The application uses a machine learning method to spot undesirable content.
APPLICATION
The application is built for devices running the Android OS.
Java 8 and Android Studio were used.
It gives users an interface for detecting harmful content.
HARMFUL CONTENT DETECTION PROCESS
The application contains one activity responsible for interacting with the user.
To check for harmful content, the application starts a background thread.
The user can keep using the device even if the checking process takes a while.
Flow: the user inputs text on the mobile screen → pushes a button to select the method → feedback to the user.
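On Android this would be done in Java with a worker thread whose result is posted back to the UI thread. To keep one language across examples, the sketch below shows the same pattern in Python: the check runs on a single worker thread and a callback delivers the verdict, so the caller is never blocked. `check_harmful` and its word list are hypothetical placeholders for the on-device classifier, and the callback here does not run on any UI thread.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

HYPOTHETICAL_LEXICON = ("idiot", "loser")  # placeholder, not the real model


def check_harmful(text: str) -> bool:
    """Hypothetical stand-in for the on-device classifier."""
    lowered = text.lower()
    return any(word in lowered for word in HYPOTHETICAL_LEXICON)


class HarmfulContentChecker:
    """Runs checks on one background worker so the caller is never blocked."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)

    def submit(self, text: str, on_result: Callable[[bool], None]) -> Future:
        future = self._executor.submit(check_harmful, text)
        # Runs on the worker thread when the check finishes, or immediately
        # in the calling thread if the check has already completed.
        future.add_done_callback(lambda f: on_result(f.result()))
        return future

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)  # wait for pending checks
```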
METHODOLOGY
The method classifies messages as harmful or not using a classifier trained with a language modelling method based on a brute-force algorithm.
Brute force: algorithms using a combinatorial approach, which usually generate a massive number of combinations, i.e. potential answers to a given problem.
The algorithm is applied to extract sentence patterns automatically.
Actual data collected by Internet Patrol (annotated by experts): 1490 harmful and 1508 non-harmful entries.
All patterns used in classification were stored on the mobile device.
The method operates locally and does not require an internet connection.
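The slides do not detail the pattern-extraction algorithm, so the following is only an assumed illustration of a brute-force, combinatorial approach: enumerate every ordered word subsequence (gaps allowed) of each entry as a candidate pattern, weight each pattern by how much more often it appears in harmful than in non-harmful entries, and classify a new entry by the mean weight of its known patterns. The weighting formula, the length limits, and the zero threshold are hypothetical choices, not the original method.

```python
from collections import Counter
from itertools import combinations


def sentence_patterns(tokens, min_len=2, max_len=4):
    """All ordered subsequences (word order kept, gaps allowed) of length
    min_len..max_len: a brute-force enumeration of candidate patterns."""
    patterns = set()
    for n in range(min_len, min(max_len, len(tokens)) + 1):
        patterns.update(combinations(tokens, n))  # combinations keep order
    return patterns


def pattern_weights(harmful, non_harmful, **kw):
    """Hypothetical weight (h - n) / (h + n): 1 = only harmful, -1 = only non-harmful."""
    h_counts, n_counts = Counter(), Counter()
    for toks in harmful:
        h_counts.update(sentence_patterns(toks, **kw))
    for toks in non_harmful:
        n_counts.update(sentence_patterns(toks, **kw))
    return {p: (h_counts[p] - n_counts[p]) / (h_counts[p] + n_counts[p])
            for p in set(h_counts) | set(n_counts)}


def classify_entry(tokens, weights, threshold=0.0, **kw):
    """Harmful if the mean weight of the entry's known patterns exceeds threshold."""
    scores = [weights[p] for p in sentence_patterns(tokens, **kw) if p in weights]
    return bool(scores) and sum(scores) / len(scores) > threshold
```

The `max_len` cap matters: an entry of n words has on the order of 2^n ordered subsequences, which is the combinatorial blow-up a brute-force approach has to contain.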
RESULT
Precision = 79%
Recall = 79%
Requires minimal human effort
RECALL is the ratio of the number of relevant records retrieved to the total number of relevant records in the database.
PRECISION is the ratio of the number of relevant records retrieved to the total number of records retrieved, relevant and irrelevant.
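The two definitions translate directly into set operations over record identifiers, as in this small sketch (the function name and example sets are illustrative).

```python
from __future__ import annotations


def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Precision and recall of a set of retrieved records."""
    hits = len(retrieved & relevant)  # relevant records that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


print(precision_recall({1, 2, 3, 4}, {2, 3, 5}))  # 2 hits out of 4 retrieved, 3 relevant
```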
OTHER SOFTWARE PRODUCTS IN THE MARKET FOR DETECTING CYBERBULLYING
FearNot! – an interactive drama/video game that teaches children strategies to prevent bullying and social exclusion.
Samaritans Radar – the application alerted a user when it spotted someone who appeared to be bullied, depressed, or sending disturbing suicidal signals. The application was stopped due to privacy concerns.
ReThink – a smartphone application that shows a pop-up warning when the user tries to send a message with harmful content.
PocketGuardian – a parental monitoring app that detects not only cyberbullying texts but also harmful images. It uses a machine learning algorithm.
Disadvantage: it costs $4 per month.
PROPOSAL TO FILTER SUSPECTED MESSAGES
A filtering mechanism to classify messages as “abusive” or “non-abusive” (“positive” or “negative”, respectively).
In a practical system, the filter will not be completely reliable; there will be false positives and false negatives in at least some cases.
Some cases, such as threats, require extra effort.
It is difficult to create an automated system that reliably recognizes threats that should be reported to the police.
Both the problem of false positives and the problem of discarding threats can be mitigated by diverting messages labelled abusive to a trusted third party.
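A minimal sketch of the diversion idea: messages the classifier flags are held in a review queue for a trusted third party instead of being discarded, and a reviewer can release false positives. The `MessageRouter` class and its methods are hypothetical, and the classifier is passed in as any function returning `True` for abusive text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MessageRouter:
    """Deliver messages judged non-abusive; divert flagged ones for review."""
    is_abusive: Callable[[str], bool]
    inbox: list[str] = field(default_factory=list)
    review_queue: list[str] = field(default_factory=list)

    def route(self, message: str) -> str:
        if self.is_abusive(message):
            self.review_queue.append(message)  # held by the trusted third party
            return "review"
        self.inbox.append(message)
        return "delivered"

    def release(self, message: str) -> None:
        """Reviewer decided the flagged message was a false positive."""
        self.review_queue.remove(message)
        self.inbox.append(message)
```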
EXAMPLE OF A FILTERING SYSTEM
CHALLENGES
Preventing the removal of valuable messages when
attempting to filter the data.
Privacy concerns
Incidents should be reported as early as possible.
False reporting
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfsimulationsindia
 

Último (20)

Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 

Detect Cyberbullying Using ML and Sentiment Analysis

  • 1. DETECTING THE PRESENCE OF CYBERBULLYING USING COMPUTER SOFTWARE Ashish Arora Department of Computer and Electrical Engineering and Computer Science Florida Atlantic University Mentor: Dr. Taghi M. Khoshgoftaar
  • 2. WHAT IS CYBERBULLYING? Cyberbullying is the use of electronic media or communication channels to bully a person, typically by sending messages of an intimidating or threatening nature. Technology is used to intentionally hurt or embarrass another person. It involves the use of information and communication technologies to support
  • 3. COMMENTS INVOLVING NEGATIVITY AND PROFANITY Diagram: Cyberbullying → Profanity, Negativity; categories: Sexuality, Race/Culture, Intelligence, Physical Attributes.
  • 4. ISSUES RELATED TO CYBERBULLYING Conversations must be classified as normal chat/text or as bullying. Cyberbullying is one of the most mentally damaging problems on the internet, with a catastrophic impact on self-esteem and personal lives, especially those of students. The data must be categorized properly before any approach to stopping cyberbullying can be applied.
  • 5. WHAT IS THE SUITABLE SOLUTION? Machine learning (Online Patrol Crawler); sentiment analysis; software to detect cyberbullying content.
  • 6. MACHINE LEARNING METHOD: ONLINE PATROL CRAWLER This method is designed to curb malicious online entries, especially on informal school websites. It uses a machine learning method known as Support Vector Machine (SVM) to detect inappropriate entries. The software is designed to detect cyberbullying cases automatically. The data for classification is taken from informal school websites, which contain slanderous information about teachers and students.
  • 7. PREVIOUS APPROACH 1. Detecting cyberbullying activity; 2. Saving the URL of the website; 3. Printing out websites containing cyberbullying entries; 4. Sending a deletion request for the suspicious entry to the website admin or internet provider; 5. Informing the police or the Legal Affairs Bureau; 6. Confirming the deletion of the entry containing cyberbullying activity.
  • 9. TRAINING PHASE STEPS Crawling school websites → manually detecting cyberbullying entries → extracting vulgar words and adding them to a lexicon → estimating word similarity with Levenshtein distance → training with the Support Vector Machine algorithm.
  • 10. TEST PHASE STEPS Crawling school websites → detecting cyberbullying entries with the SVM model → part-of-speech analysis of the detected harmful entries → estimating word similarity with Levenshtein distance → marking and visualizing harmful entries.
  • 11. ESTIMATION OF WORD SIMILARITY: LEVENSHTEIN DISTANCE Suspicious entries were gathered manually to build a lexicon of vulgar words characteristic of cyberbullying entries. Users often alter the spelling of words and write in a non-normalized way; e.g., "See you" is written as "CU" in chat or forums. Levenshtein distance is therefore used to calculate the similarity of words used in chat. The Levenshtein distance between two strings is the minimum number of operations required to transform one string into the other, where the only available operations are deletion, insertion, or substitution of a single character. For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits: kitten → sitten (substitution of "s" for "k"); sitten → sittin (substitution of "i" for "e"); sittin → sitting (insertion of "g" at the end).
  • 12. SUPPORT VECTOR MACHINE METHOD OF CLASSIFICATION SVM is a supervised machine learning method used to classify data. Given a set of training samples divided into two categories, A and B, the SVM training algorithm builds a model that predicts whether a test sample belongs to category A or B; here, the categories are harmful (cyberbullying) and non-harmful entries. Samples are represented as points in space (vectors), and the SVM constructs a hyperplane with the largest distance to the nearest training data points. In general, the larger the margin, the lower the generalization error of the classifier. Figure: training samples divided into two categories, represented as points in space.
  • 13. EVALUATION OF SVM MODEL Data must be prepared before training the SVM model. For the training data, 966 entries were gathered during manual online patrol, of which human annotators classified 750 as harmful and 216 as non-harmful. These entries were fed to SVM_light, a software implementation of the SVM algorithm. Results are reported as precision, recall, and the F-score, the harmonic mean of precision and recall.
  • 14. METHODOLOGY Diagram components: training data set of 966 entries (750 harmful, 216 not harmful); test data set; pre-processing; feature extraction; SVM training with SVM_light (software for building SVM models); 10-fold cross-validation; evaluation. Result of the SVM model: 79.9% precision and 98.3% recall.
  • 15. RANKING THE WORDS Apart from classifying cyberbullying entries, it is necessary to determine how harmful a given entry is. Harmfulness is calculated using the T-score: the harmfulness of a whole entry is the sum of the T-scores of all vulgar words it contains. The more frequently a word occurs in a sentence, the higher its T-score, and the more such frequently occurring words an entry contains, the higher it ranks in harmfulness. T-score = a/b
  • 16. DISCUSSION The SVM model distinguished harmful from non-harmful information with 79.9% precision and 98.3% recall. The lexicon of vulgar words is less reliable: matching words by Levenshtein distance sometimes gives inaccurate results. New vulgar words appear frequently, so a method is needed to extract new harmful words from the internet automatically.
  • 17. DETECTING CYBERBULLYING ON A SOCIAL NETWORK SITE: TWITTER A sentiment classifier based on a machine learning algorithm is used to classify tweets as negative or positive. The aim is to detect bullying instances in social networks and increase their visibility. Twitter is used as the data source.
  • 18. PREVIOUS APPROACH A machine learning algorithm was used to classify the sentiment of Twitter messages. The previous approach labelled tweets as positive or negative according to specific emoticons found in the messages; this approach instead uses commonly used abuse words for labelling. Dynamic and static graph visualizations illustrate the clustering of bullies over a period of time.
  • 19. PROPOSED APPROACH The software application is intended to classify Twitter messages accurately as negative or positive with respect to some commonly used terms. It focuses mainly on gender bullying, using four words with different polarities. Amazon's Mechanical Turk was used to confirm their "bullying" polarity.
  • 20. PROPOSED APPROACH Once the polarity of the words is confirmed, the data are processed to extract relevant information, such as the username of the person who posted the negative tweet (the potential bully) and the username of the person mentioned in the tweet. The monitoring process produces several social graphs, categorized into bully and victim social graphs. Their purpose is to visualize all detected bullying instances, find clusters of bullies, and show hidden connections between victims over time.
  • 21. TECHNOLOGY USED LingPipe: a toolkit for processing text using computational linguistics; it implements the naive Bayes algorithm. Tweet Extractor: extracts tweets from Twitter continuously. Gephi: open-source graph visualization and manipulation software. Amazon's Mechanical Turk service: a crowdsourcing marketplace that coordinates the use of human intelligence to perform tasks that computers are unable to do.
  • 22. DATA COLLECTION AND PRE-PROCESSING Around 5000 tweets were collected from different sources: previously collected data from Stanford students and previously collected data from university professors, with Mechanical Turk used to validate the polarity of the tweets. The data are represented with a bag-of-words model, which takes every word in a sentence as a feature, so the whole sentence is represented as an unordered collection of words.
  • 23. APPROACH A framework was built on top of the LingPipe toolkit for processing text using computational linguistics. It uses LingPipe's naive Bayes machine learning classifier as a baseline and treats the classifier and feature extractor as one component. As part of data collection and pre-processing, Twitter was searched for tweets containing the words of interest (negative words). Diagram: Framework = LingPipe naive Bayes classifier + Tweet Extractor (extracts tweets).
  • 24. DATA COLLECTION Diagram: an open-source library and the Streaming API crawl the Twitter timeline for tweets containing the words of interest, which are used as training data. For the training data, messages containing the words "Gay," "Homo," "Dike," and "Queer" were collected with an in-house Tweet Extractor. The test data were collected at random by streaming public tweets from Twitter's public timeline.
  • 25. To train the classifier, a training data set and a test data set were created. The training data consist of messages containing the four words of interest: "Gay", "Homo", "Dike", and "Queer". Of the 5000 tweets, approximately 3/4 were negative and 1/4 positive. 460 tweets were labelled negative manually, and 500 tweets were labelled positive by Amazon's Mechanical Turk service. The labelled data are validated by selecting a random sample of the collected data and using Amazon's Mechanical Turk to confirm its sentiment. Survey polarity values: B = negative with bullying intentions; A = negative without bullying intentions; P = positive or good content; N = neutral.
  • 26. CLASSIFICATION: NAIVE BAYES CLASSIFIER The focus of this approach is to find the polarity of tweets. Each word in a tweet is considered a unique variable in the naive Bayes model; the goal is the probability that a word belongs to the positive or the negative class. Pipeline: collecting the data set for training → pre-processing the data set → training data → training the model → sentiment detection (positive, negative).
  • 27. RESULTS Amazon's Mechanical Turk classified unlabelled data, which was used to verify and validate the newly labelled data produced by the machine learning algorithm. Results after training on 500 tweets: Naive Bayes: positive 65.7%, negative 72.9%, accuracy 67.3%; Amazon's MTurk: positive 65.2%, negative 74.0%, accuracy 67.1%.
  • 28. CONCLUSION This approach leverages the power of sentiment analysis. The classifier was close to 70% accurate. This is lower than expected, due to restrictions on accessing unlimited content from Twitter.
  • 29. CYBERBULLYING BLOCKER APPLICATION FOR ANDROID New types of internet-connected devices, such as smartphones and tablets, have further exacerbated the problem of cyberbullying. This Android application automatically detects possibly harmful content in a text, using a machine learning method to spot undesirable content.
  • 30. APPLICATION The application is built for devices running Android OS, using Java 8 and Android Studio. It gives users an interface for detecting harmful content.
  • 31. HARMFUL CONTENT DETECTION PROCESS The application contains one activity responsible for interacting with the user. To check for harmful content, the application starts a background thread, so the user can keep using the device even if the check takes a while. Flow: the user inputs text on the mobile screen → pushes a button to select the method → receives feedback.
  • 32. METHODOLOGY The method classifies messages as harmful or not with a classifier trained using a language modelling method based on a brute-force algorithm. Brute-force algorithms use a combinatorial approach and usually generate a massive number of combinations, i.e. potential answers to a given problem. The algorithm is applied to extract sentence patterns automatically. Training used actual data collected by Internet Patrol and annotated by experts: 1490 harmful and 1508 non-harmful entries. All patterns used in classification are stored on the mobile device, so the method operates locally and does not require an internet connection.
  • 35. RESULT Precision = 79%, recall = 79%; the method requires minimal human effort. RECALL is the ratio of the number of relevant records retrieved to the total number of relevant records in the database. PRECISION is the ratio of the number of relevant records retrieved to the total number of records retrieved (relevant and irrelevant).
  • 36. OTHER SOFTWARE PRODUCTS IN THE MARKET FOR DETECTING CYBERBULLYING FearNot!: an interactive drama/video game that teaches children strategies to prevent bullying and social exclusion. Samaritans Radar: the application alerted a user when it spotted someone who was being bullied, was depressed, or was sending disturbing suicidal signals; it was stopped due to privacy concerns. ReThink: a smartphone application that shows a pop-up warning when the user tries to send a message with harmful content. PocketGuardian: a parental monitoring app that uses a machine learning algorithm to detect not only cyberbullying texts but also harmful images; its disadvantage is a cost of $4 per month.
  • 37. PROPOSAL TO FILTER SUSPECTED MESSAGES A filtering mechanism classifies messages as "abusive" or "non-abusive" (or "positive" and "negative", respectively). In a practical system the filter will not be completely reliable; there will be false positives and false negatives in at least some cases. Some cases, such as threats, require extra effort: it is difficult to create an automated system that reliably recognizes threats that should be reported to the police. Both the problem of false positives and the problem of discarding threats can be mitigated by diverting messages labelled abusive to a trusted third party.
  • 39. CHALLENGES Preventing the removal of valuable messages when filtering the data; privacy concerns; reporting incidents as early as possible; false reporting.
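Slide 11's Levenshtein distance can be computed with the standard dynamic-programming recurrence. A minimal Python sketch follows; the lexicon lookup helper `match_lexicon`, its `max_distance=1` threshold, and the sample lexicon are hypothetical illustrations, not details taken from the original study.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def match_lexicon(word, lexicon, max_distance=1):
    """Hypothetical helper: lexicon entries within max_distance edits of word.
    The threshold is an illustrative choice, not from the original study."""
    w = word.lower()
    return [entry for entry in lexicon if levenshtein(w, entry.lower()) <= max_distance]
```

`levenshtein("kitten", "sitting")` returns 3, agreeing with the slide's worked example. Keeping only two rows of the table limits memory to O(len(b)).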
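Slide 12 describes SVM classification in prose. The study used SVM_light, which the sketch below does not reproduce; instead it trains a linear SVM with Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss, in plain Python. The function names, the feature vectors, `lam`, `epochs`, and the regularized bias term are illustrative assumptions.

```python
import random


def train_linear_svm(samples, labels, lam=0.01, epochs=200, seed=0):
    """Hypothetical helper: linear SVM via Pegasos-style subgradient descent.

    samples: list of feature vectors (lists of floats)
    labels:  +1 for harmful, -1 for non-harmful
    A constant 1.0 feature is appended so the last weight acts as a bias
    (regularized here for simplicity, unlike a textbook SVM).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(samples, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step for the L2 regularizer shrinks the weights ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and only samples inside the margin (hinge loss > 0) move the hyperplane.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (harmful) or -1 (non-harmful) for feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In practice each entry would first be turned into a numeric feature vector (for example, counts of lexicon words); that feature design is likewise an assumption here.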
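Slides 13, 14, and 35 report precision, recall, and F-score. The sketch below computes them from confusion-matrix counts, following the definitions on slide 35, with the F-score taken as the balanced F1 score; the function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Hypothetical helper: metrics from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # relevant retrieved / all retrieved
    recall = tp / (tp + fn) if tp + fn else 0.0     # relevant retrieved / all relevant
    return precision, recall, f1_score(precision, recall)
```

Applied to slide 14's figures, `f1_score(0.799, 0.983)` gives roughly 0.88; this is derived arithmetic, not a value reported in the slides. Equal precision and recall, as on slide 35, give an F1 equal to both (0.79).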
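Slide 14 mentions 10-fold cross-validation over the 966 annotated entries. A minimal sketch of plain (unstratified) k-fold index splitting follows; whether the original study shuffled or stratified its folds is not stated, so both the shuffling and the hypothetical `k_fold_indices` helper are assumptions.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Hypothetical helper: yield (train_indices, test_indices) for k folds over n samples."""
    order = list(range(n))
    random.Random(seed).shuffle(order)  # assumption: folds drawn from a shuffled order
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        yield train, test
        start += size
```

With 966 entries and k=10, six folds hold 97 entries and four hold 96, and each entry is used for testing exactly once. Given the 750/216 class imbalance, stratified folds would be a reasonable alternative.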
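Slide 15 ranks entries by summing the T-scores of the vulgar words they contain. Because the slide's T-score formula (a/b) does not define its terms, the sketch below takes per-word T-scores as given input; the scores, the placeholder words, counting every occurrence, and the helper names are all hypothetical.

```python
def entry_harmfulness(entry, t_scores):
    """Hypothetical helper: sum of T-scores over every vulgar-word occurrence.
    t_scores maps lower-case vulgar words to precomputed (assumed) T-scores."""
    return sum(t_scores.get(token, 0.0) for token in entry.lower().split())


def rank_entries(entries, t_scores):
    """Entries sorted from most to least harmful, paired with their scores."""
    scored = [(entry_harmfulness(e, t_scores), e) for e in entries]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Entries with no lexicon words score 0.0 and sink to the bottom of the ranking.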
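Slide 20 turns (poster, mentioned user) pairs from negative tweets into bully/victim social graphs and looks for clusters of bullies. The original work visualized these graphs with Gephi; the sketch below only computes the underlying structure in plain Python, treating a cluster as a connected component of the undirected graph. That definition, the usernames, and the helper names are assumptions.

```python
from collections import Counter, defaultdict


def bully_victim_graph(edges):
    """Hypothetical helper: directed graph bully -> users they mentioned in negative tweets."""
    graph = defaultdict(set)
    for bully, victim in edges:
        graph[bully].add(victim)
    return dict(graph)


def clusters(edges):
    """Connected components of the graph, ignoring edge direction."""
    adjacency = defaultdict(set)
    for bully, victim in edges:
        adjacency[bully].add(victim)
        adjacency[victim].add(bully)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def most_targeted(edges):
    """Victims ranked by the number of distinct users who targeted them."""
    return Counter(victim for _, victim in set(edges)).most_common()
```

Shared victims link otherwise unrelated bullies into one component, which is one way to surface the "hidden connections" the slide mentions.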
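Slides 24 and 30 describe collecting tweets that contain specific words of interest. A minimal sketch of a case-insensitive whole-word filter over a stream of tweet texts follows; it does not use the Twitter Streaming API, and the placeholder terms `term_a` and `term_b`, like the helper names, are hypothetical stand-ins.

```python
import re


def make_keyword_filter(words):
    """Hypothetical helper: case-insensitive whole-word matcher for the words of interest."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b", re.IGNORECASE
    )
    return lambda text: pattern.search(text) is not None


def filter_stream(tweets, words):
    """Lazily yield only the tweets that contain at least one word of interest."""
    keep = make_keyword_filter(words)
    return (tweet for tweet in tweets if keep(tweet))
```

The word-boundary anchors prevent matches inside longer words, at the cost of missing deliberate misspellings, which slide 11's edit-distance matching targets.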
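Slides 22 and 26 combine a bag-of-words representation with a naive Bayes classifier. The original framework used LingPipe (a Java toolkit), whose API is not reproduced here; the sketch below is a from-scratch multinomial naive Bayes with add-one (Laplace) smoothing, and its class name and toy training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Unordered word counts: word order is discarded, only frequencies remain."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        features = bag_of_words(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word, freq in features.items():
                if word in self.vocab:  # words unseen in training are ignored
                    score += freq * math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.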
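Slide 32 describes a brute-force, combinatorial extraction of sentence patterns. The sketch below is one way to realize that idea, not the application's actual algorithm: it enumerates every order-preserving combination of up to `max_len` tokens (marking gaps with a `*` wildcard), keeps patterns found in more harmful than non-harmful sentences, and flags a sentence if it contains any kept pattern. The wildcard notation, the keep rule, `max_len`, the helper names, and the example sentences are assumptions.

```python
from collections import Counter
from itertools import combinations


def sentence_patterns(tokens, max_len=3):
    """Hypothetical helper: all order-preserving token combinations of length 1..max_len.
    Gaps between non-adjacent tokens are marked with "*" (a wildcard)."""
    patterns = set()
    for n in range(1, min(max_len, len(tokens)) + 1):
        for idx in combinations(range(len(tokens)), n):  # brute-force enumeration
            parts = [tokens[idx[0]]]
            for prev, cur in zip(idx, idx[1:]):
                if cur - prev > 1:
                    parts.append("*")
                parts.append(tokens[cur])
            patterns.add(" ".join(parts))
    return patterns


def harmful_patterns(harmful, non_harmful, max_len=3):
    """Keep patterns that occur in more harmful than non-harmful sentences."""
    h = Counter(p for s in harmful for p in sentence_patterns(s.split(), max_len))
    n = Counter(p for s in non_harmful for p in sentence_patterns(s.split(), max_len))
    return {p for p, count in h.items() if count > n.get(p, 0)}


def is_harmful(sentence, patterns, max_len=3):
    """Flag a sentence if any of its patterns is in the harmful set."""
    return bool(sentence_patterns(sentence.split(), max_len) & patterns)
```

A sentence of n tokens has 2^n - 1 non-empty ordered combinations, which is why the enumeration is capped at `max_len`.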

Speaker notes

  1. The Levenshtein distance between two strings is the minimum number of operations required to transform one string into the other, where the only available operations are deletion, insertion, or substitution of a single character.
  2. A ranking of entries by harmfulness is important for detecting the most dangerous cases. In our approach, an entry is considered more harmful the more vulgar keywords appear in it.
  3. Alice is the user whose messages are to be filtered; Bob is a third party whom Alice trusts and who has agreed to help filter the messages. A customizable whitelist/blacklist pair is used to pre-filter messages; the remaining messages are classified as "abusive" or "non-abusive" by a filter that uses a learning mechanism to adapt to feedback about incorrect classifications. Abusive messages are diverted to Bob for examination, and non-abusive ones are delivered to Alice normally.
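The note above describes routing Alice's messages through a whitelist/blacklist pre-filter and a learning filter, with suspected abuse diverted to Bob. The sketch below illustrates that flow; the perceptron-style word-weight filter, the threshold, treating blacklisted senders as abusive, and all names are hypothetical choices, not the proposal's specified mechanism.

```python
class FeedbackFilter:
    """Hypothetical learning filter: per-word weights adjusted when a misclassification is reported."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.weights = {}

    def score(self, text):
        return sum(self.weights.get(word, 0.0) for word in text.lower().split())

    def is_abusive(self, text):
        return self.score(text) >= self.threshold

    def feedback(self, text, truly_abusive):
        """Perceptron-style update, applied only when the current prediction was wrong."""
        if self.is_abusive(text) != truly_abusive:
            delta = 1.0 if truly_abusive else -1.0
            for word in set(text.lower().split()):
                self.weights[word] = self.weights.get(word, 0.0) + delta


def route(sender, text, whitelist, blacklist, learned_filter):
    """Return "alice" to deliver normally or "bob" to divert for examination."""
    if sender in whitelist:
        return "alice"
    if sender in blacklist:
        return "bob"  # assumption: blacklisted senders are treated as abusive
    return "bob" if learned_filter.is_abusive(text) else "alice"
```

Because every diverted message is reviewed by Bob, a false positive costs only a delay rather than a lost message, which matches the proposal's rationale for involving a trusted third party.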