SlideShare una empresa de Scribd logo
1 de 3
Descargar para leer sin conexión
IJSRD - International Journal for Scientific Research & Development| Vol. 3, Issue 10, 2015 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 1112
A Survey on Data Mining Techniques for Crime Hotspots Prediction
Neha Patel1 Prof. Shivani V. Vora2
1
P.G. Student 2
Assistant Professor
1,2
Department of Computer Engineering
1,2
CGPIT, UkaTarsadiya University, Mahuva, Surat, Gujarat, India.
Abstract— A crime is an act which is against the laws of a
country or region. The technique which is used to find areas
on a map which have high crime intensity is known as crime
hotspot prediction. The technique uses the crime data which
includes the area with crime rate and predict the future
location with high crime intensity. The motivation of crime
hotspot prediction is to raise people’s awareness regarding
the dangerous location in certain time period. It can help for
police resource allocation for creating a safe environment.
The paper presents survey of different types of data mining
techniques for crime hotspots prediction.
Key words: Data mining; crime hotspot; prediction;
accuracy
I. INTRODUCTION
Crime is a social problem that affecting the people’s life and
economic development of a society. The prediction of crime
is difficult [1]. The occurrence of crime is related to a
variety of socio-economic and crime opportunity factors,
like population, economic investment and arrest rate.
Crime usually has spatial and temporal
characteristics related to the population, environment,
economic factors, politics, and social events. The victims of
crime may not be predicted but the place that has probability
of an occurrence of crime it may be predicted. Analysis of
large amount of crime data is a difficult task. Data mining is
a useful process to handle amount of data and to reduce the
manpower. The most effective spatial-temporal analysis for
the understanding of the contained relationships among
crime events is the analysis of crime hot spots. The analysis
of crime hot spots contributes to the conflict and prevention
of crimes by allowing the planning of strategies that
optimize the distribution of police resources. As police
resources are limited in some areas, the planning of such
allocation becomes an important task. Two type of crime
hotspots are there: crime general and crime specific. Wide
range of crime types are occur in a particular area then is
known as crime general hotspot. One or several types of
crimes are occur in a particular area then it is called as a
crime specific hotspot. The crime hotspots prediction is
done using crime location, crime time and which type of
crime is done. We can also find the area for specific crime.
The main motivation of crime hotspots prediction is to
create a safe environment and for police resource allocation.
In the next section-II the general steps for crime
hotspots prediction and classification techniques are
introduced. Where five classification techniques are
described. Section-III represents the comparative analysis of
three classification techniques which are support vector
machine, decision tree and naïve bays.
II. LITERATURE SURVEY
The general steps for crime hotspot prediction are:
1)Data collection
2)Preprocessing
3)Feature selection
4)Classification
5)Prediction
6)Visualization
A. Data Collection
In data collection step crime data is collected from different
sources like news sites, blogs, RSS feed and mainly from
police records. For unstructured data Mongo database is
used [2].
B. Preprocessing
There are few techniques for data preprocessing. This
techniques are data cleaning, reduction, integration,
discretization, transformation and feature selection. It
intends to reduce some noises, incomplete and in consist
data.
The data cleaning is used to decrease noise and
handle missing values. There are a number of methods for
handling records that contain missing values such as
omitting the incorrect fields(s) or entire record that contains
the incorrect field(s), automatically entering or correcting
the data with default values, deriving a model to enter or
correct the data, replacing all values with a global constant
and using the imputation method to predict missing values.
Fig. 1: steps for crime hotspot prediction [2][3].
Data reduction is necessary to remove irrelevant
attributes from dataset. For example according to Almanie,
Mirza and Lor [4], the authors performed data reduction in
terms of number of instances. They observed Denver crimes
dataset contained a set of traffic accident instances. The
attribute “Is_Crime” suggests whether the instance belongs
to a crime or accident. The author used the attribute
A Survey on Data Mining Techniques for Crime Hotspots Prediction
(IJSRD/Vol. 3/Issue 10/2015/258)
All rights reserved by www.ijsrd.com 1113
“Is_Crime” to filter the instances and remove all the
irrelevant ones [4].
Data integration steps is used for different
purposes. In Almanie, Mirza and Lor [4] the authors used to
avoid different attribute naming, they unified the key
attribute names for both crime datasets as follow:
Crime_Type, Crime_Date and Crime_Location.
Crime_Location represents the neighborhood attribute for
Denver dataset whereas the Area attribute for Los Angeles
dataset.
The data transformation is used to reduce the
diversity of attribute values by mapping their values to fall
within smaller group. For example burglary and robbery
crimes are included in theft crime type [4].
C. Feature Selection
Feature selection is a part of data preprocessing. Feature
selection is used to remove the irrelevant or redundant
attributes. Feature selection has several objectives such as
enhancing model performance by avoiding over fitting in
the case of supervised classification [1]. The main attributes
like crime type, location, crime time in feature selection
process.
D. Classification
After preprocessing and feature selection phases, the
numbers of attribute was meaningfully extract and are now
more precise for building the data mining models.
Classification as a famous data mining supervised learning
techniques are used to extract meaningful information from
large datasets and can be adequately used to predict
unknown classes. There are various classification
algorithms, such as Support Vector Machines (SVM), k-
Nearest Neighbor (k-NN), Decision Tree, Weighted Voting
and Artificial Neural Networks. All these techniques can be
applied to a dataset for discovering sets of models to
forecast unknown class labels [1][3].
E. Prediction
In order to quantitatively predict the crime status, many data
mining methods can be used. In this study, a classification
task is applied for prediction.
F. Visualization
The crime prone areas can be graphically represented using
a heat map which indicates level of activity, generally
darker colors to represents low activity and brighter colors
to represents high activity [1][2][4].
The steps for crime hotspots prediction are
introduced. Now the survey of different data mining
techniques for prediction are presented. The objective of
prediction is to forecast the value of an attribute based on
value of other attributes. In the prediction techniques first a
model is created based on data distribution and then that
model is used to predict future on unknown value. The basic
data mining techniques are introduced which are used to
predict crime hotspots.
1) Support Vector Machine
The support vector machines are supervised learning models
with associated learning algorithms that analyze the data and
recognize the patterns that is used for classification and
regression analysis. A support vector machine construct a
hyperplane or set of hyperplanes in a high or infinite-
dimensional space, which can be used for classification,
regression and other tasks.
The support vector machine technique is used for support
vector machine [3]. The following approach is: The data
used for this research was taken from a variety of city
agencies. Each data contains the type of event, the location
with longitude and latitude, and the time and date of the
incident. This data was classified from different
classification techniques. The area which have the crime rate
above the predefined rate are positive or members of hotspot
class and area with crime rate below the predefined rate are
negative or non-members of hotspot class. This labelled data
set used as the training set in SVM classification. The
technique gives good results in most cases but it is
computationally expensive so it runs slow.
2) Naïve Bayesian
Naïve Bayes classifier is a supervised learning algorithm. It
is effective and widely used. It is a statistical model that
predicts class membership probabilities based on Bayes’
theorem [5]. The naïve bayes classifier model is fast to
build. It can be modified with new training data without
having rebuild the model. This classifier is very simple to
construct and it may be easily apply to huge data sets [1].
Predictive accuracy is generally of this classifier in most
cases. It consider each attributes separately when classify
new instance. It based on the Bayes rule of conditional
probability [6][5].
P (H|X) = P (X|H) P (H)/ P (X)
Where,
1) P (A) is the prior probability of A. It is "prior" in the
sense that it does not take into account any information
about B.
2) P (A|B) is the conditional probability of A, given B. It is
also called the posterior probability because it is derived
from or depends on the specified value of B.
3) P (B|A) is the conditional probability of B given A. It is
also called the likelihood.
3) Decision Tree
Decision tree is a flow chart like structure in which each
internal node represents a ‘test’ on the attribute, each branch
represent outcome of the test and each leaf node represents a
class label. It is simple to understand and to interpret. It is
able to handle both numerical and categorical data. It is also
able to handle multi-output problems. It performs well even
if its assumptions are somewhat violated by the true model
from which the data was generated. For crime hotspot
prediction generally J48 algorithm is used [2][3][4].
The decision tree can be unstable because small
variations in data might results in completely different tree
being generated. Another disadvantage is its complexity [6].
4) Artificial Neural Network
In 1943, McCulloch and Pitts, gives the first model of
artificial neuron. According to Nigrin, A neural network is a
circuit composed of a very large numbers of processing
elements that know as Neuron. Each element works only on
local information. Furthermore each element operates non-
parellerliy, thus there is no system clock [7]. The neural
network have high strength when modeling a complex
system. It gives higher accuracy when increasing the data
but sudden changes in new data might give low results [4].
It takes long running time [6].
A Survey on Data Mining Techniques for Crime Hotspots Prediction
(IJSRD/Vol. 3/Issue 10/2015/258)
All rights reserved by www.ijsrd.com 1114
5) K- Nearest Neighbor
The k-Nearest Neighbors algorithm (or k-NN for short) is a
non-parametric method used for classification and
regression [8]. In this technique the classification is done by
comparing feature vectors of the different points. Nearest
neighbor classifier makes their predictions based on local
information, whereas decision tree and rule based classifiers
attempts to find a global model that fits the entire input
space. Because the classification decision are made locally.
This classifier can produce wrong predictions unless the
approximate proximity measure and data preprocessing
steps are taken [6].
In the next section, observation of survey are presented.
III. COMPARATIVE ANALYSIS
Parameters
Support vector
machine
Decision tree Naïve Bayes
Summary
Supervised
learning model.
Analyze data
recognize
patterns [9].
Flowchart like structure in which each
internal node represents a “test” on
attribute, branch as an outcome of taste
and leaf as a class label [1].
Supervised learning algorithm. It is a
statistical model that predicts class
membership probabilities based on
Bayes' theorem [4].
Accuracy
Crime dataset
US(Northeast)[4] 70% to 80%[4] 60% to 70%[4] 70% to 80%[4]
Crime data set of
Denver[4] _ 42%[4] 51%[4]
Crime data set of
Los Angeles[4]
43%[4] 54%[4]
F1-measure
Crime dataset
US(Northeast)[4] 70% to 80%[4] 70% to 80%[4] Above 80%[4]
Table 1. Comparative analysis for data mining techniques.
Table 1 shows the comparative analysis of different data
mining techniques. All techniques are compared with its
prediction accuracy and f1-measure. The support vector
machine provide good accuracy but it is computationally
expensive thus runs slow. The complexity of decision tree is
high. Small variations in data might results in completely
different tree being generated. The Naïve bays is highly
scalable and easy to implement. Good results obtained in
most cases. The results are also depends on which type of
data is given. Here the naïve bays gives high accuracy and
f1-score for different crime database compared to support
vector machine and decision tree.
IV. CONCLUSION
The data mining techniques for predicting crime hotspots
are discuss in this paper. This techniques are capable to
enhance the prediction accuracy, performance and speed.
After analyzing different results from various paper we can
conclude that Naïve Bayes gives efficient results for crime
hotspot prediction.
V. AKNOWLWDGEMENT
I take the opportunity to express my gratitude and regards to
my guide Prof.Shivani Vora, CE & IT Dept., CGPIT,
Bardoli for her suggestions and encouragements.
REFERENCES
[1] Somayeh Shojaee,Aida Mustapha, Fatimah Sidi,
Marzanah A.Jabar, “A study on Classification Learning
algorithms to predict crime status”, International
Journal of Digital Content Technology and its
Applications, vol 7, 2013
[2] Shiju Sathyadevan, Devan M.S and Surya
Gangadharan. S,” Crime Analysis and Prediction Using
Data Mining”, First International Conference on
Networks & Soft Computing,IEEE
[3] Chung-Hsien Yu, Max W. Ward, Melissa Morabito and
Wei Ding(2011),”Crime Forecasting Using Data
Mining Techniques”, Department of Computer
Science, 2Department of Sociology, University of
Massachusetts Boston, 100 Morrissey Blvd., Boston,
MA 02125
[4] Tahani Almanie, Rsha Mirza and Elizabeth Lor,” Crime
Prediction based on Crime Types and using Spitial and
Temporal Criminal Hotspots”, International Journal of
Data Mining & Knowledge Management Process
(IJDKP) Vol.5, No.4, July 2015
[5] https://en.wikipedia.org/wiki/Bayes%27_theorem
[6] Michael Steinbach, Vipin Kumar, Pang-Ning Tan,
“Classification: Alternative Techniques” in Introduction
to Data Mining 3rd edition, 2006
[7] Nikhil Dubey, Setu Kumar Chaturvedi,ph.d, “A Survey
paper on Crime Prediction Technique using Data
Mining”, International journal of Engineering research
and Applications, vol 1, 2014
[8] https://en.wikipedia.org/wiki/K-
nearest_neighbors_algorithm
[9] Keivan Kianmehr and Reda Alhajj, “Crime Hot-Spots
Prediction Using Support Vector Machine” pp. 952-
960 IEEE 2006.
[10]https://en.wikipedia.org/wiki/Support_vector_machine.

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Survey of Data Mining Techniques on Crime Data Analysis
Survey of Data Mining Techniques on Crime Data AnalysisSurvey of Data Mining Techniques on Crime Data Analysis
Survey of Data Mining Techniques on Crime Data Analysis
 
Fundamentalsof Crime Mapping Tactical Analysis Concepts
Fundamentalsof Crime Mapping Tactical Analysis ConceptsFundamentalsof Crime Mapping Tactical Analysis Concepts
Fundamentalsof Crime Mapping Tactical Analysis Concepts
 
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASET
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASETISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASET
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASET
 
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGPREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
 
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGPREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
 
Scaling Analytics with Apache Spark
Scaling Analytics with Apache SparkScaling Analytics with Apache Spark
Scaling Analytics with Apache Spark
 
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
 
Hybrid Approach for Intrusion Detection Model Using Combination of K-Means Cl...
Hybrid Approach for Intrusion Detection Model Using Combination of K-Means Cl...Hybrid Approach for Intrusion Detection Model Using Combination of K-Means Cl...
Hybrid Approach for Intrusion Detection Model Using Combination of K-Means Cl...
 
Statistical Prediction for analyzing Epidemiological Characteristics of COVID...
Statistical Prediction for analyzing Epidemiological Characteristics of COVID...Statistical Prediction for analyzing Epidemiological Characteristics of COVID...
Statistical Prediction for analyzing Epidemiological Characteristics of COVID...
 
I Dunderstn
I DunderstnI Dunderstn
I Dunderstn
 
Comparison of Text Classifiers on News Articles
Comparison of Text Classifiers on News ArticlesComparison of Text Classifiers on News Articles
Comparison of Text Classifiers on News Articles
 
my IEEE
my IEEEmy IEEE
my IEEE
 
A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...A Threshold fuzzy entropy based feature selection method applied in various b...
A Threshold fuzzy entropy based feature selection method applied in various b...
 
Comparative study of various approaches for transaction Fraud Detection using...
Comparative study of various approaches for transaction Fraud Detection using...Comparative study of various approaches for transaction Fraud Detection using...
Comparative study of various approaches for transaction Fraud Detection using...
 
Credit card payment_fraud_detection
Credit card payment_fraud_detectionCredit card payment_fraud_detection
Credit card payment_fraud_detection
 
An intrusion detection model based on fuzzy membership function using gnp
An intrusion detection model based on fuzzy membership function using gnpAn intrusion detection model based on fuzzy membership function using gnp
An intrusion detection model based on fuzzy membership function using gnp
 
IRJET- Competitive Analysis of Attacks on Social Media
IRJET-  	 Competitive Analysis of Attacks on Social MediaIRJET-  	 Competitive Analysis of Attacks on Social Media
IRJET- Competitive Analysis of Attacks on Social Media
 
Simulation
SimulationSimulation
Simulation
 
Machine learning in computer security
Machine learning in computer securityMachine learning in computer security
Machine learning in computer security
 
A Review of Various Clustering Techniques
A Review of Various Clustering TechniquesA Review of Various Clustering Techniques
A Review of Various Clustering Techniques
 

Similar a A Survey on Data Mining Techniques for Crime Hotspots Prediction

News document analysis by using a proficient algorithm
News document analysis by using a proficient algorithmNews document analysis by using a proficient algorithm
News document analysis by using a proficient algorithm
IJERA Editor
 

Similar a A Survey on Data Mining Techniques for Crime Hotspots Prediction (20)

Bs4301396400
Bs4301396400Bs4301396400
Bs4301396400
 
A predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analyticsA predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analytics
 
Survey of Data Mining Techniques on Crime Data Analysis
Survey of Data Mining Techniques on Crime Data AnalysisSurvey of Data Mining Techniques on Crime Data Analysis
Survey of Data Mining Techniques on Crime Data Analysis
 
ACCESS.2020.3028420.pdf
ACCESS.2020.3028420.pdfACCESS.2020.3028420.pdf
ACCESS.2020.3028420.pdf
 
V1_I2_2012_Paper6.doc
V1_I2_2012_Paper6.docV1_I2_2012_Paper6.doc
V1_I2_2012_Paper6.doc
 
IRJET- Crime Analysis using Data Mining and Data Analytics
IRJET- Crime Analysis using Data Mining and Data AnalyticsIRJET- Crime Analysis using Data Mining and Data Analytics
IRJET- Crime Analysis using Data Mining and Data Analytics
 
Predictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RatePredictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime Rate
 
Predictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RatePredictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime Rate
 
4Data Mining Approach of Accident Occurrences Identification with Effective M...
4Data Mining Approach of Accident Occurrences Identification with Effective M...4Data Mining Approach of Accident Occurrences Identification with Effective M...
4Data Mining Approach of Accident Occurrences Identification with Effective M...
 
Crime Prediction and Analysis
Crime Prediction and AnalysisCrime Prediction and Analysis
Crime Prediction and Analysis
 
Analysis of Crime Big Data using MapReduce
Analysis of Crime Big Data using MapReduceAnalysis of Crime Big Data using MapReduce
Analysis of Crime Big Data using MapReduce
 
Propose Data Mining AR-GA Model to Advance Crime analysis
Propose Data Mining AR-GA Model to Advance Crime analysisPropose Data Mining AR-GA Model to Advance Crime analysis
Propose Data Mining AR-GA Model to Advance Crime analysis
 
SUPERVISED AND UNSUPERVISED MACHINE LEARNING METHODOLOGIES FOR CRIME PATTERN ...
SUPERVISED AND UNSUPERVISED MACHINE LEARNING METHODOLOGIES FOR CRIME PATTERN ...SUPERVISED AND UNSUPERVISED MACHINE LEARNING METHODOLOGIES FOR CRIME PATTERN ...
SUPERVISED AND UNSUPERVISED MACHINE LEARNING METHODOLOGIES FOR CRIME PATTERN ...
 
Supervised and Unsupervised Machine Learning Methodologies for Crime Pattern ...
Supervised and Unsupervised Machine Learning Methodologies for Crime Pattern ...Supervised and Unsupervised Machine Learning Methodologies for Crime Pattern ...
Supervised and Unsupervised Machine Learning Methodologies for Crime Pattern ...
 
News document analysis by using a proficient algorithm
News document analysis by using a proficient algorithmNews document analysis by using a proficient algorithm
News document analysis by using a proficient algorithm
 
CRIME ANALYSIS AND PREDICTION USING MACHINE LEARNING
CRIME ANALYSIS AND PREDICTION USING MACHINE LEARNINGCRIME ANALYSIS AND PREDICTION USING MACHINE LEARNING
CRIME ANALYSIS AND PREDICTION USING MACHINE LEARNING
 
Life and science journal.pdf
Life and science journal.pdfLife and science journal.pdf
Life and science journal.pdf
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOT
 
Ijarcce 6
Ijarcce 6Ijarcce 6
Ijarcce 6
 
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITY
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITYSENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITY
SENTIMENT ANALYSIS AND GEOGRAPHICAL ANALYSIS FOR ENHANCING SECURITY
 

Más de IJSRD

Más de IJSRD (20)

#IJSRD #Research Paper Publication
#IJSRD #Research Paper Publication#IJSRD #Research Paper Publication
#IJSRD #Research Paper Publication
 
Maintaining Data Confidentiality in Association Rule Mining in Distributed En...
Maintaining Data Confidentiality in Association Rule Mining in Distributed En...Maintaining Data Confidentiality in Association Rule Mining in Distributed En...
Maintaining Data Confidentiality in Association Rule Mining in Distributed En...
 
Performance and Emission characteristics of a Single Cylinder Four Stroke Die...
Performance and Emission characteristics of a Single Cylinder Four Stroke Die...Performance and Emission characteristics of a Single Cylinder Four Stroke Die...
Performance and Emission characteristics of a Single Cylinder Four Stroke Die...
 
Preclusion of High and Low Pressure In Boiler by Using LABVIEW
Preclusion of High and Low Pressure In Boiler by Using LABVIEWPreclusion of High and Low Pressure In Boiler by Using LABVIEW
Preclusion of High and Low Pressure In Boiler by Using LABVIEW
 
Prevention and Detection of Man in the Middle Attack on AODV Protocol
Prevention and Detection of Man in the Middle Attack on AODV ProtocolPrevention and Detection of Man in the Middle Attack on AODV Protocol
Prevention and Detection of Man in the Middle Attack on AODV Protocol
 
Comparative Analysis of PAPR Reduction Techniques in OFDM Using Precoding Tec...
Comparative Analysis of PAPR Reduction Techniques in OFDM Using Precoding Tec...Comparative Analysis of PAPR Reduction Techniques in OFDM Using Precoding Tec...
Comparative Analysis of PAPR Reduction Techniques in OFDM Using Precoding Tec...
 
Evaluation the Effect of Machining Parameters on MRR of Mild Steel
Evaluation the Effect of Machining Parameters on MRR of Mild SteelEvaluation the Effect of Machining Parameters on MRR of Mild Steel
Evaluation the Effect of Machining Parameters on MRR of Mild Steel
 
Filter unwanted messages from walls and blocking nonlegitimate user in osn
Filter unwanted messages from walls and blocking nonlegitimate user in osnFilter unwanted messages from walls and blocking nonlegitimate user in osn
Filter unwanted messages from walls and blocking nonlegitimate user in osn
 
Keystroke Dynamics Authentication with Project Management System
Keystroke Dynamics Authentication with Project Management SystemKeystroke Dynamics Authentication with Project Management System
Keystroke Dynamics Authentication with Project Management System
 
Diagnosing lungs cancer Using Neural Networks
Diagnosing lungs cancer Using Neural NetworksDiagnosing lungs cancer Using Neural Networks
Diagnosing lungs cancer Using Neural Networks
 
A Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion MiningA Survey on Sentiment Analysis and Opinion Mining
A Survey on Sentiment Analysis and Opinion Mining
 
A Defect Prediction Model for Software Product based on ANFIS
A Defect Prediction Model for Software Product based on ANFISA Defect Prediction Model for Software Product based on ANFIS
A Defect Prediction Model for Software Product based on ANFIS
 
Experimental Investigation of Granulated Blast Furnace Slag ond Quarry Dust a...
Experimental Investigation of Granulated Blast Furnace Slag ond Quarry Dust a...Experimental Investigation of Granulated Blast Furnace Slag ond Quarry Dust a...
Experimental Investigation of Granulated Blast Furnace Slag ond Quarry Dust a...
 
Product Quality Analysis based on online Reviews
Product Quality Analysis based on online ReviewsProduct Quality Analysis based on online Reviews
Product Quality Analysis based on online Reviews
 
Solving Fuzzy Matrix Games Defuzzificated by Trapezoidal Parabolic Fuzzy Numbers
Solving Fuzzy Matrix Games Defuzzificated by Trapezoidal Parabolic Fuzzy NumbersSolving Fuzzy Matrix Games Defuzzificated by Trapezoidal Parabolic Fuzzy Numbers
Solving Fuzzy Matrix Games Defuzzificated by Trapezoidal Parabolic Fuzzy Numbers
 
Study of Clustering of Data Base in Education Sector Using Data Mining
Study of Clustering of Data Base in Education Sector Using Data MiningStudy of Clustering of Data Base in Education Sector Using Data Mining
Study of Clustering of Data Base in Education Sector Using Data Mining
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
 
Investigation of Effect of Process Parameters on Maximum Temperature during F...
Investigation of Effect of Process Parameters on Maximum Temperature during F...Investigation of Effect of Process Parameters on Maximum Temperature during F...
Investigation of Effect of Process Parameters on Maximum Temperature during F...
 
Review Paper on Computer Aided Design & Analysis of Rotor Shaft of a Rotavator
Review Paper on Computer Aided Design & Analysis of Rotor Shaft of a RotavatorReview Paper on Computer Aided Design & Analysis of Rotor Shaft of a Rotavator
Review Paper on Computer Aided Design & Analysis of Rotor Shaft of a Rotavator
 
Studies on Physico - Mechanical Properties of Chloroprene Rubber Vulcanizate ...
Studies on Physico - Mechanical Properties of Chloroprene Rubber Vulcanizate ...Studies on Physico - Mechanical Properties of Chloroprene Rubber Vulcanizate ...
Studies on Physico - Mechanical Properties of Chloroprene Rubber Vulcanizate ...
 

Último

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 

Último (20)

TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 

A Survey on Data Mining Techniques for Crime Hotspots Prediction

  • 1. IJSRD - International Journal for Scientific Research & Development| Vol. 3, Issue 10, 2015 | ISSN (online): 2321-0613 All rights reserved by www.ijsrd.com 1112 A Survey on Data Mining Techniques for Crime Hotspots Prediction Neha Patel1 Prof. Shivani V. Vora2 1 P.G. Student 2 Assistant Professor 1,2 Department of Computer Engineering 1,2 CGPIT, UkaTarsadiya University, Mahuva, Surat, Gujarat, India. Abstract— A crime is an act which is against the laws of a country or region. The technique which is used to find areas on a map which have high crime intensity is known as crime hotspot prediction. The technique uses the crime data which includes the area with crime rate and predict the future location with high crime intensity. The motivation of crime hotspot prediction is to raise people’s awareness regarding the dangerous location in certain time period. It can help for police resource allocation for creating a safe environment. The paper presents survey of different types of data mining techniques for crime hotspots prediction. Key words: Data mining; crime hotspot; prediction; accuracy I. INTRODUCTION Crime is a social problem that affecting the people’s life and economic development of a society. The prediction of crime is difficult [1]. The occurrence of crime is related to a variety of socio-economic and crime opportunity factors, like population, economic investment and arrest rate. Crime usually has spatial and temporal characteristics related to the population, environment, economic factors, politics, and social events. The victims of crime may not be predicted but the place that has probability of an occurrence of crime it may be predicted. Analysis of large amount of crime data is a difficult task. Data mining is a useful process to handle amount of data and to reduce the manpower. The most effective spatial-temporal analysis for the understanding of the contained relationships among crime events is the analysis of crime hot spots. The analysis of crime hot spots contributes to the conflict and prevention of crimes by allowing the planning of strategies that optimize the distribution of police resources. As police resources are limited in some areas, the planning of such allocation becomes an important task. Two type of crime hotspots are there: crime general and crime specific. Wide range of crime types are occur in a particular area then is known as crime general hotspot. One or several types of crimes are occur in a particular area then it is called as a crime specific hotspot. The crime hotspots prediction is done using crime location, crime time and which type of crime is done. We can also find the area for specific crime. The main motivation of crime hotspots prediction is to create a safe environment and for police resource allocation. In the next section-II the general steps for crime hotspots prediction and classification techniques are introduced. Where five classification techniques are described. Section-III represents the comparative analysis of three classification techniques which are support vector machine, decision tree and naïve bays. II. LITERATURE SURVEY The general steps for crime hotspot prediction are: 1)Data collection 2)Preprocessing 3)Feature selection 4)Classification 5)Prediction 6)Visualization A. Data Collection In data collection step crime data is collected from different sources like news sites, blogs, RSS feed and mainly from police records. For unstructured data Mongo database is used [2]. B. Preprocessing There are few techniques for data preprocessing. This techniques are data cleaning, reduction, integration, discretization, transformation and feature selection. It intends to reduce some noises, incomplete and in consist data. The data cleaning is used to decrease noise and handle missing values. There are a number of methods for handling records that contain missing values such as omitting the incorrect fields(s) or entire record that contains the incorrect field(s), automatically entering or correcting the data with default values, deriving a model to enter or correct the data, replacing all values with a global constant and using the imputation method to predict missing values. Fig. 1: steps for crime hotspot prediction [2][3]. Data reduction is necessary to remove irrelevant attributes from dataset. For example according to Almanie, Mirza and Lor [4], the authors performed data reduction in terms of number of instances. They observed Denver crimes dataset contained a set of traffic accident instances. The attribute “Is_Crime” suggests whether the instance belongs to a crime or accident. The author used the attribute
  • 2. A Survey on Data Mining Techniques for Crime Hotspots Prediction (IJSRD/Vol. 3/Issue 10/2015/258) All rights reserved by www.ijsrd.com 1113 “Is_Crime” to filter the instances and remove all the irrelevant ones [4]. Data integration steps is used for different purposes. In Almanie, Mirza and Lor [4] the authors used to avoid different attribute naming, they unified the key attribute names for both crime datasets as follow: Crime_Type, Crime_Date and Crime_Location. Crime_Location represents the neighborhood attribute for Denver dataset whereas the Area attribute for Los Angeles dataset. The data transformation is used to reduce the diversity of attribute values by mapping their values to fall within smaller group. For example burglary and robbery crimes are included in theft crime type [4]. C. Feature Selection Feature selection is a part of data preprocessing. Feature selection is used to remove the irrelevant or redundant attributes. Feature selection has several objectives such as enhancing model performance by avoiding over fitting in the case of supervised classification [1]. The main attributes like crime type, location, crime time in feature selection process. D. Classification After preprocessing and feature selection phases, the numbers of attribute was meaningfully extract and are now more precise for building the data mining models. Classification as a famous data mining supervised learning techniques are used to extract meaningful information from large datasets and can be adequately used to predict unknown classes. There are various classification algorithms, such as Support Vector Machines (SVM), k- Nearest Neighbor (k-NN), Decision Tree, Weighted Voting and Artificial Neural Networks. All these techniques can be applied to a dataset for discovering sets of models to forecast unknown class labels [1][3]. E. Prediction In order to quantitatively predict the crime status, many data mining methods can be used. In this study, a classification task is applied for prediction. F. Visualization The crime prone areas can be graphically represented using a heat map which indicates level of activity, generally darker colors to represents low activity and brighter colors to represents high activity [1][2][4]. The steps for crime hotspots prediction are introduced. Now the survey of different data mining techniques for prediction are presented. The objective of prediction is to forecast the value of an attribute based on value of other attributes. In the prediction techniques first a model is created based on data distribution and then that model is used to predict future on unknown value. The basic data mining techniques are introduced which are used to predict crime hotspots. 1) Support Vector Machine The support vector machines are supervised learning models with associated learning algorithms that analyze the data and recognize the patterns that is used for classification and regression analysis. A support vector machine construct a hyperplane or set of hyperplanes in a high or infinite- dimensional space, which can be used for classification, regression and other tasks. The support vector machine technique is used for support vector machine [3]. The following approach is: The data used for this research was taken from a variety of city agencies. Each data contains the type of event, the location with longitude and latitude, and the time and date of the incident. This data was classified from different classification techniques. The area which have the crime rate above the predefined rate are positive or members of hotspot class and area with crime rate below the predefined rate are negative or non-members of hotspot class. This labelled data set used as the training set in SVM classification. The technique gives good results in most cases but it is computationally expensive so it runs slow. 2) Naïve Bayesian Naïve Bayes classifier is a supervised learning algorithm. It is effective and widely used. It is a statistical model that predicts class membership probabilities based on Bayes’ theorem [5]. The naïve bayes classifier model is fast to build. It can be modified with new training data without having rebuild the model. This classifier is very simple to construct and it may be easily apply to huge data sets [1]. Predictive accuracy is generally of this classifier in most cases. It consider each attributes separately when classify new instance. It based on the Bayes rule of conditional probability [6][5]. P (H|X) = P (X|H) P (H)/ P (X) Where, 1) P (A) is the prior probability of A. It is "prior" in the sense that it does not take into account any information about B. 2) P (A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends on the specified value of B. 3) P (B|A) is the conditional probability of B given A. It is also called the likelihood. 3) Decision Tree Decision tree is a flow chart like structure in which each internal node represents a ‘test’ on the attribute, each branch represent outcome of the test and each leaf node represents a class label. It is simple to understand and to interpret. It is able to handle both numerical and categorical data. It is also able to handle multi-output problems. It performs well even if its assumptions are somewhat violated by the true model from which the data was generated. For crime hotspot prediction generally J48 algorithm is used [2][3][4]. The decision tree can be unstable because small variations in data might results in completely different tree being generated. Another disadvantage is its complexity [6]. 4) Artificial Neural Network In 1943, McCulloch and Pitts, gives the first model of artificial neuron. According to Nigrin, A neural network is a circuit composed of a very large numbers of processing elements that know as Neuron. Each element works only on local information. Furthermore each element operates non- parellerliy, thus there is no system clock [7]. The neural network have high strength when modeling a complex system. It gives higher accuracy when increasing the data but sudden changes in new data might give low results [4]. It takes long running time [6].
  • 3. A Survey on Data Mining Techniques for Crime Hotspots Prediction (IJSRD/Vol. 3/Issue 10/2015/258) All rights reserved by www.ijsrd.com 1114 5) K- Nearest Neighbor The k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression [8]. In this technique the classification is done by comparing feature vectors of the different points. Nearest neighbor classifier makes their predictions based on local information, whereas decision tree and rule based classifiers attempts to find a global model that fits the entire input space. Because the classification decision are made locally. This classifier can produce wrong predictions unless the approximate proximity measure and data preprocessing steps are taken [6]. In the next section, observation of survey are presented. III. COMPARATIVE ANALYSIS Parameters Support vector machine Decision tree Naïve Bayes Summary Supervised learning model. Analyze data recognize patterns [9]. Flowchart like structure in which each internal node represents a “test” on attribute, branch as an outcome of taste and leaf as a class label [1]. Supervised learning algorithm. It is a statistical model that predicts class membership probabilities based on Bayes' theorem [4]. Accuracy Crime dataset US(Northeast)[4] 70% to 80%[4] 60% to 70%[4] 70% to 80%[4] Crime data set of Denver[4] _ 42%[4] 51%[4] Crime data set of Los Angeles[4] 43%[4] 54%[4] F1-measure Crime dataset US(Northeast)[4] 70% to 80%[4] 70% to 80%[4] Above 80%[4] Table 1. Comparative analysis for data mining techniques. Table 1 shows the comparative analysis of different data mining techniques. All techniques are compared with its prediction accuracy and f1-measure. The support vector machine provide good accuracy but it is computationally expensive thus runs slow. The complexity of decision tree is high. Small variations in data might results in completely different tree being generated. The Naïve bays is highly scalable and easy to implement. Good results obtained in most cases. The results are also depends on which type of data is given. Here the naïve bays gives high accuracy and f1-score for different crime database compared to support vector machine and decision tree. IV. CONCLUSION The data mining techniques for predicting crime hotspots are discuss in this paper. This techniques are capable to enhance the prediction accuracy, performance and speed. After analyzing different results from various paper we can conclude that Naïve Bayes gives efficient results for crime hotspot prediction. V. AKNOWLWDGEMENT I take the opportunity to express my gratitude and regards to my guide Prof.Shivani Vora, CE & IT Dept., CGPIT, Bardoli for her suggestions and encouragements. REFERENCES [1] Somayeh Shojaee,Aida Mustapha, Fatimah Sidi, Marzanah A.Jabar, “A study on Classification Learning algorithms to predict crime status”, International Journal of Digital Content Technology and its Applications, vol 7, 2013 [2] Shiju Sathyadevan, Devan M.S and Surya Gangadharan. S,” Crime Analysis and Prediction Using Data Mining”, First International Conference on Networks & Soft Computing,IEEE [3] Chung-Hsien Yu, Max W. Ward, Melissa Morabito and Wei Ding(2011),”Crime Forecasting Using Data Mining Techniques”, Department of Computer Science, 2Department of Sociology, University of Massachusetts Boston, 100 Morrissey Blvd., Boston, MA 02125 [4] Tahani Almanie, Rsha Mirza and Elizabeth Lor,” Crime Prediction based on Crime Types and using Spitial and Temporal Criminal Hotspots”, International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.4, July 2015 [5] https://en.wikipedia.org/wiki/Bayes%27_theorem [6] Michael Steinbach, Vipin Kumar, Pang-Ning Tan, “Classification: Alternative Techniques” in Introduction to Data Mining 3rd edition, 2006 [7] Nikhil Dubey, Setu Kumar Chaturvedi,ph.d, “A Survey paper on Crime Prediction Technique using Data Mining”, International journal of Engineering research and Applications, vol 1, 2014 [8] https://en.wikipedia.org/wiki/K- nearest_neighbors_algorithm [9] Keivan Kianmehr and Reda Alhajj, “Crime Hot-Spots Prediction Using Support Vector Machine” pp. 952- 960 IEEE 2006. [10]https://en.wikipedia.org/wiki/Support_vector_machine.