SlideShare una empresa de Scribd logo
1 de 7
Descargar para leer sin conexión
Network and Complex Systems
ISSN 2224-610X (Paper) ISSN 2225-0603 (Online)
Vol.3, No.7, 2013

www.iiste.org

A New Approach of Detecting Network Anomalies using
Improved ID3 with Horizontal Partioning Based Decision Tree
Sonika Tiwari*, Prof. Roopali Soni**
*Department of Computer Science, OIST Bhopal
Email: sonikatiwari_mt@yahoo.com
**Department of Computer Science, OIST Bhopal
Email:roopalisoni@oriental.ac.in
Abstract
In this paper we are proposing a new approach of Detecting Network Anomalies using improved ID3 with
horizontal portioning based decision tree. Here we first apply different clustering algorithms and after that we
apply horizontal partioning decision tree and then check the network anomalies from the decision tree. Here in
this paper we find the comparative analysis of different clustering algorithms and existing id3 based decision tree.

I.

INTRODUCTION
It is important to keep our computer systems secure because economical activities rely on it. Despite the
existence of attack prevention mechanisms such as firewalls, most company computer networks are still the
victim of attacks. According to the statistics of CERT [1], the number of reported incidents against computer
networks has increased from 252 in 1990 to 21756 in 2000 and to 137529 in 2003. This happened because of
misconfiguration of firewalls or because malicious activities are generally cleverly designed to circumvent the
firewall policies. It is therefore crucial to have another line of defence in order to detect and stop malicious
activities. This line of defence is intrusion detection systems (IDS).
During the last decades, different approaches to intrusion detection have been explored. The two most common
approaches are misuse detection and anomaly detection. In misuse detection, attacks are detected by matching
the current traffic pattern with the signature of known attacks. Anomaly detection keeps a profile of normal
system behavior and interprets any significant deviation from this normal profile as malicious activity. One of
the strengths of anomaly detection is the ability to detect new attacks. Anomaly detection’s most serious
weakness is that it generates too many false alarms. Anomaly detection falls into two categories: supervised
anomaly detection and unsupervised anomaly detection. In supervised anomaly detection, the instances of the
data set used for training the system are labelled either as normal or as specific attack type. The problem with
this approach is that labeling the data is time consuming. Unsupervised anomaly detection, on the other hand,
operates on unlabeled data. The advantage of using unlabeled data is that the unlabeled data is easy and
inexpensive to obtain. The main challenge in performing unsupervised anomaly detection is distinguishing the
normal data patterns from attack data patterns.
Recently, clustering has been investigated as one approach to solving this problem. As attack data patterns are
assumed to differ from normal data patterns, clustering can be used to distinguish attack data patterns from
normal data patterns. Clustering network traffic data is difficult because:
1. of high data volume
2. of high data dimension
3. the distribution of attack and normal classes is skewed
4. the data is a mixture of categorical and continuous data
5. of the pre-processing of the data required.
Network anomaly detection
As we explained earlier, detectors need models or rules for detecting intrusions. These models can be built offline on the basis of earlier network traffic data gathered by agents. Once the model has been built, the task of
detecting and stopping intrusions can be performed online. One of the weaknesses of this approach is that it is
not adaptive. This is because small changes in traffic affect the model globally. Some approaches to anomaly
detection perform the model construction and anomaly detection simultaneously on-line. In some of these
approaches clustering has been used. One of the advantages of online modeling is that it is less time consuming
because it does not require a separate training phase. Furthermore, the model reflects the current nature of
network traffic. The problem with this approach is that it can lead to inaccurate models. This happens because
this approach fails to detect attacks performed systematically over a long period of time. These types of attacks
can only be detected by analyzing network traffic gathered over a long period of time. The clusters obtained by
clustering network traffic data off-line can be used for either anomaly detection or misuse detection. For
anomaly detection, it is the clusters formed by the normal data that are relevant for model construction. For
misuse detection, it is the different attack clusters that are used for model construction.

1
Network and Complex Systems
ISSN 2224-610X (Paper) ISSN 2225-0603 (Online)
Vol.3, No.7, 2013

www.iiste.org

Clustering is a division of data into groups of similar objects. Each group, called cluster, consists of objects that
are similar amongst them and dissimilar compared to objects of other groups. Representing data by fewer clusters
necessarily loses certain fine details, but achieves simplification. It represents many data objects by few clusters,
and hence, it models data by its clusters.
Cluster analysis is the organization of a collection of patterns (usually represented as a vector of measurements, or
a point in a multidimensional space) into clusters based on similarity. Patterns within a valid cluster are more
similar to each other than they are to a pattern belonging to a different cluster. It is important to understand the
difference between clustering (unsupervised classification) and discriminate analysis (supervised
classification). In supervised classification, we are provided with a collection of labelled (reclassified) patterns;
the problem is to label a newly encountered, yet unlabeled, pattern. Typically, the given labeled (training)
patterns are used to learn the descriptions of classes which in turn are used to label a new pattern. In the case of
clustering, the problem is to group a given collection of unlabeled patterns into meaningful clusters. In a sense,
labels are associated with clusters also, but these category labels are data driven; that is, they are
obtained solely from the data [2,3,4].
ID3 Algorithm
The ID3 algorithm (Inducing Decision Trees) was originally introduced by Quinlan in [11] and is described
below in Algorithm 1. Here we briefly recall the steps involved in the algorithm. For a thorough discussion of
the algorithm we refer the interested reader to [10].
Require: R, a set of attributes.
Require: C, the class attribute.
Require: S, data set of tuples.
1: if R is empty then
2: Return the leaf having the most frequent value in data set S.
3: else if all tuples in S have the same class value then
4: Return a leaf with that specific class value.
5: else
6: Determine attribute A with the highest information gain in S.
7: Partition S in m parts S(a1), ..., S(am) such that a1, ..., am are the different values of A.
8: Return a tree with root A and m branches labeled a1...am, such that branch i contains ID3(R − {A} ,C, S(ai)).
9: end if

II.
RELATED WORK
In [5] Network anomaly detection aims at detecting malicious activities in computer network traffic data. In this
approach, the normal profile of the network traffic is modeled and any significant deviation from this normal
profile is interpreted as malicious. While supervised anomaly detection models the normal traffic behavior on the
basis of an attack free data set, unsupervised anomaly detection works on a data set which contains both normal
and attack data. Clustering has recently been investigated as one way of approaching the issues of unsupervised
anomaly detection.
Our contribution: The main goal of the paper has been to investigate the efficiency of different classical
clustering algorithms in clustering network traffic data for unsupervised anomaly detection. The clusters
obtained by clustering the network traffic data set are intended to be used by a security expert for manual
labeling. A second goal has been to study some possible ways of combining these algorithms in order to improve
their performance.
In [6] Clustering is a division of data into group s of similar objects. Each group, called a cluster, consists of
objects that are similar between themselves and dissimilar compared to objects of other groups. This
paper is intended to study and compare different data clustering algorithms. The algorithms under
investigation are: k-means algorithm, hierarchical clustering algorithm, self-organizing maps algorithm, and
expectation maximization clustering algorithm. All these algorithms are compared according to the following
factors: size of dataset, number of clusters, type of dataset and type of software used.
Our contribution: The main conclusion that can be concluded is the performance comparison of different
clustering algorithm.
In [7] This consider privacy preserving decision tree induction via ID3 in the case where the training data is
horizontally or vertically distributed. Furthermore, we consider the same problem in the case where the data is
both horizontally and vertically distributed, a situation we refer to as grid partitioned data. We give an algorithm
for privacy preserving ID3 over horizontally partitioned data involving more than two parties. For grid
partitioned data, we discuss two different evaluation methods for preserving privacy ID3, namely, first merging
horizontally and developing vertically or first merging vertically and next developing horizontally. Next to
introducing privacy preserving data mining over grid-partitioned data, the main contribution of this paper is that

2
Network and Complex Systems
ISSN 2224-610X (Paper) ISSN 2225-0603 (Online)
Vol.3, No.7, 2013

www.iiste.org

we show, by means of a complexity analysis that the former evaluation method is the more efficient.
Our contribution: Here the datasets when partitioned horizontally, vertically and after that the clustering
algorithm is applied performs better performance than on the whole datasets.
In [8] This paper presents a novel host-based combinatorial method based on k-Means clustering and ID3
decision tree learning algorithms for unsupervised classification of anomalous and normal activities in computer
network ARP traffic. The k-Means clustering method is first applied to the normal training instances to partition
it into k clusters using Euclidean distance similarity. An ID3 decision tree is constructed on each cluster.
Anomaly scores from the k-Means clustering algorithm and decisions of the ID3 decision trees are extracted. A
special algorithm is used to combine results of the two algorithms and obtain final anomaly score values. The
threshold rule is applied for making decision on the test instance normality or abnormality.
Our contribution: The proposed method is compared with the individual k-Means and ID3 methods and the other
proposed approaches based on markovian chains and stochastic learning automata in terms of the overall
classification performance defined over five different performance measures. Results on real evaluation test bed
network data sets show that: the proposed method outperforms the individual k- Means and the ID3 compared to
the other approaches.
In [9] Traditionally, research on graph theory focused on studying graphs that are static. However, almost all real
networks are dynamic in nature and large in size. Quite recently, research areas for studying the topology,
evolution, applications of complex evolving networks and processes occurring in them and governing them
attracted attention from researchers. In this work, we review the significant contributions in the literature on
complex evolving networks; metrics used from degree distribution to spectral graph analysis, real world
applications from biology to social sciences, problem domains from anomaly detection, dynamic graph
clustering to community detection.
Our contribution: Many real world complex systems can be represented as graphs. The entities in these system
represent the nodes or vertices and links or edges connect a pair or more of the nodes. We encounter such
networks in almost any application domain i.e. computer science, sociology, chemistry, biology, anthropology,
psychology, geography, history, engineering.

III.
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•

Proposed ID3 algorithm
Define P1, P2, …., Pn Parties.(Horizontally partitioned).
Each Party contains R set of attributes A1, A2, …., AR.
C the class attributes contains c class values C1, C2, …., Cc.
For party Pi where i = 1 to n do
If R is Empty Then
Return a leaf node with class value
Else If all transaction in T(Pi) have the same class Then
Return a leaf node with the class value
Else
Calculate Expected Information classify the given sample for each party Pi individually.
Calculate Entropy for each attribute (A1, A2, …., AR) of each party Pi.
Calculate Information Gain for each attribute (A1, A2,…., AR) of each party Pi
Calculate Total Information Gain for each attribute of all parties (TotalInformationGain( )).
ABestAttribute
MaxInformationGain( )
Let V1, V2, …., Vm be the value of attributes. ABestAttribute partitioned P1, P2,…., Pn parties into m parties
P1(V1), P1(V2), …., P1(Vm)
P2(V1), P2(V2), …., P2(Vm)
.
.
.
.
Pn(V1), Pn(V2), …., Pn(Vm)
Return the Tree whose Root is labelled ABestAttribute and has m edges labelled V1, V2, …., Vm. Such that
for every i the edge Vi goes to the Tree
NPPID3(R – ABestAttribute, C, (P1(Vi), P2(Vi), …., Pn(Vi)))
End.

IV.
RESULTANALYSIS
As shown in Fig 1. is the time needed for the decision of any dataset? It was observerd that the existing id3takes
more time as compared our proposed work.
Where,

3
Network and Complex Systems
ISSN 2224-610X (Paper) ISSN 2225-0603 (Online)
Vol.3, No.7, 2013

www.iiste.org

HP is the proposed horizontal partioned based ID3.

Fig. 1
Relative absolute error can be calculated as:
(|p1-a1|+….+|pn-an |) / (| -a1|+………..| -an|)
Mean squared error can be calculated as:
((p1-a1)2+...+(pn-an )2) / (( -a1)2+...( -an)2)
with Actual target values: a1 a2 … an
Predicted target values: p1 p2 … pn

Fig. 2
In Fig 2. Our proposed work performs less means square error as compared to the existing algorithm.

4
Network and Complex Systems
ISSN 2224-610X (Paper) ISSN 2225-0603 (Online)
Vol.3, No.7, 2013

www.iiste.org

Fig. 3
In Fig 3. It was observed that the proposed algorithm has less absolute error than the existing algorithm.
Clustering with proposed id3

Time (ms)

K-mean with proposed id3
Hierarchical with proposed id3
EM with proposed id3

Mean
error
0.0714
0.0357
0.0238

47
31
43

absolute

Mean absolute error
14.2105 %
36.5854 %
5.4119 %

Table 1.
As shown in the table 1 is the comparative study of different clustering algorithm with our proposed algorithm.
Clustering with existing id3
Time (ms)
Mean
absolute Mean absolute error
error
K-mean with existing id3
65
0.0914
20.2105 %
Hierarchical with existing id3
50
0.0557
45.5854 %
EM with existing id3
60
0.0438
7.4119 %
Table 2.

V.

CONCLUSION
The clustering algorithms are used to divide any datasets into a number of clusters, this time clustering
algorithms are combined with ID3 algorithm to detect the network anomaly detection and the performance is
compared with the other clustering algorithms. The proposed algorithm implemented here provides a way of
classifying and provides better leaning of the network anomalies and normal activities in computer network ARP
traffic.
REFERENCES
[1] A comparative Study of Anomaly Detection Schemes in Network Intrusion Detection, A. Lazarevic, L. Ertoz,
V. Kumar, A. Ozgur, J. Srivastava.
[2] Keogh E., Chakrabarti K., Pazzani M., and Mehrotra S., “Dimensionality Reduction for Fast Similarity
Search in Large Time Series Databases,” Knowledge and Information Systems, vol. 3, pp. 263-286, 2001.
[3] Lepere R. and Trystram D., “A New Clustering Algorithm for Large Communication Delays,” in
Proceedings of 16th IEEE-ACM Annual International Parallel and Distributed Processing Symposium
(IPDPS’02), Fort Lauderdale, USA, 2002.
[4] Li C. and Biswas G., “Unsupervised Learning with Mixed Numeric and Nominal Data,” IEEE
Transactions on Knowledge and Data Engineering, vol. 14, no. 4, pp. 673-690, 2002.
[5] A comparison of clustering method for unsupervised anomaly detection in network traffic, Koffi Bruno Yao.
[6] Comparisons between Data Clustering Algorithms,Osama Abu Abbas Computer Science Department,
Yarmouk University, Jordan
[7] Privacy Preserving ID3 over Horizontally, Vertically and Grid Partitioned Data,Bart Kuijpers, Vanessa
Lemmens, Bart Moelans Theoretical Computer Science, Hasselt University & Transnational University Limburg,
Belgium.
[8] A Novel Unsupervised Classification Approach for Network Anomaly Detection by K Means Clustering and

5
Network and Complex Systems
ISSN 2224-610X (Paper) ISSN 2225-0603 (Online)
Vol.3, No.7, 2013

www.iiste.org

ID3 Decision Tree Learning Methods,Yasser Yasami, Saadat Pour Mozaffari, Computer Engineering
Department Amirkabir University of Technology (AUT) Tehran, Iran.
[9] Dynamic Network Evolution: Models, Clustering, Anomaly Detection,Cemal Cagatay Bilgin and B¨ulent
Yener Rensselaer Polytechnic Institute, Troy NY, 12180.,
[10] Wenke Lee and S. J. Stolfo. Data Mining Approaches for Intrusion Detection, 1998.
[11] Stefano Zanero and Sergio M. Savaresi. Unsupervised learning techniques for an intrusion detection system,
ACM March 2004.
[12] Martin Ester, Hans-Peter Kriegel,Jorg Sander,Xiaowei Xu. A density-based clustering algorithm for
discovering Clusters in Large Spatial databases with noise. Proceedings of 2nd international Conference on
Knowledge Discovery and Data Mining, 1996.
[13] A. Wespi, G. Vigna and L.Deri. Recent Advances in Intrusion Detection. 5th International Symposium,
Raid 2002 Zurich, Switzerland, October 2002 Proceedings. Springer.
[14] G. Qu, S. Hariri, and M. Yousif, “A New Dependency and Correlation Analysis for Features,” IEEE Trans.
Knowledge and Data Eng., vol. 17, no. 9, pp. 1199-1207, Sept. 2005.
[15] J. Kittler, M. Hatef, R.P.W. Duin, and J. Matas, “On Combining Classifiers,” IEEE Trans. Pattern Analysis
and Machine Intelligence, vol. 20, no. 3, pp. 226-239, Mar. 1998.

6
This academic article was published by The International Institute for Science,
Technology and Education (IISTE). The IISTE is a pioneer in the Open Access
Publishing service based in the U.S. and Europe. The aim of the institute is
Accelerating Global Knowledge Sharing.
More information about the publisher can be found in the IISTE’s homepage:
http://www.iiste.org
CALL FOR JOURNAL PAPERS
The IISTE is currently hosting more than 30 peer-reviewed academic journals and
collaborating with academic institutions around the world. There’s no deadline for
submission. Prospective authors of IISTE journals can find the submission
instruction on the following page: http://www.iiste.org/journals/
The IISTE
editorial team promises to the review and publish all the qualified submissions in a
fast manner. All the journals articles are available online to the readers all over the
world without financial, legal, or technical barriers other than those inseparable from
gaining access to the internet itself. Printed version of the journals is also available
upon request of readers and authors.
MORE RESOURCES
Book publication information: http://www.iiste.org/book/
Recent conferences: http://www.iiste.org/conference/
IISTE Knowledge Sharing Partners
EBSCO, Index Copernicus, Ulrich's Periodicals Directory, JournalTOCS, PKP Open
Archives Harvester, Bielefeld Academic Search Engine, Elektronische
Zeitschriftenbibliothek EZB, Open J-Gate, OCLC WorldCat, Universe Digtial
Library , NewJour, Google Scholar

Más contenido relacionado

La actualidad más candente

Ppback propagation-bansal-zhong-2010
Ppback propagation-bansal-zhong-2010Ppback propagation-bansal-zhong-2010
Ppback propagation-bansal-zhong-2010
girivaishali
 
A Survey on Constellation Based Attribute Selection Method for High Dimension...
A Survey on Constellation Based Attribute Selection Method for High Dimension...A Survey on Constellation Based Attribute Selection Method for High Dimension...
A Survey on Constellation Based Attribute Selection Method for High Dimension...
IJERA Editor
 
A Rule based Slicing Approach to Achieve Data Publishing and Privacy
A Rule based Slicing Approach to Achieve Data Publishing and PrivacyA Rule based Slicing Approach to Achieve Data Publishing and Privacy
A Rule based Slicing Approach to Achieve Data Publishing and Privacy
ijsrd.com
 

La actualidad más candente (18)

Trending Topics in Machine Learning
Trending Topics in Machine LearningTrending Topics in Machine Learning
Trending Topics in Machine Learning
 
Ppback propagation-bansal-zhong-2010
Ppback propagation-bansal-zhong-2010Ppback propagation-bansal-zhong-2010
Ppback propagation-bansal-zhong-2010
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
EFFICIENT FEATURE SUBSET SELECTION MODEL FOR HIGH DIMENSIONAL DATA
EFFICIENT FEATURE SUBSET SELECTION MODEL FOR HIGH DIMENSIONAL DATAEFFICIENT FEATURE SUBSET SELECTION MODEL FOR HIGH DIMENSIONAL DATA
EFFICIENT FEATURE SUBSET SELECTION MODEL FOR HIGH DIMENSIONAL DATA
 
M033059064
M033059064M033059064
M033059064
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...
 
Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...
Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...
Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggre...
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
 
Analysis of the Datasets
Analysis of the DatasetsAnalysis of the Datasets
Analysis of the Datasets
 
Iaetsd a survey on one class clustering
Iaetsd a survey on one class clusteringIaetsd a survey on one class clustering
Iaetsd a survey on one class clustering
 
Saif_CCECE2007_full_paper_submitted
Saif_CCECE2007_full_paper_submittedSaif_CCECE2007_full_paper_submitted
Saif_CCECE2007_full_paper_submitted
 
A Survey on Constellation Based Attribute Selection Method for High Dimension...
A Survey on Constellation Based Attribute Selection Method for High Dimension...A Survey on Constellation Based Attribute Selection Method for High Dimension...
A Survey on Constellation Based Attribute Selection Method for High Dimension...
 
ϵ-DIFFERENTIAL PRIVACY MODEL FOR VERTICALLY PARTITIONED DATA TO SECURE THE PR...
ϵ-DIFFERENTIAL PRIVACY MODEL FOR VERTICALLY PARTITIONED DATA TO SECURE THE PR...ϵ-DIFFERENTIAL PRIVACY MODEL FOR VERTICALLY PARTITIONED DATA TO SECURE THE PR...
ϵ-DIFFERENTIAL PRIVACY MODEL FOR VERTICALLY PARTITIONED DATA TO SECURE THE PR...
 
A statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentA statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environment
 
IRJET - Random Data Perturbation Techniques in Privacy Preserving Data Mi...
IRJET -  	  Random Data Perturbation Techniques in Privacy Preserving Data Mi...IRJET -  	  Random Data Perturbation Techniques in Privacy Preserving Data Mi...
IRJET - Random Data Perturbation Techniques in Privacy Preserving Data Mi...
 
A Rule based Slicing Approach to Achieve Data Publishing and Privacy
A Rule based Slicing Approach to Achieve Data Publishing and PrivacyA Rule based Slicing Approach to Achieve Data Publishing and Privacy
A Rule based Slicing Approach to Achieve Data Publishing and Privacy
 

Destacado

ανταρκτική! Ergasia wikispaces! 123
ανταρκτική! Ergasia wikispaces! 123ανταρκτική! Ergasia wikispaces! 123
ανταρκτική! Ergasia wikispaces! 123
gkazakias123
 

Destacado (13)

A post market reform analysis of monetary conditions index for nigeria
A post market reform analysis of monetary conditions index for nigeriaA post market reform analysis of monetary conditions index for nigeria
A post market reform analysis of monetary conditions index for nigeria
 
A study of social development of children at elementary level
A study of social development of children at elementary levelA study of social development of children at elementary level
A study of social development of children at elementary level
 
ανταρκτική! Ergasia wikispaces! 123
ανταρκτική! Ergasia wikispaces! 123ανταρκτική! Ergasia wikispaces! 123
ανταρκτική! Ergasia wikispaces! 123
 
A pilot study on effects of vaccination on immunity of broiler chickens
A pilot study on effects of vaccination on immunity of broiler chickensA pilot study on effects of vaccination on immunity of broiler chickens
A pilot study on effects of vaccination on immunity of broiler chickens
 
Analysis template (1)
Analysis template (1)Analysis template (1)
Analysis template (1)
 
Presentation / Keynote for The Aalborg University Teaching Day 2015
Presentation / Keynote for The Aalborg University Teaching Day 2015Presentation / Keynote for The Aalborg University Teaching Day 2015
Presentation / Keynote for The Aalborg University Teaching Day 2015
 
Odc
OdcOdc
Odc
 
A multifractality measure of stock market efficiency in asean region
A multifractality measure of stock market efficiency in asean regionA multifractality measure of stock market efficiency in asean region
A multifractality measure of stock market efficiency in asean region
 
Evaluation question 7
Evaluation question 7Evaluation question 7
Evaluation question 7
 
A short history of evolution of indigenous plants and medicine system
A short history of evolution of indigenous plants and medicine system A short history of evolution of indigenous plants and medicine system
A short history of evolution of indigenous plants and medicine system
 
A preliminary yield model for natural yushania alpina bamboo in kenya
A preliminary yield model for natural yushania alpina bamboo in kenyaA preliminary yield model for natural yushania alpina bamboo in kenya
A preliminary yield model for natural yushania alpina bamboo in kenya
 
Gestores de contenido
Gestores de contenidoGestores de contenido
Gestores de contenido
 
Санкции поисковых систем: диагностика и снятие
Санкции поисковых систем: диагностика и снятиеСанкции поисковых систем: диагностика и снятие
Санкции поисковых систем: диагностика и снятие
 

Similar a A new approach of detecting network anomalies using improved id3 with horizontal partioning based decision tree

A Study on Genetic-Fuzzy Based Automatic Intrusion Detection on Network Datasets
A Study on Genetic-Fuzzy Based Automatic Intrusion Detection on Network DatasetsA Study on Genetic-Fuzzy Based Automatic Intrusion Detection on Network Datasets
A Study on Genetic-Fuzzy Based Automatic Intrusion Detection on Network Datasets
Drjabez
 
Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194
Editor IJARCET
 
Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194
Editor IJARCET
 
Bs31267274
Bs31267274Bs31267274
Bs31267274
IJMER
 

Similar a A new approach of detecting network anomalies using improved id3 with horizontal partioning based decision tree (20)

Ij2514951500
Ij2514951500Ij2514951500
Ij2514951500
 
Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & Prediction
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data mining
 
A Study on Genetic-Fuzzy Based Automatic Intrusion Detection on Network Datasets
A Study on Genetic-Fuzzy Based Automatic Intrusion Detection on Network DatasetsA Study on Genetic-Fuzzy Based Automatic Intrusion Detection on Network Datasets
A Study on Genetic-Fuzzy Based Automatic Intrusion Detection on Network Datasets
 
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
 
Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre...
Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre...Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre...
Artificial Neural Content Techniques for Enhanced Intrusion Detection and Pre...
 
BENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONS
BENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONSBENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONS
BENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONS
 
Benchmarks for Evaluating Anomaly Based Intrusion Detection Solutions
Benchmarks for Evaluating Anomaly Based Intrusion Detection SolutionsBenchmarks for Evaluating Anomaly Based Intrusion Detection Solutions
Benchmarks for Evaluating Anomaly Based Intrusion Detection Solutions
 
Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194
 
Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194
 
A Heuristic Approach for Network Data Clustering
A Heuristic Approach for Network Data ClusteringA Heuristic Approach for Network Data Clustering
A Heuristic Approach for Network Data Clustering
 
E017153342
E017153342E017153342
E017153342
 
Data mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant AnalysisData mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant Analysis
 
Bs31267274
Bs31267274Bs31267274
Bs31267274
 
Spam email filtering
Spam email filteringSpam email filtering
Spam email filtering
 
Final proj 2 (1)
Final proj 2 (1)Final proj 2 (1)
Final proj 2 (1)
 
Data Mining Framework for Network Intrusion Detection using Efficient Techniques
Data Mining Framework for Network Intrusion Detection using Efficient TechniquesData Mining Framework for Network Intrusion Detection using Efficient Techniques
Data Mining Framework for Network Intrusion Detection using Efficient Techniques
 
Configuring Associations to Increase Trust in Product Purchase
Configuring Associations to Increase Trust in Product Purchase Configuring Associations to Increase Trust in Product Purchase
Configuring Associations to Increase Trust in Product Purchase
 
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASE
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASECONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASE
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASE
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 

Más de Alexander Decker

Abnormalities of hormones and inflammatory cytokines in women affected with p...
Abnormalities of hormones and inflammatory cytokines in women affected with p...Abnormalities of hormones and inflammatory cytokines in women affected with p...
Abnormalities of hormones and inflammatory cytokines in women affected with p...
Alexander Decker
 
A usability evaluation framework for b2 c e commerce websites
A usability evaluation framework for b2 c e commerce websitesA usability evaluation framework for b2 c e commerce websites
A usability evaluation framework for b2 c e commerce websites
Alexander Decker
 
A universal model for managing the marketing executives in nigerian banks
A universal model for managing the marketing executives in nigerian banksA universal model for managing the marketing executives in nigerian banks
A universal model for managing the marketing executives in nigerian banks
Alexander Decker
 
A unique common fixed point theorems in generalized d
A unique common fixed point theorems in generalized dA unique common fixed point theorems in generalized d
A unique common fixed point theorems in generalized d
Alexander Decker
 
A trends of salmonella and antibiotic resistance
A trends of salmonella and antibiotic resistanceA trends of salmonella and antibiotic resistance
A trends of salmonella and antibiotic resistance
Alexander Decker
 
A transformational generative approach towards understanding al-istifham
A transformational  generative approach towards understanding al-istifhamA transformational  generative approach towards understanding al-istifham
A transformational generative approach towards understanding al-istifham
Alexander Decker
 
A time series analysis of the determinants of savings in namibia
A time series analysis of the determinants of savings in namibiaA time series analysis of the determinants of savings in namibia
A time series analysis of the determinants of savings in namibia
Alexander Decker
 
A therapy for physical and mental fitness of school children
A therapy for physical and mental fitness of school childrenA therapy for physical and mental fitness of school children
A therapy for physical and mental fitness of school children
Alexander Decker
 
A theory of efficiency for managing the marketing executives in nigerian banks
A theory of efficiency for managing the marketing executives in nigerian banksA theory of efficiency for managing the marketing executives in nigerian banks
A theory of efficiency for managing the marketing executives in nigerian banks
Alexander Decker
 
A systematic evaluation of link budget for
A systematic evaluation of link budget forA systematic evaluation of link budget for
A systematic evaluation of link budget for
Alexander Decker
 
A synthetic review of contraceptive supplies in punjab
A synthetic review of contraceptive supplies in punjabA synthetic review of contraceptive supplies in punjab
A synthetic review of contraceptive supplies in punjab
Alexander Decker
 
A synthesis of taylor’s and fayol’s management approaches for managing market...
A synthesis of taylor’s and fayol’s management approaches for managing market...A synthesis of taylor’s and fayol’s management approaches for managing market...
A synthesis of taylor’s and fayol’s management approaches for managing market...
Alexander Decker
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
Alexander Decker
 
A survey on live virtual machine migrations and its techniques
A survey on live virtual machine migrations and its techniquesA survey on live virtual machine migrations and its techniques
A survey on live virtual machine migrations and its techniques
Alexander Decker
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
Alexander Decker
 
A survey on challenges to the media cloud
A survey on challenges to the media cloudA survey on challenges to the media cloud
A survey on challenges to the media cloud
Alexander Decker
 
A survey of provenance leveraged
A survey of provenance leveragedA survey of provenance leveraged
A survey of provenance leveraged
Alexander Decker
 
A survey of private equity investments in kenya
A survey of private equity investments in kenyaA survey of private equity investments in kenya
A survey of private equity investments in kenya
Alexander Decker
 
A study to measures the financial health of
A study to measures the financial health ofA study to measures the financial health of
A study to measures the financial health of
Alexander Decker
 

Más de Alexander Decker (20)

Abnormalities of hormones and inflammatory cytokines in women affected with p...
Abnormalities of hormones and inflammatory cytokines in women affected with p...Abnormalities of hormones and inflammatory cytokines in women affected with p...
Abnormalities of hormones and inflammatory cytokines in women affected with p...
 
A validation of the adverse childhood experiences scale in
A validation of the adverse childhood experiences scale inA validation of the adverse childhood experiences scale in
A validation of the adverse childhood experiences scale in
 
A usability evaluation framework for b2 c e commerce websites
A usability evaluation framework for b2 c e commerce websitesA usability evaluation framework for b2 c e commerce websites
A usability evaluation framework for b2 c e commerce websites
 
A universal model for managing the marketing executives in nigerian banks
A universal model for managing the marketing executives in nigerian banksA universal model for managing the marketing executives in nigerian banks
A universal model for managing the marketing executives in nigerian banks
 
A unique common fixed point theorems in generalized d
A unique common fixed point theorems in generalized dA unique common fixed point theorems in generalized d
A unique common fixed point theorems in generalized d
 
A trends of salmonella and antibiotic resistance
A trends of salmonella and antibiotic resistanceA trends of salmonella and antibiotic resistance
A trends of salmonella and antibiotic resistance
 
A transformational generative approach towards understanding al-istifham
A transformational  generative approach towards understanding al-istifhamA transformational  generative approach towards understanding al-istifham
A transformational generative approach towards understanding al-istifham
 
A time series analysis of the determinants of savings in namibia
A time series analysis of the determinants of savings in namibiaA time series analysis of the determinants of savings in namibia
A time series analysis of the determinants of savings in namibia
 
A therapy for physical and mental fitness of school children
A therapy for physical and mental fitness of school childrenA therapy for physical and mental fitness of school children
A therapy for physical and mental fitness of school children
 
A theory of efficiency for managing the marketing executives in nigerian banks
A theory of efficiency for managing the marketing executives in nigerian banksA theory of efficiency for managing the marketing executives in nigerian banks
A theory of efficiency for managing the marketing executives in nigerian banks
 
A systematic evaluation of link budget for
A systematic evaluation of link budget forA systematic evaluation of link budget for
A systematic evaluation of link budget for
 
A synthetic review of contraceptive supplies in punjab
A synthetic review of contraceptive supplies in punjabA synthetic review of contraceptive supplies in punjab
A synthetic review of contraceptive supplies in punjab
 
A synthesis of taylor’s and fayol’s management approaches for managing market...
A synthesis of taylor’s and fayol’s management approaches for managing market...A synthesis of taylor’s and fayol’s management approaches for managing market...
A synthesis of taylor’s and fayol’s management approaches for managing market...
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incrementalA survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
 
A survey on live virtual machine migrations and its techniques
A survey on live virtual machine migrations and its techniquesA survey on live virtual machine migrations and its techniques
A survey on live virtual machine migrations and its techniques
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
 
A survey on challenges to the media cloud
A survey on challenges to the media cloudA survey on challenges to the media cloud
A survey on challenges to the media cloud
 
A survey of provenance leveraged
A survey of provenance leveragedA survey of provenance leveraged
A survey of provenance leveraged
 
A survey of private equity investments in kenya
A survey of private equity investments in kenyaA survey of private equity investments in kenya
A survey of private equity investments in kenya
 
A study to measures the financial health of
A study to measures the financial health ofA study to measures the financial health of
A study to measures the financial health of
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

A new approach of detecting network anomalies using improved id3 with horizontal partioning based decision tree

  • 1. Network and Complex Systems ISSN 2224-610X (Paper) ISSN 2225-0603 (Online) Vol.3, No.7, 2013 www.iiste.org A New Approach of Detecting Network Anomalies using Improved ID3 with Horizontal Partioning Based Decision Tree Sonika Tiwari*, Prof. Roopali Soni** *Department of Computer Science, OIST Bhopal Email: sonikatiwari_mt@yahoo.com **Department of Computer Science, OIST Bhopal Email:roopalisoni@oriental.ac.in Abstract In this paper we are proposing a new approach of Detecting Network Anomalies using improved ID3 with horizontal portioning based decision tree. Here we first apply different clustering algorithms and after that we apply horizontal partioning decision tree and then check the network anomalies from the decision tree. Here in this paper we find the comparative analysis of different clustering algorithms and existing id3 based decision tree. I. INTRODUCTION It is important to keep our computer systems secure because economical activities rely on it. Despite the existence of attack prevention mechanisms such as firewalls, most company computer networks are still the victim of attacks. According to the statistics of CERT [1], the number of reported incidents against computer networks has increased from 252 in 1990 to 21756 in 2000 and to 137529 in 2003. This happened because of misconfiguration of firewalls or because malicious activities are generally cleverly designed to circumvent the firewall policies. It is therefore crucial to have another line of defence in order to detect and stop malicious activities. This line of defence is intrusion detection systems (IDS). During the last decades, different approaches to intrusion detection have been explored. The two most common approaches are misuse detection and anomaly detection. In misuse detection, attacks are detected by matching the current traffic pattern with the signature of known attacks. Anomaly detection keeps a profile of normal system behavior and interprets any significant deviation from this normal profile as malicious activity. One of the strengths of anomaly detection is the ability to detect new attacks. Anomaly detection’s most serious weakness is that it generates too many false alarms. Anomaly detection falls into two categories: supervised anomaly detection and unsupervised anomaly detection. In supervised anomaly detection, the instances of the data set used for training the system are labelled either as normal or as specific attack type. The problem with this approach is that labeling the data is time consuming. Unsupervised anomaly detection, on the other hand, operates on unlabeled data. The advantage of using unlabeled data is that the unlabeled data is easy and inexpensive to obtain. The main challenge in performing unsupervised anomaly detection is distinguishing the normal data patterns from attack data patterns. Recently, clustering has been investigated as one approach to solving this problem. As attack data patterns are assumed to differ from normal data patterns, clustering can be used to distinguish attack data patterns from normal data patterns. Clustering network traffic data is difficult because: 1. of high data volume 2. of high data dimension 3. the distribution of attack and normal classes is skewed 4. the data is a mixture of categorical and continuous data 5. of the pre-processing of the data required. Network anomaly detection As we explained earlier, detectors need models or rules for detecting intrusions. These models can be built offline on the basis of earlier network traffic data gathered by agents. Once the model has been built, the task of detecting and stopping intrusions can be performed online. One of the weaknesses of this approach is that it is not adaptive. This is because small changes in traffic affect the model globally. Some approaches to anomaly detection perform the model construction and anomaly detection simultaneously on-line. In some of these approaches clustering has been used. One of the advantages of online modeling is that it is less time consuming because it does not require a separate training phase. Furthermore, the model reflects the current nature of network traffic. The problem with this approach is that it can lead to inaccurate models. This happens because this approach fails to detect attacks performed systematically over a long period of time. These types of attacks can only be detected by analyzing network traffic gathered over a long period of time. The clusters obtained by clustering network traffic data off-line can be used for either anomaly detection or misuse detection. For anomaly detection, it is the clusters formed by the normal data that are relevant for model construction. For misuse detection, it is the different attack clusters that are used for model construction. 1
  • 2. Network and Complex Systems ISSN 2224-610X (Paper) ISSN 2225-0603 (Online) Vol.3, No.7, 2013 www.iiste.org Clustering is a division of data into groups of similar objects. Each group, called cluster, consists of objects that are similar amongst them and dissimilar compared to objects of other groups. Representing data by fewer clusters necessarily loses certain fine details, but achieves simplification. It represents many data objects by few clusters, and hence, it models data by its clusters. Cluster analysis is the organization of a collection of patterns (usually represented as a vector of measurements, or a point in a multidimensional space) into clusters based on similarity. Patterns within a valid cluster are more similar to each other than they are to a pattern belonging to a different cluster. It is important to understand the difference between clustering (unsupervised classification) and discriminate analysis (supervised classification). In supervised classification, we are provided with a collection of labelled (reclassified) patterns; the problem is to label a newly encountered, yet unlabeled, pattern. Typically, the given labeled (training) patterns are used to learn the descriptions of classes which in turn are used to label a new pattern. In the case of clustering, the problem is to group a given collection of unlabeled patterns into meaningful clusters. In a sense, labels are associated with clusters also, but these category labels are data driven; that is, they are obtained solely from the data [2,3,4]. ID3 Algorithm The ID3 algorithm (Inducing Decision Trees) was originally introduced by Quinlan in [11] and is described below in Algorithm 1. Here we briefly recall the steps involved in the algorithm. For a thorough discussion of the algorithm we refer the interested reader to [10]. Require: R, a set of attributes. Require: C, the class attribute. Require: S, data set of tuples. 1: if R is empty then 2: Return the leaf having the most frequent value in data set S. 3: else if all tuples in S have the same class value then 4: Return a leaf with that specific class value. 5: else 6: Determine attribute A with the highest information gain in S. 7: Partition S in m parts S(a1), ..., S(am) such that a1, ..., am are the different values of A. 8: Return a tree with root A and m branches labeled a1...am, such that branch i contains ID3(R − {A} ,C, S(ai)). 9: end if II. RELATED WORK In [5] Network anomaly detection aims at detecting malicious activities in computer network traffic data. In this approach, the normal profile of the network traffic is modeled and any significant deviation from this normal profile is interpreted as malicious. While supervised anomaly detection models the normal traffic behavior on the basis of an attack free data set, unsupervised anomaly detection works on a data set which contains both normal and attack data. Clustering has recently been investigated as one way of approaching the issues of unsupervised anomaly detection. Our contribution: The main goal of the paper has been to investigate the efficiency of different classical clustering algorithms in clustering network traffic data for unsupervised anomaly detection. The clusters obtained by clustering the network traffic data set are intended to be used by a security expert for manual labeling. A second goal has been to study some possible ways of combining these algorithms in order to improve their performance. In [6] Clustering is a division of data into group s of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar compared to objects of other groups. This paper is intended to study and compare different data clustering algorithms. The algorithms under investigation are: k-means algorithm, hierarchical clustering algorithm, self-organizing maps algorithm, and expectation maximization clustering algorithm. All these algorithms are compared according to the following factors: size of dataset, number of clusters, type of dataset and type of software used. Our contribution: The main conclusion that can be concluded is the performance comparison of different clustering algorithm. In [7] This consider privacy preserving decision tree induction via ID3 in the case where the training data is horizontally or vertically distributed. Furthermore, we consider the same problem in the case where the data is both horizontally and vertically distributed, a situation we refer to as grid partitioned data. We give an algorithm for privacy preserving ID3 over horizontally partitioned data involving more than two parties. For grid partitioned data, we discuss two different evaluation methods for preserving privacy ID3, namely, first merging horizontally and developing vertically or first merging vertically and next developing horizontally. Next to introducing privacy preserving data mining over grid-partitioned data, the main contribution of this paper is that 2
  • 3. Network and Complex Systems ISSN 2224-610X (Paper) ISSN 2225-0603 (Online) Vol.3, No.7, 2013 www.iiste.org we show, by means of a complexity analysis that the former evaluation method is the more efficient. Our contribution: Here the datasets when partitioned horizontally, vertically and after that the clustering algorithm is applied performs better performance than on the whole datasets. In [8] This paper presents a novel host-based combinatorial method based on k-Means clustering and ID3 decision tree learning algorithms for unsupervised classification of anomalous and normal activities in computer network ARP traffic. The k-Means clustering method is first applied to the normal training instances to partition it into k clusters using Euclidean distance similarity. An ID3 decision tree is constructed on each cluster. Anomaly scores from the k-Means clustering algorithm and decisions of the ID3 decision trees are extracted. A special algorithm is used to combine results of the two algorithms and obtain final anomaly score values. The threshold rule is applied for making decision on the test instance normality or abnormality. Our contribution: The proposed method is compared with the individual k-Means and ID3 methods and the other proposed approaches based on markovian chains and stochastic learning automata in terms of the overall classification performance defined over five different performance measures. Results on real evaluation test bed network data sets show that: the proposed method outperforms the individual k- Means and the ID3 compared to the other approaches. In [9] Traditionally, research on graph theory focused on studying graphs that are static. However, almost all real networks are dynamic in nature and large in size. Quite recently, research areas for studying the topology, evolution, applications of complex evolving networks and processes occurring in them and governing them attracted attention from researchers. In this work, we review the significant contributions in the literature on complex evolving networks; metrics used from degree distribution to spectral graph analysis, real world applications from biology to social sciences, problem domains from anomaly detection, dynamic graph clustering to community detection. Our contribution: Many real world complex systems can be represented as graphs. The entities in these system represent the nodes or vertices and links or edges connect a pair or more of the nodes. We encounter such networks in almost any application domain i.e. computer science, sociology, chemistry, biology, anthropology, psychology, geography, history, engineering. III. • • • • • • • • • • • • • • • • • • • • • • • Proposed ID3 algorithm Define P1, P2, …., Pn Parties.(Horizontally partitioned). Each Party contains R set of attributes A1, A2, …., AR. C the class attributes contains c class values C1, C2, …., Cc. For party Pi where i = 1 to n do If R is Empty Then Return a leaf node with class value Else If all transaction in T(Pi) have the same class Then Return a leaf node with the class value Else Calculate Expected Information classify the given sample for each party Pi individually. Calculate Entropy for each attribute (A1, A2, …., AR) of each party Pi. Calculate Information Gain for each attribute (A1, A2,…., AR) of each party Pi Calculate Total Information Gain for each attribute of all parties (TotalInformationGain( )). ABestAttribute MaxInformationGain( ) Let V1, V2, …., Vm be the value of attributes. ABestAttribute partitioned P1, P2,…., Pn parties into m parties P1(V1), P1(V2), …., P1(Vm) P2(V1), P2(V2), …., P2(Vm) . . . . Pn(V1), Pn(V2), …., Pn(Vm) Return the Tree whose Root is labelled ABestAttribute and has m edges labelled V1, V2, …., Vm. Such that for every i the edge Vi goes to the Tree NPPID3(R – ABestAttribute, C, (P1(Vi), P2(Vi), …., Pn(Vi))) End. IV. RESULTANALYSIS As shown in Fig 1. is the time needed for the decision of any dataset? It was observerd that the existing id3takes more time as compared our proposed work. Where, 3
  • 4. Network and Complex Systems ISSN 2224-610X (Paper) ISSN 2225-0603 (Online) Vol.3, No.7, 2013 www.iiste.org HP is the proposed horizontal partioned based ID3. Fig. 1 Relative absolute error can be calculated as: (|p1-a1|+….+|pn-an |) / (| -a1|+………..| -an|) Mean squared error can be calculated as: ((p1-a1)2+...+(pn-an )2) / (( -a1)2+...( -an)2) with Actual target values: a1 a2 … an Predicted target values: p1 p2 … pn Fig. 2 In Fig 2. Our proposed work performs less means square error as compared to the existing algorithm. 4
  • 5. Network and Complex Systems ISSN 2224-610X (Paper) ISSN 2225-0603 (Online) Vol.3, No.7, 2013 www.iiste.org Fig. 3 In Fig 3. It was observed that the proposed algorithm has less absolute error than the existing algorithm. Clustering with proposed id3 Time (ms) K-mean with proposed id3 Hierarchical with proposed id3 EM with proposed id3 Mean error 0.0714 0.0357 0.0238 47 31 43 absolute Mean absolute error 14.2105 % 36.5854 % 5.4119 % Table 1. As shown in the table 1 is the comparative study of different clustering algorithm with our proposed algorithm. Clustering with existing id3 Time (ms) Mean absolute Mean absolute error error K-mean with existing id3 65 0.0914 20.2105 % Hierarchical with existing id3 50 0.0557 45.5854 % EM with existing id3 60 0.0438 7.4119 % Table 2. V. CONCLUSION The clustering algorithms are used to divide any datasets into a number of clusters, this time clustering algorithms are combined with ID3 algorithm to detect the network anomaly detection and the performance is compared with the other clustering algorithms. The proposed algorithm implemented here provides a way of classifying and provides better leaning of the network anomalies and normal activities in computer network ARP traffic. REFERENCES [1] A comparative Study of Anomaly Detection Schemes in Network Intrusion Detection, A. Lazarevic, L. Ertoz, V. Kumar, A. Ozgur, J. Srivastava. [2] Keogh E., Chakrabarti K., Pazzani M., and Mehrotra S., “Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases,” Knowledge and Information Systems, vol. 3, pp. 263-286, 2001. [3] Lepere R. and Trystram D., “A New Clustering Algorithm for Large Communication Delays,” in Proceedings of 16th IEEE-ACM Annual International Parallel and Distributed Processing Symposium (IPDPS’02), Fort Lauderdale, USA, 2002. [4] Li C. and Biswas G., “Unsupervised Learning with Mixed Numeric and Nominal Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 4, pp. 673-690, 2002. [5] A comparison of clustering method for unsupervised anomaly detection in network traffic, Koffi Bruno Yao. [6] Comparisons between Data Clustering Algorithms,Osama Abu Abbas Computer Science Department, Yarmouk University, Jordan [7] Privacy Preserving ID3 over Horizontally, Vertically and Grid Partitioned Data,Bart Kuijpers, Vanessa Lemmens, Bart Moelans Theoretical Computer Science, Hasselt University & Transnational University Limburg, Belgium. [8] A Novel Unsupervised Classification Approach for Network Anomaly Detection by K Means Clustering and 5
  • 6. Network and Complex Systems ISSN 2224-610X (Paper) ISSN 2225-0603 (Online) Vol.3, No.7, 2013 www.iiste.org ID3 Decision Tree Learning Methods,Yasser Yasami, Saadat Pour Mozaffari, Computer Engineering Department Amirkabir University of Technology (AUT) Tehran, Iran. [9] Dynamic Network Evolution: Models, Clustering, Anomaly Detection,Cemal Cagatay Bilgin and B¨ulent Yener Rensselaer Polytechnic Institute, Troy NY, 12180., [10] Wenke Lee and S. J. Stolfo. Data Mining Approaches for Intrusion Detection, 1998. [11] Stefano Zanero and Sergio M. Savaresi. Unsupervised learning techniques for an intrusion detection system, ACM March 2004. [12] Martin Ester, Hans-Peter Kriegel,Jorg Sander,Xiaowei Xu. A density-based clustering algorithm for discovering Clusters in Large Spatial databases with noise. Proceedings of 2nd international Conference on Knowledge Discovery and Data Mining, 1996. [13] A. Wespi, G. Vigna and L.Deri. Recent Advances in Intrusion Detection. 5th International Symposium, Raid 2002 Zurich, Switzerland, October 2002 Proceedings. Springer. [14] G. Qu, S. Hariri, and M. Yousif, “A New Dependency and Correlation Analysis for Features,” IEEE Trans. Knowledge and Data Eng., vol. 17, no. 9, pp. 1199-1207, Sept. 2005. [15] J. Kittler, M. Hatef, R.P.W. Duin, and J. Matas, “On Combining Classifiers,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226-239, Mar. 1998. 6
  • 7. This academic article was published by The International Institute for Science, Technology and Education (IISTE). The IISTE is a pioneer in the Open Access Publishing service based in the U.S. and Europe. The aim of the institute is Accelerating Global Knowledge Sharing. More information about the publisher can be found in the IISTE’s homepage: http://www.iiste.org CALL FOR JOURNAL PAPERS The IISTE is currently hosting more than 30 peer-reviewed academic journals and collaborating with academic institutions around the world. There’s no deadline for submission. Prospective authors of IISTE journals can find the submission instruction on the following page: http://www.iiste.org/journals/ The IISTE editorial team promises to the review and publish all the qualified submissions in a fast manner. All the journals articles are available online to the readers all over the world without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. Printed version of the journals is also available upon request of readers and authors. MORE RESOURCES Book publication information: http://www.iiste.org/book/ Recent conferences: http://www.iiste.org/conference/ IISTE Knowledge Sharing Partners EBSCO, Index Copernicus, Ulrich's Periodicals Directory, JournalTOCS, PKP Open Archives Harvester, Bielefeld Academic Search Engine, Elektronische Zeitschriftenbibliothek EZB, Open J-Gate, OCLC WorldCat, Universe Digtial Library , NewJour, Google Scholar