SlideShare una empresa de Scribd logo
1 de 13
Descargar para leer sin conexión
DATA MINING SEMINAR




   TOPIC: DATAMINING FOR
   TELECOMMUNICATIONS


 SUBMITTED BY: MAYURI KIRAN
         ANVEKAR


 SUBMITTED TO: RINO CHERIAN


       DATE: 30/03/2012
Table of Contents
I.         ABSTRACT .............................................................................................................................................. 3
II.        INTRODUCTION ..................................................................................................................................... 3
III.          TYPES OF TELECOM DATA ................................................................................................................. 4
      1.      Call Summary Data ............................................................................................................................ 4
      2.      Network Data .................................................................................................................................... 4
      3.      Customer Data .................................................................................................................................. 4
IV.           Mining Sequential Patterns In Telecommunication Database Using Genetic Algorithm ................. 5
      1.      Sequential Patterns Mining............................................................................................................... 5
      2.      Genetic Algorithm ............................................................................................................................. 5
      3.      Mining Sequential Patterns in Telecommunication Database Using GA .......................................... 6
           3.1 Chromosome. .................................................................................................................................. 6
           3.2 Genetic Operators ........................................................................................................................... 7
           3.3 Fitness Function. ............................................................................................................................. 7
           3.4 SPT-GA Algorithm............................................................................................................................ 8
      4.      Experiment Results ........................................................................................................................... 9
V.         Discovering Structural Patterns In Telecommunications Data ........................................................... 11
VI.           CONCLUSION ................................................................................................................................... 13
VII.          REFERENCES .................................................................................................................................... 13
I.      ABSTRACT

Telecommunication companies generate a tremendous amount of data. These data include call
detail data, which describes the calls that traverse the telecommunication networks, network
data, which describes the state of the hardware and software components in the network, and
customer data, which describes the telecommunication customers. This chapter describes how
data mining can be used to uncover useful information buried within these data sets. Several data
mining applications are described and together they demonstrate that data mining can be used to
identify telecommunication fraud, improve marketing effectiveness, and identify network faults.




   II.     INTRODUCTION

The international telecommunications play an important role in message sharing and information
passing. In the last decade a dramatic change in the structure of telecommunications companies
has been taken place, from public monopolies to private companies. The quick development of
mobile telephone networks and video calling and Internet technologies has created enormous
competitive pressure on the companies sector. As new competitors arise in market, telecom need
intelligent tools to gain profit and withstand. Also, stock market expectations are huge and
investors, financial analysts need tested tools to gain information about how companies perform
financially compared to their competitors, what they are good at, who are the major competitors
are, etc. In other term, the telecom companies need to benchmark their performances against
compete trends in order to remain important role in this market. There is an enormous amount of
information about these companies financial performance that is now publicly available. This
amount greatly exceeds our capacity to analyze it; the problem is that we often lack tools to
quickly and accurately process these data.

Data mining for telecommunications companies involve the use of simple, traditional and
advanced mathematical techniques used to analyze large populations of data and deliver insights,
forecasts, explanations and predictions of how systems, customers, network and marketplace are
likely to react to different situations. The hypercompetitive nature of the telecom industry has
created a need to understand customers, to keep them, and to model effective ways to market
new products. This creates a great demand for innovation like data mining technique to help
understand the new business trends involved, catch fraudulent activities, identify
telecommunication patterns, make better use of resources and improve the quality of services.
III. TYPES OF TELECOM DATA

The Initial step in the data mining process is to understand the data. Here we discuss three main
types of telecom data. They are as follows:

   1. Call Summary Data
Every time a call is placed on a telecom network, descriptive information about the call is saved
as a call detail for future record. At a minimum, each call detail record will include the
originating and terminating phone numbers, the date and time of the call and the duration of the
call.

   2. Network Data
Telecommunication networks are extremely complex configurations of equipment, comprised of
interconnected components. Each network element is capable of generating error and status
messages, which leads to a tremendous amount of network data.

   3. Customer Data
Telecommunication companies, like other businesses have millions of customer’s. For necessity
they have to maintaining a database of information on these customers. This information will
include name, address and may include other information such as service plan, credit score,
contract information, family income and payment history.
IV. Mining Sequential Patterns in Telecommunication Database
                        Using Genetic Algorithm

Sequential pattern mining is the process of finding the relationships between occurrences of
sequential events, to find if there exists any specific order of the occurrences. The extraction of
sequential pattern is not polynomial in time of execution. The other algorithms for performing
sequential pattern mining can assure optimum solutions but they do not take into consideration
the time taken to reach such solutions. Whereas this algorithm based on genetic concepts gives a
non-optimal solution but in a reasonable time (polynomial) of execution.

   1. Sequential Patterns Mining

The goal is to find all sub sequences from the given sets of transactions; this approach is useful
when the data to be mined have some sequential nature to deal with databases that have a time-
series characteristics. Sequential Pattern can be defined as follows.

Definition: Let I ={x1...xn} be a set of items. An itemset is a non-empty subset of items, and an
itemset with k items is called k-itemset. A sequence s=(X1...Xl) is an order list of item sets, and
an item set Xi (1≤ i ≤ l) in a sequence is called a transaction. In a set of sequences, a sequence s
is maximal if s is not contained in other sequences.




   2. Genetic Algorithm

Genetic Algorithm (GA) is a part of evolutionary computing, which is a rapidly growing area of
artificial intelligence. Genetic algorithm starts with a set of solutions (represented by
chromosomes) called population. Solutions from one population are taken and used to form a
new population by mutation and crossover. This is motivated by a hope, that the new population
will be better than the old one.

Best solutions which are selected to form new solutions (offspring) are selected according to
their best fitness. This is repeated until some condition (for example number of populations or
improvement of the best solution) is satisfied. To measure the quality of a solution, fitness
function is assigned to each chromosome in the population.



       3. Mining Sequential Patterns in Telecommunication Database Using
                                 Genetic Algorithm

Database, this algorithm is called SPT-GA algorithm. Firstly, we present our chromosome
structure and encoding schema, genetic operators, and then we define the fitness assignment and
selection criteria. Finally, we give the structure of SPT-GA algorithm.

3.1 Chromosome. The used structure of GA chromosomes and how it is represented is as
follows:

3.1.1 Structure. In Telecommunication Database, country code values are used for creating the
chromosomes.

In the algorithm, chromosomes have a fixed length, and its length is equal to number of country
codes that are available in the database as in Figure 1.




3.1.2 Representation. In Genetic Algorithm, there are many alternatives to represent a
chromosome based on other problem like binary and integer representation. To decide which
representation is better to be used for Sequential Pattern rules, we should use the short; low-order
schemata are relevant to the underlying problem and relatively unrelated to schemata over other
fixed positions. Also we should select the smallest alphabet that permits a natural expression of
the problem, presented in [8].

 In SPT-GA algorithm, we choose the binary representation because it is the most suitable for
our algorithm and it needs less space and it represents the needed information (element occurred
or not).

For example, using the Telecommunication Database in Table.1, if a sequence is equal to <249,
973, 91>, it can be represented as in Figure.2.
Additionally, as you can see in Figure.2, order cannot be extracted directly. To solve this
problem, we decided to associate the transactions sequence as a metadata with each
chromosome. For that, we use Vertical Bitmap Representation that makes SPT-GA algorithm to
take less time and space to be executed.

3.2 Genetic Operators. SPT-GA uses genetic operators to generate the offspring of the existing
population.

Genetic algorithm will send chromosomes that represented by binary string where each bit
corresponds to an element occurrence (0 or 1); the number of bits is equal to the number of
items. After encoding of the solution domain, initially many chromosome solutions are randomly
generated depending on population size.

Genetic algorithm will select the chromosome regarding to our fitness function. The major
measure that used by our algorithm is Sequential Interestingness measure (SIM).

Then crossover takes place, it selects genes from parent chromosomes and creates a new
offspring. The simplest way how to do this is to choose randomly some crossover point and
everything before this point copy from a first parent and then everything after a crossover point
copy from the second parent. Crossover is shown in Figure 3.

After a crossover is performed, mutation takes place. This is to prevent falling all solutions in
population into a local optimum of solved problem. Mutation changes randomly the new
offspring. For binary encoding we can switch a few randomly chosen bits from 1 to 0 or from 0
to 1. Mutation is shown in Figure 3.




3.3 Fitness Function. Fitness function method discovers sequential patterns corresponding to
the interests of users without using background knowledge. This method defines a new criterion
called the sequential interestingness measure (SIM).

Definition: The sequential interestingness measure of a rule AC is:

               SIM(AC) = min Ci ∈ C {(Confidence(A|Ci))α} × Support(AC)
Where: (α ≥ 0): is a confidence priority that represents how important the frequency of the
pattern is

Ci: is subsequence of C, it represents a condition of sequence C

i = 1 … n where n is the number of conditions in C.

The first term of the criterion evaluates that the frequencies of the sub-patterns are not frequent
while the second term evaluates that the frequency of the pattern is frequent.



3.4 SPT-GA Algorithm. The SPT-GA algorithm that was proposed is described in this section.
In Figure 4, the pseudo code of SPT-GA algorithm is presented.

ALGORITHM

Input:

Initial samples size: N

Maximum generations: G

Threshold value: T

Minimum fitness: minF

Output:

Malpractice user no lists

Begin

Step1: Initialize counter count = 0

Step2: Initial population IN of size N

Step3: For each chromosome i ∈ IN

If S1& S2 is given then

Measure the fitness F(i, S1, S2,T)

Else If S1 is given then

Measure the fitness F(i, S1, T)

Else If S2 is given then
Measure the fitness F(i, S2, T)

Else

Measure the fitness F(i, T)

Step4: Mutate and crossover P.

Step5: IF (fitness ≥ minF)

Select fittest rules from P

Step6: Set temp = temp +1

Step7: IF (t > G) then

S=P

Stop

Else

Go to Step 3

End

After the encoding of the dataset, using bitmap representation, the algorithm starts by selecting
individuals to initial population. Then the following processes are repeated until the pre-specified
maximum number of generations is achieved. The fitness values determined for each selected
individual given the rule antecedent or consequent. The fittest rules that are larger than or equal
minimum fitness in P will be selected. Giving the antecedent or consequent is not mandatory but
it will reduce the time of search and extract more desired rules.

Existing chromosomes are used in generating new ones by applying crossover and mutation
operators. Chromosomes survive based on their fitness used in the process. This way, the
interesting set is determined and the target is achieved.



   4. Experiment Results

The results of some experiments on telecommunication database is presented and analyzed as
follows. SPT-GA algorithm is written with MATLAB programming language. The user can tune
population size (N), generations (G), confidence priority (α) and minimum fitness (minF).

 A Telecommunication Database is taken from a Telecommunication Company and it has 1091
transactions and 60 country codes. The crossover probability used is 0.8, while the mutation is
0.001. The output of this experiment is a text file that includes the interesting rules that represent
the most suitable telecommunication sequences. For example, the rule of Figure 5 told us that
when country code 91 is called, that means (40%) of callers will call country code 92 afterwards.




In SPT-GA algorithm, there are four parameters that must be determined by a user: number of
generation (G), population size (N), confidence priority (α) and minimum fitness (minF). First
experiment sets N = 20, α = 1 and minF = 0 with G = [20...200]. The second experiment sets G =
20, α = 1 and minF = 0 with N = [20...200]. The two experiments were done on two days of calls
with 1091 transactions and 60 country codes.

 According to the experiment, Figure 6 shows the time, in seconds, spent by the SPT-GA
algorithm to extract the best rules while Figure 7 shows the average fitness of the final output.
Both figures shows the result related to increasing the generations and population size.




 The tests showed that when the generation increases the time will be around each other but
when the population increases the time will be increased. However, comparing the two
experiments, as shown in Figure 6, GA takes less time when increasing the generation than
increasing the population. Moreover, the tests also showed that when the generation and
population increase, the best fitness will be around.

 From the experiments, it is observed that increasing of generation will take less time than
increasing of population size. But, either ways, GA does not take a long time; it is only a matter
of seconds. In addition, increasing the generation and the population will not guarantee a large
improvement of average fitness.




       V.     Discovering Structural Patterns In Telecommunications
                                       Data

The SUBDUE system is a structural discovery tool that finds substructures in a graph-based
representation of structural databases using the minimum description length (MDL) principle.
SUBDUE discovers substructures that compress the original data and represent structural
concepts in the data. Once a substructure is discovered, the substructure is used to simplify
the data by replacing instances of the substructure with a pointer to the newly discovered
substructure. The discovered substructures allow abstraction over detailed structures in the
original data. Iteration of the substructure discovery and replacement process constructs a
hierarchical description of the structural data in terms of the discovered substructures. This
hierarchy provides varying levels of interpretation that can be accessed based on the specific
goals of the data analysis.

SUBDUE represents structural data as a labeled graph. Objects in the data map to vertices or
small sub graphs in the graph, and relationships between objects map to directed or undirected
edges in the graph. A substructure is a connected sub graph within the graphical representation.
This graphical representation serves as input to the substructure discovery system. Figure 8
shows a geometric example of such an input graph. The objects in the figure become labeled
vertices in the graph, and the relationships become labeled edges in the graph. The graphical
representation of the substructure discovered by SUBDUE from this data is also shown in Figure
8. One of the four instances of the substructure is highlighted in the input graph. An instance of a
substructure in an input graph is a set of vertices and edges from the input graph that match,
graph theoretically, to the graphical representation of the substructure.
Figure 8: Example Substructure in Graph Form

Experiment Results:

In a study upon running the SUBDUE system on the data in graph form, the six substructures
output by the system were analyzed. Figure 9 shows two of these six substructures. The
substructures in this figure show that the following patterns are common:

1. Local calls originating between 11:00am and 2:30pm with 30 to 45 minute durations and

2. Long distance calls originating between 6:00pm and 10:00pm with 5 to 20 minute durations.




                      Figure 9: Telecom pattern discovered by Subdue.
VI. CONCLUSION

Genetic Algorithm is applied to find frequent sequences in Telecommunication Database in order
to help Telecommunication companies to know the country codes that have a relation between
them. So, the telecommunication companies can estimate the countries that have a specific order
of the occurrences and give a discount on the calls to these countries. SPT-GA algorithm utilizes
the property of evolutionary algorithm that discovers best rules in a short time with meaningful
results.


   VII. REFERENCES

   1. Data Mining In Telecommunications; Gary M. Weiss
   2. Data Mining In Telecommunications And Studying Its Status In Iran Telecom
      Companies And Operator; Jamal Sophieh
   3. Data Mining And CRM In Telecommunications; D. Camilovic
   4. Genetic Algorithms; William H. Hsu
   5. A Fraud Detection Approach in Telecommunication using Cluster GA; V.Umayaparvathi
      & Dr.K.Iyakutti

Más contenido relacionado

La actualidad más candente

Steganography
SteganographySteganography
SteganographyPREMKUMAR
 
Network security & cryptography full notes
Network security & cryptography full notesNetwork security & cryptography full notes
Network security & cryptography full notesgangadhar9989166446
 
Credit Card Fraudulent Transaction Detection Research Paper
Credit Card Fraudulent Transaction Detection Research PaperCredit Card Fraudulent Transaction Detection Research Paper
Credit Card Fraudulent Transaction Detection Research PaperGarvit Burad
 
Presentation On Steganography
Presentation On SteganographyPresentation On Steganography
Presentation On SteganographyTeachMission
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data miningDataminingTools Inc
 
Detecting fraud with Python and machine learning
Detecting fraud with Python and machine learningDetecting fraud with Python and machine learning
Detecting fraud with Python and machine learningwgyn
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.pptneelamoberoi1030
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data miningDevakumar Jain
 
Basic Cryptography unit 4 CSS
Basic Cryptography unit 4 CSSBasic Cryptography unit 4 CSS
Basic Cryptography unit 4 CSSSURBHI SAROHA
 
Classification in Data Mining
Classification in Data MiningClassification in Data Mining
Classification in Data MiningRashmi Bhat
 
Encryption And Decryption
Encryption And DecryptionEncryption And Decryption
Encryption And DecryptionNA
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural NetworksDatabricks
 
Web scraping
Web scrapingWeb scraping
Web scrapingSelecto
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioMarina Santini
 
PPT on BRAIN TUMOR detection in MRI images based on IMAGE SEGMENTATION
PPT on BRAIN TUMOR detection in MRI images based on  IMAGE SEGMENTATION PPT on BRAIN TUMOR detection in MRI images based on  IMAGE SEGMENTATION
PPT on BRAIN TUMOR detection in MRI images based on IMAGE SEGMENTATION khanam22
 

La actualidad más candente (20)

Steganography
SteganographySteganography
Steganography
 
Network security & cryptography full notes
Network security & cryptography full notesNetwork security & cryptography full notes
Network security & cryptography full notes
 
Credit Card Fraudulent Transaction Detection Research Paper
Credit Card Fraudulent Transaction Detection Research PaperCredit Card Fraudulent Transaction Detection Research Paper
Credit Card Fraudulent Transaction Detection Research Paper
 
Presentation On Steganography
Presentation On SteganographyPresentation On Steganography
Presentation On Steganography
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
 
Detecting fraud with Python and machine learning
Detecting fraud with Python and machine learningDetecting fraud with Python and machine learning
Detecting fraud with Python and machine learning
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
 
Basic Cryptography unit 4 CSS
Basic Cryptography unit 4 CSSBasic Cryptography unit 4 CSS
Basic Cryptography unit 4 CSS
 
Classification in Data Mining
Classification in Data MiningClassification in Data Mining
Classification in Data Mining
 
Encryption And Decryption
Encryption And DecryptionEncryption And Decryption
Encryption And Decryption
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
 
Web mining
Web miningWeb mining
Web mining
 
Web scraping
Web scrapingWeb scraping
Web scraping
 
Confusion and Diffusion.pptx
Confusion and Diffusion.pptxConfusion and Diffusion.pptx
Confusion and Diffusion.pptx
 
Data mining
Data miningData mining
Data mining
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Data mining tasks
Data mining tasksData mining tasks
Data mining tasks
 
PPT on BRAIN TUMOR detection in MRI images based on IMAGE SEGMENTATION
PPT on BRAIN TUMOR detection in MRI images based on  IMAGE SEGMENTATION PPT on BRAIN TUMOR detection in MRI images based on  IMAGE SEGMENTATION
PPT on BRAIN TUMOR detection in MRI images based on IMAGE SEGMENTATION
 
Cryptography
CryptographyCryptography
Cryptography
 

Destacado

SMART HEALTH PREDICTION USING DATA MINING by Dr.Mahboob Khan Phd
SMART HEALTH PREDICTION USING DATA MINING by Dr.Mahboob Khan PhdSMART HEALTH PREDICTION USING DATA MINING by Dr.Mahboob Khan Phd
SMART HEALTH PREDICTION USING DATA MINING by Dr.Mahboob Khan PhdHealthcare consultant
 
Survey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease predictionSurvey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease predictionSivagowry Shathesh
 
Smart health prediction using data mining by customsoft
Smart health prediction using data mining by customsoftSmart health prediction using data mining by customsoft
Smart health prediction using data mining by customsoftCustom Soft
 
Health Prediction System - an Artificial Intelligence Project 2015
Health Prediction System - an Artificial Intelligence Project 2015Health Prediction System - an Artificial Intelligence Project 2015
Health Prediction System - an Artificial Intelligence Project 2015Maruf Abdullah (Rion)
 
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHMHEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHMamiteshg
 
Heart Disease Prediction Using Data Mining Techniques
Heart Disease Prediction Using Data Mining TechniquesHeart Disease Prediction Using Data Mining Techniques
Heart Disease Prediction Using Data Mining TechniquesIJRES Journal
 
Data mining slides
Data mining slidesData mining slides
Data mining slidessmj
 
Psdot 14 using data mining techniques in heart
Psdot 14 using data mining techniques in heartPsdot 14 using data mining techniques in heart
Psdot 14 using data mining techniques in heartZTech Proje
 
Detection of heart diseases by data mining
Detection of heart diseases by data miningDetection of heart diseases by data mining
Detection of heart diseases by data miningAbheepsa Pattnaik
 
1.PPT (1.PREDICTION OF DISEASES New)
1.PPT (1.PREDICTION OF DISEASES New)1.PPT (1.PREDICTION OF DISEASES New)
1.PPT (1.PREDICTION OF DISEASES New)Jashvant Shah
 
Survey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease predictionSurvey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease predictionSivagowry Shathesh
 
Data Mining and Business Intelligence Tools
Data Mining and Business Intelligence ToolsData Mining and Business Intelligence Tools
Data Mining and Business Intelligence ToolsMotaz Saad
 
Computer science seminar topics
Computer science seminar topicsComputer science seminar topics
Computer science seminar topics123seminarsonly
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining ConceptsDung Nguyen
 
Internet-of-things- (IOT) - a-seminar - ppt - by- mohan-kumar-g
Internet-of-things- (IOT) - a-seminar - ppt - by- mohan-kumar-gInternet-of-things- (IOT) - a-seminar - ppt - by- mohan-kumar-g
Internet-of-things- (IOT) - a-seminar - ppt - by- mohan-kumar-gMohan Kumar G
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 

Destacado (20)

SMART HEALTH PREDICTION USING DATA MINING by Dr.Mahboob Khan Phd
SMART HEALTH PREDICTION USING DATA MINING by Dr.Mahboob Khan PhdSMART HEALTH PREDICTION USING DATA MINING by Dr.Mahboob Khan Phd
SMART HEALTH PREDICTION USING DATA MINING by Dr.Mahboob Khan Phd
 
Survey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease predictionSurvey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease prediction
 
Smart health prediction using data mining by customsoft
Smart health prediction using data mining by customsoftSmart health prediction using data mining by customsoft
Smart health prediction using data mining by customsoft
 
Health Prediction System - an Artificial Intelligence Project 2015
Health Prediction System - an Artificial Intelligence Project 2015Health Prediction System - an Artificial Intelligence Project 2015
Health Prediction System - an Artificial Intelligence Project 2015
 
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHMHEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
 
Heart Disease Prediction Using Data Mining Techniques
Heart Disease Prediction Using Data Mining TechniquesHeart Disease Prediction Using Data Mining Techniques
Heart Disease Prediction Using Data Mining Techniques
 
Data mining slides
Data mining slidesData mining slides
Data mining slides
 
Psdot 14 using data mining techniques in heart
Psdot 14 using data mining techniques in heartPsdot 14 using data mining techniques in heart
Psdot 14 using data mining techniques in heart
 
Detection of heart diseases by data mining
Detection of heart diseases by data miningDetection of heart diseases by data mining
Detection of heart diseases by data mining
 
Data mining
Data miningData mining
Data mining
 
1.PPT (1.PREDICTION OF DISEASES New)
1.PPT (1.PREDICTION OF DISEASES New)1.PPT (1.PREDICTION OF DISEASES New)
1.PPT (1.PREDICTION OF DISEASES New)
 
Survey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease predictionSurvey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease prediction
 
Final ppt
Final pptFinal ppt
Final ppt
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introduction
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Data Mining and Business Intelligence Tools
Data Mining and Business Intelligence ToolsData Mining and Business Intelligence Tools
Data Mining and Business Intelligence Tools
 
Computer science seminar topics
Computer science seminar topicsComputer science seminar topics
Computer science seminar topics
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Internet-of-things- (IOT) - a-seminar - ppt - by- mohan-kumar-g
Internet-of-things- (IOT) - a-seminar - ppt - by- mohan-kumar-gInternet-of-things- (IOT) - a-seminar - ppt - by- mohan-kumar-g
Internet-of-things- (IOT) - a-seminar - ppt - by- mohan-kumar-g
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 

Similar a Data mining seminar report

An Adaptive Network-Based Approach for Advanced Forecasting of Cryptocurrency...
An Adaptive Network-Based Approach for Advanced Forecasting of Cryptocurrency...An Adaptive Network-Based Approach for Advanced Forecasting of Cryptocurrency...
An Adaptive Network-Based Approach for Advanced Forecasting of Cryptocurrency...AIRCC Publishing Corporation
 
Resource scheduling algorithm
Resource scheduling algorithmResource scheduling algorithm
Resource scheduling algorithmShilpa Damor
 
M.Tech: AI and Neural Networks Assignment II
M.Tech:  AI and Neural Networks Assignment IIM.Tech:  AI and Neural Networks Assignment II
M.Tech: AI and Neural Networks Assignment IIVijayananda Mohire
 
ITB Term Paper - 10BM60066
ITB Term Paper - 10BM60066ITB Term Paper - 10BM60066
ITB Term Paper - 10BM60066rahulsm27
 
Smart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisSmart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisIRJET Journal
 
IRJET- Predicting Outcome of Judicial Cases and Analysis using Machine Le...
IRJET-  	  Predicting Outcome of Judicial Cases and Analysis using Machine Le...IRJET-  	  Predicting Outcome of Judicial Cases and Analysis using Machine Le...
IRJET- Predicting Outcome of Judicial Cases and Analysis using Machine Le...IRJET Journal
 
Stock_Market_Prediction_using_Social_Media_Analysis
Stock_Market_Prediction_using_Social_Media_AnalysisStock_Market_Prediction_using_Social_Media_Analysis
Stock_Market_Prediction_using_Social_Media_AnalysisOktay Bahceci
 
Swarm Intelligence
Swarm IntelligenceSwarm Intelligence
Swarm Intelligenceshubham1306
 
A data mining framework for fraud detection in telecom based on MapReduce (Pr...
A data mining framework for fraud detection in telecom based on MapReduce (Pr...A data mining framework for fraud detection in telecom based on MapReduce (Pr...
A data mining framework for fraud detection in telecom based on MapReduce (Pr...Mohammed Kharma
 
SELF LEARNING REAL TIME EXPERT SYSTEM
SELF LEARNING REAL TIME EXPERT SYSTEMSELF LEARNING REAL TIME EXPERT SYSTEM
SELF LEARNING REAL TIME EXPERT SYSTEMcscpconf
 
Automation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep LearningAutomation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep LearningPranov Mishra
 
Analysis on Fraud Detection Mechanisms Using Machine Learning Techniques
Analysis on Fraud Detection Mechanisms Using Machine Learning TechniquesAnalysis on Fraud Detection Mechanisms Using Machine Learning Techniques
Analysis on Fraud Detection Mechanisms Using Machine Learning TechniquesIRJET Journal
 
Bitcoin Price Prediction Using LSTM
Bitcoin Price Prediction Using LSTMBitcoin Price Prediction Using LSTM
Bitcoin Price Prediction Using LSTMIRJET Journal
 
IRJET- Competitive Analysis of Attacks on Social Media
IRJET-  	 Competitive Analysis of Attacks on Social MediaIRJET-  	 Competitive Analysis of Attacks on Social Media
IRJET- Competitive Analysis of Attacks on Social MediaIRJET Journal
 
The application of process mining in a simulated smart environment to derive ...
The application of process mining in a simulated smart environment to derive ...The application of process mining in a simulated smart environment to derive ...
The application of process mining in a simulated smart environment to derive ...freedomotic
 
thesis_jinxing_lin
thesis_jinxing_linthesis_jinxing_lin
thesis_jinxing_linjinxing lin
 
AlgoB – Cryptocurrency price prediction system using LSTM
AlgoB – Cryptocurrency price prediction system using LSTMAlgoB – Cryptocurrency price prediction system using LSTM
AlgoB – Cryptocurrency price prediction system using LSTMIRJET Journal
 
Bitcoin Price Prediction and Recommendation System using Deep learning techni...
Bitcoin Price Prediction and Recommendation System using Deep learning techni...Bitcoin Price Prediction and Recommendation System using Deep learning techni...
Bitcoin Price Prediction and Recommendation System using Deep learning techni...IRJET Journal
 

Similar a Data mining seminar report (20)

An Adaptive Network-Based Approach for Advanced Forecasting of Cryptocurrency...
An Adaptive Network-Based Approach for Advanced Forecasting of Cryptocurrency...An Adaptive Network-Based Approach for Advanced Forecasting of Cryptocurrency...
An Adaptive Network-Based Approach for Advanced Forecasting of Cryptocurrency...
 
Resource scheduling algorithm
Resource scheduling algorithmResource scheduling algorithm
Resource scheduling algorithm
 
PFC
PFCPFC
PFC
 
M.Tech: AI and Neural Networks Assignment II
M.Tech:  AI and Neural Networks Assignment IIM.Tech:  AI and Neural Networks Assignment II
M.Tech: AI and Neural Networks Assignment II
 
ITB Term Paper - 10BM60066
ITB Term Paper - 10BM60066ITB Term Paper - 10BM60066
ITB Term Paper - 10BM60066
 
Smart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisSmart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend Analysis
 
IRJET- Predicting Outcome of Judicial Cases and Analysis using Machine Le...
IRJET-  	  Predicting Outcome of Judicial Cases and Analysis using Machine Le...IRJET-  	  Predicting Outcome of Judicial Cases and Analysis using Machine Le...
IRJET- Predicting Outcome of Judicial Cases and Analysis using Machine Le...
 
Stock_Market_Prediction_using_Social_Media_Analysis
Stock_Market_Prediction_using_Social_Media_AnalysisStock_Market_Prediction_using_Social_Media_Analysis
Stock_Market_Prediction_using_Social_Media_Analysis
 
Swarm Intelligence
Swarm IntelligenceSwarm Intelligence
Swarm Intelligence
 
A data mining framework for fraud detection in telecom based on MapReduce (Pr...
A data mining framework for fraud detection in telecom based on MapReduce (Pr...A data mining framework for fraud detection in telecom based on MapReduce (Pr...
A data mining framework for fraud detection in telecom based on MapReduce (Pr...
 
SELF LEARNING REAL TIME EXPERT SYSTEM
SELF LEARNING REAL TIME EXPERT SYSTEMSELF LEARNING REAL TIME EXPERT SYSTEM
SELF LEARNING REAL TIME EXPERT SYSTEM
 
Automation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep LearningAutomation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep Learning
 
Final PPT.pptx
Final PPT.pptxFinal PPT.pptx
Final PPT.pptx
 
Analysis on Fraud Detection Mechanisms Using Machine Learning Techniques
Analysis on Fraud Detection Mechanisms Using Machine Learning TechniquesAnalysis on Fraud Detection Mechanisms Using Machine Learning Techniques
Analysis on Fraud Detection Mechanisms Using Machine Learning Techniques
 
Bitcoin Price Prediction Using LSTM
Bitcoin Price Prediction Using LSTMBitcoin Price Prediction Using LSTM
Bitcoin Price Prediction Using LSTM
 
IRJET- Competitive Analysis of Attacks on Social Media
IRJET-  	 Competitive Analysis of Attacks on Social MediaIRJET-  	 Competitive Analysis of Attacks on Social Media
IRJET- Competitive Analysis of Attacks on Social Media
 
The application of process mining in a simulated smart environment to derive ...
The application of process mining in a simulated smart environment to derive ...The application of process mining in a simulated smart environment to derive ...
The application of process mining in a simulated smart environment to derive ...
 
thesis_jinxing_lin
thesis_jinxing_linthesis_jinxing_lin
thesis_jinxing_lin
 
AlgoB – Cryptocurrency price prediction system using LSTM
AlgoB – Cryptocurrency price prediction system using LSTMAlgoB – Cryptocurrency price prediction system using LSTM
AlgoB – Cryptocurrency price prediction system using LSTM
 
Bitcoin Price Prediction and Recommendation System using Deep learning techni...
Bitcoin Price Prediction and Recommendation System using Deep learning techni...Bitcoin Price Prediction and Recommendation System using Deep learning techni...
Bitcoin Price Prediction and Recommendation System using Deep learning techni...
 

Último

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 

Último (20)

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 

Data mining seminar report

  • 1. DATA MINING SEMINAR TOPIC: DATAMINING FOR TELECOMMUNICATIONS SUBMITTED BY: MAYURI KIRAN ANVEKAR SUBMITTED TO: RINO CHERIAN DATE: 30/03/2012
  • 2. Table of Contents I. ABSTRACT .............................................................................................................................................. 3 II. INTRODUCTION ..................................................................................................................................... 3 III. TYPES OF TELECOM DATA ................................................................................................................. 4 1. Call Summary Data ............................................................................................................................ 4 2. Network Data .................................................................................................................................... 4 3. Customer Data .................................................................................................................................. 4 IV. Mining Sequential Patterns In Telecommunication Database Using Genetic Algorithm ................. 5 1. Sequential Patterns Mining............................................................................................................... 5 2. Genetic Algorithm ............................................................................................................................. 5 3. Mining Sequential Patterns in Telecommunication Database Using GA .......................................... 6 3.1 Chromosome. .................................................................................................................................. 6 3.2 Genetic Operators ........................................................................................................................... 7 3.3 Fitness Function. ............................................................................................................................. 7 3.4 SPT-GA Algorithm............................................................................................................................ 8 4. Experiment Results ........................................................................................................................... 9 V. Discovering Structural Patterns In Telecommunications Data ........................................................... 11 VI. CONCLUSION ................................................................................................................................... 13 VII. REFERENCES .................................................................................................................................... 13
  • 3. I. ABSTRACT Telecommunication companies generate a tremendous amount of data. These data include call detail data, which describes the calls that traverse the telecommunication networks, network data, which describes the state of the hardware and software components in the network, and customer data, which describes the telecommunication customers. This chapter describes how data mining can be used to uncover useful information buried within these data sets. Several data mining applications are described and together they demonstrate that data mining can be used to identify telecommunication fraud, improve marketing effectiveness, and identify network faults. II. INTRODUCTION The international telecommunications play an important role in message sharing and information passing. In the last decade a dramatic change in the structure of telecommunications companies has been taken place, from public monopolies to private companies. The quick development of mobile telephone networks and video calling and Internet technologies has created enormous competitive pressure on the companies sector. As new competitors arise in market, telecom need intelligent tools to gain profit and withstand. Also, stock market expectations are huge and investors, financial analysts need tested tools to gain information about how companies perform financially compared to their competitors, what they are good at, who are the major competitors are, etc. In other term, the telecom companies need to benchmark their performances against compete trends in order to remain important role in this market. There is an enormous amount of information about these companies financial performance that is now publicly available. This amount greatly exceeds our capacity to analyze it; the problem is that we often lack tools to quickly and accurately process these data. Data mining for telecommunications companies involve the use of simple, traditional and advanced mathematical techniques used to analyze large populations of data and deliver insights, forecasts, explanations and predictions of how systems, customers, network and marketplace are likely to react to different situations. The hypercompetitive nature of the telecom industry has created a need to understand customers, to keep them, and to model effective ways to market new products. This creates a great demand for innovation like data mining technique to help understand the new business trends involved, catch fraudulent activities, identify telecommunication patterns, make better use of resources and improve the quality of services.
  • 4. III. TYPES OF TELECOM DATA The Initial step in the data mining process is to understand the data. Here we discuss three main types of telecom data. They are as follows: 1. Call Summary Data Every time a call is placed on a telecom network, descriptive information about the call is saved as a call detail for future record. At a minimum, each call detail record will include the originating and terminating phone numbers, the date and time of the call and the duration of the call. 2. Network Data Telecommunication networks are extremely complex configurations of equipment, comprised of interconnected components. Each network element is capable of generating error and status messages, which leads to a tremendous amount of network data. 3. Customer Data Telecommunication companies, like other businesses have millions of customer’s. For necessity they have to maintaining a database of information on these customers. This information will include name, address and may include other information such as service plan, credit score, contract information, family income and payment history.
  • 5. IV. Mining Sequential Patterns in Telecommunication Database Using Genetic Algorithm Sequential pattern mining is the process of finding the relationships between occurrences of sequential events, to find if there exists any specific order of the occurrences. The extraction of sequential pattern is not polynomial in time of execution. The other algorithms for performing sequential pattern mining can assure optimum solutions but they do not take into consideration the time taken to reach such solutions. Whereas this algorithm based on genetic concepts gives a non-optimal solution but in a reasonable time (polynomial) of execution. 1. Sequential Patterns Mining The goal is to find all sub sequences from the given sets of transactions; this approach is useful when the data to be mined have some sequential nature to deal with databases that have a time- series characteristics. Sequential Pattern can be defined as follows. Definition: Let I ={x1...xn} be a set of items. An itemset is a non-empty subset of items, and an itemset with k items is called k-itemset. A sequence s=(X1...Xl) is an order list of item sets, and an item set Xi (1≤ i ≤ l) in a sequence is called a transaction. In a set of sequences, a sequence s is maximal if s is not contained in other sequences. 2. Genetic Algorithm Genetic Algorithm (GA) is a part of evolutionary computing, which is a rapidly growing area of artificial intelligence. Genetic algorithm starts with a set of solutions (represented by chromosomes) called population. Solutions from one population are taken and used to form a new population by mutation and crossover. This is motivated by a hope, that the new population will be better than the old one. Best solutions which are selected to form new solutions (offspring) are selected according to their best fitness. This is repeated until some condition (for example number of populations or
  • 6. improvement of the best solution) is satisfied. To measure the quality of a solution, fitness function is assigned to each chromosome in the population. 3. Mining Sequential Patterns in Telecommunication Database Using Genetic Algorithm Database, this algorithm is called SPT-GA algorithm. Firstly, we present our chromosome structure and encoding schema, genetic operators, and then we define the fitness assignment and selection criteria. Finally, we give the structure of SPT-GA algorithm. 3.1 Chromosome. The used structure of GA chromosomes and how it is represented is as follows: 3.1.1 Structure. In Telecommunication Database, country code values are used for creating the chromosomes. In the algorithm, chromosomes have a fixed length, and its length is equal to number of country codes that are available in the database as in Figure 1. 3.1.2 Representation. In Genetic Algorithm, there are many alternatives to represent a chromosome based on other problem like binary and integer representation. To decide which representation is better to be used for Sequential Pattern rules, we should use the short; low-order schemata are relevant to the underlying problem and relatively unrelated to schemata over other fixed positions. Also we should select the smallest alphabet that permits a natural expression of the problem, presented in [8]. In SPT-GA algorithm, we choose the binary representation because it is the most suitable for our algorithm and it needs less space and it represents the needed information (element occurred or not). For example, using the Telecommunication Database in Table.1, if a sequence is equal to <249, 973, 91>, it can be represented as in Figure.2.
  • 7. Additionally, as you can see in Figure.2, order cannot be extracted directly. To solve this problem, we decided to associate the transactions sequence as a metadata with each chromosome. For that, we use Vertical Bitmap Representation that makes SPT-GA algorithm to take less time and space to be executed. 3.2 Genetic Operators. SPT-GA uses genetic operators to generate the offspring of the existing population. Genetic algorithm will send chromosomes that represented by binary string where each bit corresponds to an element occurrence (0 or 1); the number of bits is equal to the number of items. After encoding of the solution domain, initially many chromosome solutions are randomly generated depending on population size. Genetic algorithm will select the chromosome regarding to our fitness function. The major measure that used by our algorithm is Sequential Interestingness measure (SIM). Then crossover takes place, it selects genes from parent chromosomes and creates a new offspring. The simplest way how to do this is to choose randomly some crossover point and everything before this point copy from a first parent and then everything after a crossover point copy from the second parent. Crossover is shown in Figure 3. After a crossover is performed, mutation takes place. This is to prevent falling all solutions in population into a local optimum of solved problem. Mutation changes randomly the new offspring. For binary encoding we can switch a few randomly chosen bits from 1 to 0 or from 0 to 1. Mutation is shown in Figure 3. 3.3 Fitness Function. Fitness function method discovers sequential patterns corresponding to the interests of users without using background knowledge. This method defines a new criterion called the sequential interestingness measure (SIM). Definition: The sequential interestingness measure of a rule AC is: SIM(AC) = min Ci ∈ C {(Confidence(A|Ci))α} × Support(AC)
  • 8. Where: (α ≥ 0): is a confidence priority that represents how important the frequency of the pattern is Ci: is subsequence of C, it represents a condition of sequence C i = 1 … n where n is the number of conditions in C. The first term of the criterion evaluates that the frequencies of the sub-patterns are not frequent while the second term evaluates that the frequency of the pattern is frequent. 3.4 SPT-GA Algorithm. The SPT-GA algorithm that was proposed is described in this section. In Figure 4, the pseudo code of SPT-GA algorithm is presented. ALGORITHM Input: Initial samples size: N Maximum generations: G Threshold value: T Minimum fitness: minF Output: Malpractice user no lists Begin Step1: Initialize counter count = 0 Step2: Initial population IN of size N Step3: For each chromosome i ∈ IN If S1& S2 is given then Measure the fitness F(i, S1, S2,T) Else If S1 is given then Measure the fitness F(i, S1, T) Else If S2 is given then
  • 9. Measure the fitness F(i, S2, T) Else Measure the fitness F(i, T) Step4: Mutate and crossover P. Step5: IF (fitness ≥ minF) Select fittest rules from P Step6: Set temp = temp +1 Step7: IF (t > G) then S=P Stop Else Go to Step 3 End After the encoding of the dataset, using bitmap representation, the algorithm starts by selecting individuals to initial population. Then the following processes are repeated until the pre-specified maximum number of generations is achieved. The fitness values determined for each selected individual given the rule antecedent or consequent. The fittest rules that are larger than or equal minimum fitness in P will be selected. Giving the antecedent or consequent is not mandatory but it will reduce the time of search and extract more desired rules. Existing chromosomes are used in generating new ones by applying crossover and mutation operators. Chromosomes survive based on their fitness used in the process. This way, the interesting set is determined and the target is achieved. 4. Experiment Results The results of some experiments on telecommunication database is presented and analyzed as follows. SPT-GA algorithm is written with MATLAB programming language. The user can tune population size (N), generations (G), confidence priority (α) and minimum fitness (minF). A Telecommunication Database is taken from a Telecommunication Company and it has 1091 transactions and 60 country codes. The crossover probability used is 0.8, while the mutation is
  • 10. 0.001. The output of this experiment is a text file that includes the interesting rules that represent the most suitable telecommunication sequences. For example, the rule of Figure 5 told us that when country code 91 is called, that means (40%) of callers will call country code 92 afterwards. In SPT-GA algorithm, there are four parameters that must be determined by a user: number of generation (G), population size (N), confidence priority (α) and minimum fitness (minF). First experiment sets N = 20, α = 1 and minF = 0 with G = [20...200]. The second experiment sets G = 20, α = 1 and minF = 0 with N = [20...200]. The two experiments were done on two days of calls with 1091 transactions and 60 country codes. According to the experiment, Figure 6 shows the time, in seconds, spent by the SPT-GA algorithm to extract the best rules while Figure 7 shows the average fitness of the final output. Both figures shows the result related to increasing the generations and population size. The tests showed that when the generation increases the time will be around each other but when the population increases the time will be increased. However, comparing the two experiments, as shown in Figure 6, GA takes less time when increasing the generation than increasing the population. Moreover, the tests also showed that when the generation and population increase, the best fitness will be around. From the experiments, it is observed that increasing of generation will take less time than increasing of population size. But, either ways, GA does not take a long time; it is only a matter
  • 11. of seconds. In addition, increasing the generation and the population will not guarantee a large improvement of average fitness. V. Discovering Structural Patterns In Telecommunications Data The SUBDUE system is a structural discovery tool that finds substructures in a graph-based representation of structural databases using the minimum description length (MDL) principle. SUBDUE discovers substructures that compress the original data and represent structural concepts in the data. Once a substructure is discovered, the substructure is used to simplify the data by replacing instances of the substructure with a pointer to the newly discovered substructure. The discovered substructures allow abstraction over detailed structures in the original data. Iteration of the substructure discovery and replacement process constructs a hierarchical description of the structural data in terms of the discovered substructures. This hierarchy provides varying levels of interpretation that can be accessed based on the specific goals of the data analysis. SUBDUE represents structural data as a labeled graph. Objects in the data map to vertices or small sub graphs in the graph, and relationships between objects map to directed or undirected edges in the graph. A substructure is a connected sub graph within the graphical representation. This graphical representation serves as input to the substructure discovery system. Figure 8 shows a geometric example of such an input graph. The objects in the figure become labeled vertices in the graph, and the relationships become labeled edges in the graph. The graphical representation of the substructure discovered by SUBDUE from this data is also shown in Figure 8. One of the four instances of the substructure is highlighted in the input graph. An instance of a substructure in an input graph is a set of vertices and edges from the input graph that match, graph theoretically, to the graphical representation of the substructure.
  • 12. Figure 8: Example Substructure in Graph Form Experiment Results: In a study upon running the SUBDUE system on the data in graph form, the six substructures output by the system were analyzed. Figure 9 shows two of these six substructures. The substructures in this figure show that the following patterns are common: 1. Local calls originating between 11:00am and 2:30pm with 30 to 45 minute durations and 2. Long distance calls originating between 6:00pm and 10:00pm with 5 to 20 minute durations. Figure 9: Telecom pattern discovered by Subdue.
  • 13. VI. CONCLUSION Genetic Algorithm is applied to find frequent sequences in Telecommunication Database in order to help Telecommunication companies to know the country codes that have a relation between them. So, the telecommunication companies can estimate the countries that have a specific order of the occurrences and give a discount on the calls to these countries. SPT-GA algorithm utilizes the property of evolutionary algorithm that discovers best rules in a short time with meaningful results. VII. REFERENCES 1. Data Mining In Telecommunications; Gary M. Weiss 2. Data Mining In Telecommunications And Studying Its Status In Iran Telecom Companies And Operator; Jamal Sophieh 3. Data Mining And CRM In Telecommunications; D. Camilovic 4. Genetic Algorithms; William H. Hsu 5. A Fraud Detection Approach in Telecommunication using Cluster GA; V.Umayaparvathi & Dr.K.Iyakutti