Data Mining Techniques using WEKA
                      IT for Business Intelligence




                                Ankit Pandey (10BM60012)




This term paper gives a brief introduction to WEKA, a powerful data mining tool, along with a
hands-on guide to two data mining techniques in WEKA: clustering (k-means) and linear
regression.

Introduction to WEKA
WEKA (Waikato Environment for Knowledge Analysis) is a collection of state-of-the-art
machine learning algorithms and data preprocessing tools written in Java, developed at the
University of Waikato, New Zealand. It is free software, available under the GNU General Public
License, and runs on almost any platform. It supports the entire process of experimental data
mining, including preparing the input data, evaluating learning schemes statistically, and
visualizing both the input data and the result of learning. The WEKA workbench includes methods
for the main data mining problems: regression, classification, clustering, association rule
mining, and attribute selection. It can be used through either of two interfaces:
      Command Line Interface (CLI)
      Graphical User Interface (GUI)
The WEKA GUI Chooser looks like this:




                                               Fig.1
The buttons start the following applications:
      Explorer – An environment for exploring data with WEKA. It gives access to all of
       WEKA's facilities through menu selection and form filling.
      Experimenter – Helps answer the question: which methods and parameter values work
       best for the given problem?




 Ankit Pandey (10BM60012), MBA 2nd Year VGSoM IIT Kharagpur                                   2
      KnowledgeFlow – Offers much the same functionality as the Explorer and additionally
       supports incremental learning. It allows designing configurations for streamed data
       processing, so incremental algorithms can be used to process very large datasets.
      Simple CLI – It provides a simple Command Line Interface for directly executing WEKA
       commands.
This term paper will demonstrate the following two data mining techniques using WEKA:
      Clustering (SimpleKMeans)
      Linear regression


Clustering using WEKA

Clustering
Clustering is a class of techniques used to classify objects or cases into relatively homogeneous
groups called clusters. Objects in each cluster tend to be similar to each other and dissimilar
to objects in the other clusters. In clustering, there is no a priori information about the group
or cluster membership of any of the objects.
There are two major types of clustering techniques:
      Hierarchical clustering
      Non-hierarchical clustering (for example, k-means clustering)
HIERARCHICAL CLUSTERING - Some measure of distance (usually Euclidean or squared Euclidean)
is used to compute the distances between all pairs of objects to be clustered. We begin with
every object in a separate cluster, and the two closest objects are joined to form a cluster.
The process continues, with points joining existing clusters (when they are closest to an
existing cluster) and clusters merging with other clusters, based on the shortest-distance
criterion.
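The agglomerative procedure described above can be sketched in a few lines of Python. This is an illustrative sketch, not WEKA's implementation: it assumes plain Euclidean distance, single linkage (two clusters are as close as their closest pair of members) as the shortest-distance criterion, and stopping once a requested number of clusters remains. The function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line (Euclidean) distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Assumption: single linkage, i.e. the distance between two clusters is
    the distance between their closest pair of members.
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]  # every object starts alone
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            d = min(euclidean(points[a], points[b]) for a in ci for b in cj)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

For the five points (1,1), (1,2), (8,8), (9,8) and (5,1), asking for two clusters merges the two tight pairs first and then attaches (5,1) to the nearer pair, giving the index groups [0, 1, 4] and [2, 3].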
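NON-HIERARCHICAL (K-MEANS) CLUSTERING - We need to rationally specify the number of
clusters (k) into which the objects are to be grouped. Each object is then assigned to the
cluster with the nearest centroid, each centroid is recomputed as the mean of its members, and
these two steps repeat until the assignments stop changing. In this term paper, I will
illustrate the process of k-means clustering through WEKA.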

Business applications of Clustering
      Segmentation of the market
      Understanding buyer behavior
      Identifying new product opportunities
      Selecting test markets
      Reducing data




K-means clustering in WEKA

Example Problem: A major Indian FMCG company wants to map the profile of its target
market in terms of lifestyle, attitudes and perceptions. With the help of their marketing research
team, the company's managers prepare a set of 15 statements which they feel measure many
of the variables of interest. These 15 statements are given below. Respondents indicated their
agreement or disagreement with each statement (1 = Strongly Agree, 2 = Agree, 3 = Neither
Agree nor Disagree, 4 = Disagree, 5 = Strongly Disagree).


    1. I prefer to use e-mail rather than write a letter.
    2. I feel that quality products are always priced high.
    3. I think twice before I buy anything.
    4. Television is a major source of entertainment.
    5. A car is a necessity rather than a luxury.
    6. I prefer fast food and ready to use products.
    7. People are more health conscious today.
    8. Entry of foreign companies has increased the efficiency of Indian companies.
    9. Women are active participants in purchase decisions.
    10. I believe politicians can play a positive role.
    11. I enjoy watching movies.
    12. If I get a chance, I would like to settle abroad.
    13. I always buy branded products.
    14. I frequently go out on weekends.
    15. I prefer to pay by credit card rather than in cash.
The company wants to segment the market on the basis of these attributes so that it can
cater effectively to the most feasible and lucrative segment. The steps below describe how
WEKA can be used to do this.


For simplicity, the 15 statements are stored as variables “var01” through “var15” in the CSV
file clustering data. This data file contains 1436 instances (responses). WEKA accepts several
input file formats, including ARFF and CSV; in this case we use a CSV file as input.
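WEKA reads the CSV file directly, but its native format is ARFF: a header of @relation and @attribute declarations followed by an @data section of comma-separated rows (the clustered output saved later in this paper is an ARFF file). A minimal sketch of the conversion, assuming every column is numeric, as var01 through var15 are here, and that column names contain no spaces. The function name csv_to_arff is hypothetical and is not part of WEKA.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text whose columns are all numeric into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation '{relation}'", ""]
    # Assumption: every column holds numbers and its name has no spaces,
    # so each one can be declared as a plain numeric attribute.
    lines += [f"@attribute {name} numeric" for name in header]
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]  # data rows are copied unchanged
    return "\n".join(lines) + "\n"
```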




Steps to be followed:


   1. Select Explorer in the WEKA GUI Chooser window (Fig.1).
   2. The following window will appear –




                                                  Fig.2
   3. Select “Open File” and load the CSV file clustering data. After the file is loaded, the
       interface looks like this:




                                                  Fig.3




      4. Click “Visualize All” to view the distribution of all variables in the sample
          population:




                                                   Fig.4
  WEKA's preprocessing facilities remove the need to convert the data set into a standard
  spreadsheet format or to convert categorical attributes to binary ones. The WEKA
  SimpleKMeans algorithm uses the Euclidean distance measure (by default) to compute
  distances between instances and cluster centroids.
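To illustrate the distance computation: WEKA's Euclidean distance normalizes each numeric attribute to the range 0 to 1 by default, so attributes with larger ranges do not dominate. The sketch below shows that idea with min-max scaling followed by Euclidean distance; it is a simplified illustration, not WEKA's code, and the function names are hypothetical.

```python
import math


def min_max_scale(rows):
    """Rescale each column of rows to [0, 1] using that column's min and max."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [
            (x - lo) / (hi - lo) if hi > lo else 0.0  # a constant column maps to 0
            for x, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For three respondents answering (1, 5), (5, 1) and (3, 3) on two 1-to-5 statements, scaling gives (0, 1), (1, 0) and (0.5, 0.5), and the first two are a distance of √2 apart.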
      5. To perform clustering, select the “Cluster” tab in the Explorer window.




                                                   Fig.5

 6. In the Clusterer panel, click on “Choose” and select “SimpleKMeans”.
 7. Then click the text box beside the Choose button (a pop-up window will appear). Set
     the numClusters value to 4 and click OK.




                                            Fig.6
 8. Make sure that “Use training set” is selected in the Cluster mode panel, then click the
     Start button to begin the clustering process.




                                            Fig.7




    9. In the Result list panel, right-click the result and select View in a separate window.
        The following result is displayed:




                                                   Fig.8

Interpretation of clustering results
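From the cluster centroid values shown in Fig.8, we can state the characteristics of each
cluster. Since 1 = Strongly Agree and 5 = Strongly Disagree, a lower centroid value on a
variable indicates stronger agreement with that statement. As an example, we describe the
characteristics of cluster 2, which has 264 instances.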

Cluster 2 characteristics:
       Prefer to use e-mail rather than letter and credit cards over cash
       Somewhat believe that quality products are priced high
       Don't think much before buying, TV not a major source of entertainment
       Car is considered more of a luxury, somewhat prefer fast foods
       Health conscious, women are active decision makers
       Friendly towards products of foreign companies
       Enjoy movies, prefer branded products and weekend trips


Similarly, the salient features of each cluster can be obtained from the results, which would
then help the FMCG firm decide which segment (cluster) it should primarily target.
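Reading a centroid back onto the response scale is mechanical: a centroid value is an average response, and because 1 = Strongly Agree and 5 = Strongly Disagree, lower values mean stronger agreement. A small hypothetical helper (not part of WEKA) that maps a centroid value to the nearest label on the scale:

```python
# Response scale used in the questionnaire (1 = Strongly Agree ... 5 = Strongly Disagree).
LIKERT = {
    1: "Strongly Agree",
    2: "Agree",
    3: "Neither Agree nor Disagree",
    4: "Disagree",
    5: "Strongly Disagree",
}


def likert_label(centroid_value):
    """Map a centroid value (a mean on the 1-5 scale) to the nearest label.

    Exact ties (e.g. 1.5) go to the lower scale point, i.e. the more
    agreeing label, because min() keeps the first minimum it finds.
    """
    nearest = min(LIKERT, key=lambda point: abs(point - centroid_value))
    return LIKERT[nearest]
```

For example, a centroid of 1.3 on var11 would read as Strongly Agree with “I enjoy watching movies”, and 4.6 as Strongly Disagree. These inputs are arbitrary illustrations, not values from Fig.8.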




Visualization of Clustering Results
A more intuitive way to go through the results is to visualize them graphically. To do
so:
         Right-click the result in the Result list panel
         Select Visualize cluster assignments
         Set the X-axis variable to Cluster, the Y-axis variable to Instance_number and Color
          to var11 to get the following output:




                                                   Fig.9
From Fig.9 we can see that people in cluster 0 like watching movies a lot, while people in
cluster 3 do not. Clusters 1 and 2 have mixed responses, skewed towards liking movies.
Similarly, we can change the variables on the X-axis, Y-axis and color to visualize other
aspects of the result. Note that WEKA has generated an extra variable named “Cluster” (not
present in the original data) which signifies the cluster membership of each instance.
We can save the output as an ARFF file by clicking the Save button in Fig.9.
Besides the values of the fifteen attributes, each instance in the saved file carries an
Instance_number attribute and the Cluster attribute giving its cluster membership.




An excerpt of the saved output ARFF file:
       @relation 'Clustering Data_clustered'

       @attribute     Instance_number numeric
       @attribute     var01 numeric
       @attribute     var02 numeric
       @attribute     var03 numeric
       @attribute     var04 numeric
       @attribute     var05 numeric
       @attribute     var06 numeric
       @attribute     var07 numeric
       @attribute     var08 numeric
       @attribute     var09 numeric
       @attribute     var10 numeric
       @attribute     var11 numeric
       @attribute     var12 numeric
       @attribute     var13 numeric
       @attribute     var14 numeric
       @attribute     var15 numeric
       @attribute     Cluster {cluster0, cluster1, cluster2, cluster3}

       @data
       0,1,3,1,2,3,1,3,2,3,2,2,1,1,1,1,cluster0
       1,2,3,2,3,2,2,3,2,4,1,5,2,2,2,2,cluster3
       2,3,2,3,2,3,1,3,3,2,2,2,3,2,2,3,cluster0
       3,3,2,2,2,2,2,3,2,1,2,1,2,1,1,1,cluster0
       4,2,2,2,2,2,1,3,3,2,2,1,1,3,3,2,cluster0
       5,2,2,3,3,1,2,2,2,3,2,1,2,3,3,3,cluster0
       6,1,1,2,2,2,1,2,2,2,1,2,3,3,3,1,cluster0
       7,2,1,1,2,1,2,1,1,1,1,3,3,1,1,2,cluster0
       8,2,1,1,3,2,2,2,1,2,1,2,2,2,2,3,cluster0
       9,1,2,2,3,2,1,1,1,3,2,1,1,2,2,1,cluster0
       10,2,3,3,2,1,2,1,1,2,2,2,1,1,1,2,cluster0
       11,3,2,2,2,3,2,1,1,1,3,2,2,2,2,3,cluster0
       12,2,3,2,2,3,3,2,2,2,3,2,3,1,1,2,cluster0
       ……..
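Because the saved file is plain text, the cluster sizes can also be tallied outside WEKA. A minimal sketch that reads the @data section of an ARFF file like the one above and counts instances per cluster, assuming the Cluster label is the last field on each row and that there are no quoted or sparse values. The function name is hypothetical, and the “……..” line in the excerpt marks omitted rows rather than real data.

```python
from collections import Counter


def cluster_counts(arff_text):
    """Count instances per cluster label in the @data section of ARFF text."""
    counts = Counter()
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if line.lower() == "@data":
            in_data = True  # everything after this line is an instance
            continue
        if in_data:
            counts[line.split(",")[-1]] += 1  # assumption: last field is Cluster
    return counts
```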



Other clustering methods available in WEKA can also be used to group the data. WEKA is
particularly useful when the data set is large, as it can generate clusters fairly quickly even
with large volumes of data. Given the numerous business applications of clustering, WEKA can
be very useful for clustering data in real business scenarios.




Linear Regression using WEKA

Regression
Regression analysis helps us determine the nature and strength of the relationship between
two variables, or between one dependent variable and a number of independent variables. In
regression analysis, an estimating equation is developed: a mathematical formula that relates
the known (independent) variables to the unknown (dependent) variable. Correlation analysis can
then be applied to determine the degree to which the variables are related.
Broadly, regression can be classified into two types:
      Simple linear regression (one dependent variable and one independent variable)
      Multiple regression (one dependent variable and many independent variables)
I will illustrate the process of multiple regression in WEKA with an example in this term paper.
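For simple linear regression, the least-squares estimates have a closed form: the slope is b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and the intercept is a = ȳ − b·x̄. A minimal Python sketch of those two formulas (the function name is hypothetical):

```python
def simple_linear_regression(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b).

    Assumes at least two distinct x values (otherwise the slope is undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx  # slope: co-variation of x and y relative to the variation of x
    a = mean_y - b * mean_x  # intercept: the fitted line passes through the means
    return a, b
```

For points lying exactly on y = 1 + 2x, such as x = 1, 2, 3, 4 with y = 3, 5, 7, 9, it returns a = 1.0 and b = 2.0.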

Business applications of Regression


      Pricing decisions
      Risk Analysis for investments
      Sales/Market forecasts
      Trend line analysis
      Total quality control
      Development of better hiring plans

Regression in WEKA
Example Problem: Kristal Auto (fictional name) is a car manufacturer with a presence in all
segments, ranging from A1-segment hatchbacks to premium saloons and SUVs. It is planning to
launch a crossover. To price the new crossover appropriately and competitively, Kristal wants
to determine which factors drive the price of a car and to what extent each factor influences
it. To do so, it collects data for 2220 different car models, considering 10 features that
primarily determine the price of a car:
      Displacement (cc)
      Mileage (kmpl)
      Boot Space (ltrs)
      Length (mm)
      Anti-lock Braking System (ABS)
      Electronic Stability Program (ESP)
      Anti-theft alarm
      Airbags
      Keyless Entry
      Global Positioning System (GPS)
The company then regresses the price on these independent variables to develop a regression
model that could assist it in its pricing decisions. The regression process in WEKA is
described below.
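Multiple regression generalizes the one-variable case: ordinary least squares solves the normal equations (XᵀX)β = Xᵀy, where each row of X holds one car's independent variables preceded by a 1 for the intercept. The sketch below solves them with Gaussian elimination using only the standard library. It is an illustration, not WEKA's LinearRegression, which by default also performs attribute selection, eliminates collinear attributes and applies a very small ridge penalty, so its coefficients can differ. The function names are hypothetical.

```python
def fit_multiple_regression(rows, ys):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y.

    rows: one list of independent-variable values per observation.
    ys:   the dependent variable (e.g. price) for each observation.
    Returns [intercept, coef_1, ..., coef_p].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 = intercept
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(X, ys))]
        for i in range(p)
    ]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("X^T X is singular: some variables are collinear")
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution, from the last coefficient to the intercept.
    beta = [0.0] * p
    for i in reversed(range(p)):
        rest = sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (A[i][p] - rest) / A[i][i]
    return beta


def predict(beta, row):
    """Apply a fitted equation: intercept plus coefficient times each variable."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))
```

With data generated exactly from y = 5 + 2·x1 − 3·x2, the fitted coefficients come back as approximately [5, 2, −3], and predict then evaluates the fitted equation for a new row.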


We will use the data in the CSV file named regression data.

Steps to be followed:
   1. Select Explorer in the WEKA GUI Chooser window and load the file regression data as
       described in the clustering example. The following screen will appear:




                                              Fig.10




   2. Click the Classify tab in the Explorer window and then click the Choose button in the
       Classifier panel. Then select LinearRegression under functions. The following screen
       will appear:




                                            Fig.11
WEKA automatically identifies Price as the dependent variable (as shown below the Test options
panel). If it does not, we can select the dependent variable from the drop-down list there.
   3. Press the Start button. The following output will be generated:




                                            Fig.12
The output can also be viewed in a separate window (as described earlier in the clustering example).




The output contains the regression equation, which can be used to price the new crossover.
(Note that the equation may not be consistent with real-life prices, as the data used were
manipulated to demonstrate the technique effectively.)
From the regression equation we can see that boot space and length have negative coefficients
(other things being equal, a larger value lowers the predicted price), while displacement, ABS
and GPS have positive coefficients.
We can also visualize the classifier errors, i.e. how far the predictions of the regression
equation deviate from the actual prices, by right-clicking the result in the Result list panel
and selecting Visualize classifier errors.




                                               Fig.13
The X-axis shows the actual Price and the Y-axis the predicted Price.
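The scatter of actual against predicted prices can also be summarized numerically. For a numeric class, WEKA's evaluation output includes measures such as the correlation coefficient, mean absolute error and root mean squared error; the sketch below computes those three from paired actual and predicted values using their standard definitions. It is not WEKA's code, and the function name is hypothetical.

```python
import math


def regression_errors(actual, predicted):
    """Return (correlation coefficient, mean absolute error, root mean squared error)."""
    n = len(actual)
    residuals = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in residuals) / n
    rmse = math.sqrt(sum(e * e for e in residuals) / n)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_a * var_p)  # Pearson correlation, actual vs predicted
    return corr, mae, rmse
```

Points lying on the diagonal of the plot have zero error; the further they scatter from it, the larger MAE and RMSE become and the further the correlation falls below 1.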


Other applications of WEKA in data mining
WEKA can be used for various other data mining techniques. Some of them are:
      Classification (using decision trees)
      Collaborative filtering (nearest neighbor)
      Association rule mining





IRJET- Sentimental Analysis of Product Reviews for E-Commerce WebsitesIRJET- Sentimental Analysis of Product Reviews for E-Commerce Websites
IRJET- Sentimental Analysis of Product Reviews for E-Commerce Websites
 
IRJET - Recommendations Engine with Multi-Objective Contextual Bandits (U...
IRJET -  	  Recommendations Engine with Multi-Objective Contextual Bandits (U...IRJET -  	  Recommendations Engine with Multi-Objective Contextual Bandits (U...
IRJET - Recommendations Engine with Multi-Objective Contextual Bandits (U...
 
STOCK MARKET ANALYZING AND PREDICTION USING MACHINE LEARNING TECHNIQUES
STOCK MARKET ANALYZING AND PREDICTION USING MACHINE LEARNING TECHNIQUESSTOCK MARKET ANALYZING AND PREDICTION USING MACHINE LEARNING TECHNIQUES
STOCK MARKET ANALYZING AND PREDICTION USING MACHINE LEARNING TECHNIQUES
 
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
 
IRJET- Financial Analysis using Data Mining
IRJET- Financial Analysis using Data MiningIRJET- Financial Analysis using Data Mining
IRJET- Financial Analysis using Data Mining
 
IRJET- Opinion Mining and Sentiment Analysis for Online Review
IRJET-  	  Opinion Mining and Sentiment Analysis for Online ReviewIRJET-  	  Opinion Mining and Sentiment Analysis for Online Review
IRJET- Opinion Mining and Sentiment Analysis for Online Review
 
Automate 2
Automate 2Automate 2
Automate 2
 
IRJET- Comparison of Classification Algorithms using Machine Learning
IRJET- Comparison of Classification Algorithms using Machine LearningIRJET- Comparison of Classification Algorithms using Machine Learning
IRJET- Comparison of Classification Algorithms using Machine Learning
 
Smart Health Guide App
Smart Health Guide AppSmart Health Guide App
Smart Health Guide App
 
Data Severance Using Machine Learning for Marketing Strategies
Data Severance Using Machine Learning for Marketing StrategiesData Severance Using Machine Learning for Marketing Strategies
Data Severance Using Machine Learning for Marketing Strategies
 
Rachit Mishra_stock prediction_report
Rachit Mishra_stock prediction_reportRachit Mishra_stock prediction_report
Rachit Mishra_stock prediction_report
 
Bank Customer Segmentation & Insurance Claim Prediction
Bank Customer Segmentation & Insurance Claim PredictionBank Customer Segmentation & Insurance Claim Prediction
Bank Customer Segmentation & Insurance Claim Prediction
 
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
 
A Machine learning based framework for Verification and Validation of Massive...
A Machine learning based framework for Verification and Validation of Massive...A Machine learning based framework for Verification and Validation of Massive...
A Machine learning based framework for Verification and Validation of Massive...
 
Business Analysis using Machine Learning
Business Analysis using Machine LearningBusiness Analysis using Machine Learning
Business Analysis using Machine Learning
 

Último

Cannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 UpdatedCannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 UpdatedCannaBusinessPlans
 
How to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityHow to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityEric T. Tung
 
Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...
Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...
Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...Falcon Invoice Discounting
 
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGBerhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGpr788182
 
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...ssuserf63bd7
 
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGBerhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGpr788182
 
Pre Engineered Building Manufacturers Hyderabad.pptx
Pre Engineered  Building Manufacturers Hyderabad.pptxPre Engineered  Building Manufacturers Hyderabad.pptx
Pre Engineered Building Manufacturers Hyderabad.pptxRoofing Contractor
 
joint cost.pptx COST ACCOUNTING Sixteenth Edition ...
joint cost.pptx  COST ACCOUNTING  Sixteenth Edition                          ...joint cost.pptx  COST ACCOUNTING  Sixteenth Edition                          ...
joint cost.pptx COST ACCOUNTING Sixteenth Edition ...NadhimTaha
 
Falcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon investment
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...daisycvs
 
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in PakistanChallenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistanvineshkumarsajnani12
 
Katrina Personal Brand Project and portfolio 1
Katrina Personal Brand Project and portfolio 1Katrina Personal Brand Project and portfolio 1
Katrina Personal Brand Project and portfolio 1kcpayne
 
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptxQSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptxDitasDelaCruz
 
Chennai Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Av...
Chennai Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Av...Chennai Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Av...
Chennai Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Av...pujan9679
 
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service AvailableNashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Availablepr788182
 
GUWAHATI 💋 Call Girl 9827461493 Call Girls in Escort service book now
GUWAHATI 💋 Call Girl 9827461493 Call Girls in  Escort service book nowGUWAHATI 💋 Call Girl 9827461493 Call Girls in  Escort service book now
GUWAHATI 💋 Call Girl 9827461493 Call Girls in Escort service book nowkapoorjyoti4444
 
Kalyan Call Girl 98350*37198 Call Girls in Escort service book now
Kalyan Call Girl 98350*37198 Call Girls in Escort service book nowKalyan Call Girl 98350*37198 Call Girls in Escort service book now
Kalyan Call Girl 98350*37198 Call Girls in Escort service book nowranineha57744
 
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAIGetting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAITim Wilson
 
Putting the SPARK into Virtual Training.pptx
Putting the SPARK into Virtual Training.pptxPutting the SPARK into Virtual Training.pptx
Putting the SPARK into Virtual Training.pptxCynthia Clay
 

Último (20)

Cannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 UpdatedCannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 Updated
 
How to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityHow to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League City
 
Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...
Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...
Unveiling Falcon Invoice Discounting: Leading the Way as India's Premier Bill...
 
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGBerhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
 
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
 
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGBerhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
 
Pre Engineered Building Manufacturers Hyderabad.pptx
Pre Engineered  Building Manufacturers Hyderabad.pptxPre Engineered  Building Manufacturers Hyderabad.pptx
Pre Engineered Building Manufacturers Hyderabad.pptx
 
joint cost.pptx COST ACCOUNTING Sixteenth Edition ...
joint cost.pptx  COST ACCOUNTING  Sixteenth Edition                          ...joint cost.pptx  COST ACCOUNTING  Sixteenth Edition                          ...
joint cost.pptx COST ACCOUNTING Sixteenth Edition ...
 
Falcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business Potential
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
 
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in PakistanChallenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
 
Buy gmail accounts.pdf buy Old Gmail Accounts
Buy gmail accounts.pdf buy Old Gmail AccountsBuy gmail accounts.pdf buy Old Gmail Accounts
Buy gmail accounts.pdf buy Old Gmail Accounts
 
Katrina Personal Brand Project and portfolio 1
Katrina Personal Brand Project and portfolio 1Katrina Personal Brand Project and portfolio 1
Katrina Personal Brand Project and portfolio 1
 
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptxQSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
 
Chennai Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Av...
Chennai Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Av...Chennai Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Av...
Chennai Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Av...
 
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service AvailableNashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
 
GUWAHATI 💋 Call Girl 9827461493 Call Girls in Escort service book now
GUWAHATI 💋 Call Girl 9827461493 Call Girls in  Escort service book nowGUWAHATI 💋 Call Girl 9827461493 Call Girls in  Escort service book now
GUWAHATI 💋 Call Girl 9827461493 Call Girls in Escort service book now
 
Kalyan Call Girl 98350*37198 Call Girls in Escort service book now
Kalyan Call Girl 98350*37198 Call Girls in Escort service book nowKalyan Call Girl 98350*37198 Call Girls in Escort service book now
Kalyan Call Girl 98350*37198 Call Girls in Escort service book now
 
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAIGetting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
 
Putting the SPARK into Virtual Training.pptx
Putting the SPARK into Virtual Training.pptxPutting the SPARK into Virtual Training.pptx
Putting the SPARK into Virtual Training.pptx
 

Data Mining Techniques using WEKA (Ankit Pandey-10BM60012)

  • 1. Data Mining Techniques using WEKA
    IT for Business Intelligence
    Ankit Pandey (10BM60012)
    This term paper gives a brief introduction to WEKA, a powerful data mining tool, along with a hands-on guide to two data mining techniques using WEKA: clustering (k-means) and linear regression.
  • 2. Data Mining Techniques using WEKA
    Introduction to WEKA
    WEKA (Waikato Environment for Knowledge Analysis) is a collection of state-of-the-art machine learning algorithms and data preprocessing tools written in Java, developed at the University of Waikato, New Zealand. It is free software, available under the GNU General Public License, and runs on almost any platform. It can be applied to a wide range of data mining tasks. It supports the entire process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing both the input data and the results of learning. The WEKA workbench includes methods for the main data mining problems: regression, classification, clustering, association rule mining, and attribute selection. It can be used through either of two interfaces:
    - Command Line Interface (CLI)
    - Graphical User Interface (GUI)
    The WEKA GUI Chooser looks like this:
    [Fig. 1: WEKA GUI Chooser]
    The buttons start the following applications:
    - Explorer: an environment for exploring data with WEKA. It gives access to all the facilities through menu selection and form filling.
    - Experimenter: helps answer the question "Which methods and parameter values work best for the given problem?"
    Ankit Pandey (10BM60012), MBA 2nd Year VGSoM IIT Kharagpur
  • 3.
    - KnowledgeFlow: provides much the same functionality as the Explorer, but supports incremental learning. It allows designing configurations for streamed data processing, so incremental algorithms can be used to process very large data sets.
    - Simple CLI: a simple command line interface for executing WEKA commands directly.
    This term paper demonstrates the following two data mining techniques using WEKA:
    - Clustering (SimpleKMeans)
    - Linear regression
    Clustering using WEKA
    Clustering
    Clustering is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters. In clustering, there is no a priori information about the group or cluster membership of any of the objects. There are two major types of clustering techniques:
    - Hierarchical clustering
    - Non-hierarchical clustering (for example, k-means clustering)
    HIERARCHICAL CLUSTERING: A distance measure (usually Euclidean or squared Euclidean distance) is computed between all pairs of objects to be clustered. We begin with every object in a separate cluster, and the two closest objects are joined to form a cluster. The process continues, with points joining existing clusters (when they are closest to an existing cluster) and clusters joining other clusters, based on the shortest-distance criterion.
    NON-HIERARCHICAL (K-MEANS) CLUSTERING: The number of clusters into which the objects are to be grouped must be specified in advance, on rational grounds. In this term paper, I will illustrate the process of k-means clustering in WEKA.
    Business applications of clustering
    - Segmentation of the market
    - Understanding buyer behavior
    - Identifying new product opportunities
    - Selecting test markets
    - Reducing data
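The agglomerative procedure described under HIERARCHICAL CLUSTERING can be illustrated with a short sketch. This is plain Python, not WEKA code, and it makes two assumptions. First, it uses ordinary Euclidean distance. Second, it uses single linkage, where the distance between two clusters is the distance between their closest members. Single linkage is one reading of the "shortest distance criterion". The function names are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage_merges(points):
    """Agglomerative clustering with single linkage.
    Starts with every point in its own cluster and repeatedly merges the two
    clusters whose closest members are nearest. Returns the merge history as
    (cluster_a, cluster_b, distance) tuples of point indices."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for (ia, a), (ib, b) in combinations(enumerate(clusters), 2):
            d = min(euclidean(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, ia, ib)
        d, ia, ib = best
        merges.append((clusters[ia], clusters[ib], d))
        merged = sorted(clusters[ia] + clusters[ib])
        # combinations() guarantees ia < ib, so delete the higher index first.
        del clusters[ib]
        del clusters[ia]
        clusters.append(merged)
    return merges

points = [(1, 1), (1, 2), (5, 5), (5, 6), (9, 9)]  # made-up toy data
for a, b, d in single_linkage_merges(points):
    print(a, "+", b, "at distance", d)
# prints:
# [0] + [1] at distance 1.0
# [2] + [3] at distance 1.0
# [4] + [2, 3] at distance 5.0
# [0, 1] + [2, 3, 4] at distance 5.0
```

The merge history is what a dendrogram draws. Cutting it at a chosen distance gives the clusters. By contrast, k-means needs the number of clusters up front.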
  • 4. K-means clustering in WEKA
    Example Problem: A major Indian FMCG company wants to map the profile of its target market in terms of lifestyle, attitudes and perceptions. With the help of their marketing research team, the company's managers prepare a set of 15 statements which they feel measure many of the variables of interest. The 15 statements are given below. Respondents had to indicate agreement or disagreement with each statement (1 = Strongly Agree, 2 = Agree, 3 = Neither Agree nor Disagree, 4 = Disagree, 5 = Strongly Disagree).
    1. I prefer to use e-mail rather than write a letter.
    2. I feel that quality products are always priced high.
    3. I think twice before I buy anything.
    4. Television is a major source of entertainment.
    5. A car is a necessity rather than a luxury.
    6. I prefer fast food and ready-to-use products.
    7. People are more health conscious today.
    8. Entry of foreign companies has increased the efficiency of Indian companies.
    9. Women are active participants in purchase decisions.
    10. I believe politicians can play a positive role.
    11. I enjoy watching movies.
    12. If I get a chance, I would like to settle abroad.
    13. I always buy branded products.
    14. I frequently go out on weekends.
    15. I prefer to pay by credit card rather than in cash.
    The company wants to cluster the market on these attributes so that it can effectively cater to the most feasible and lucrative segment. I will describe how WEKA can be used to do this. For simplicity, the 15 statements are renamed as the variables “var01” through “var15” in the CSV file clustering data. This data file contains 1436 instances (responses). WEKA accepts several input file formats, including ARFF and CSV; in this case a CSV file is used as input.
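Before clustering, it helps to confirm that every response in the file is one of the five codes. The sketch below does this check with Python's standard csv module. The function name and the in-memory CSV text are illustrative assumptions. The two sample rows are the first two responses (var01 to var15) in the saved ARFF excerpt reproduced later in this paper.

```python
import csv
import io

# Response codes used in the questionnaire (1 = Strongly Agree ... 5 = Strongly Disagree).
LIKERT = {1: "Strongly Agree", 2: "Agree", 3: "Neither Agree nor Disagree",
          4: "Disagree", 5: "Strongly Disagree"}

def load_likert_csv(text):
    """Parse CSV text with a header row (var01..var15) and integer responses.
    Raises ValueError if any response is not a valid 1-5 code."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for line_no, record in enumerate(reader, start=2):
        if not record:  # tolerate blank lines
            continue
        values = [int(v) for v in record]
        for name, v in zip(header, values):
            if v not in LIKERT:
                raise ValueError(f"line {line_no}: {name}={v} is not a 1-5 code")
        rows.append(values)
    return header, rows

# Hypothetical file contents: header plus the first two responses of the data set.
sample = (",".join(f"var{i:02d}" for i in range(1, 16)) + "\n"
          "1,3,1,2,3,1,3,2,3,2,2,1,1,1,1\n"
          "2,3,2,3,2,2,3,2,4,1,5,2,2,2,2\n")
header, rows = load_likert_csv(sample)
print(header[10], rows[1][10], LIKERT[rows[1][10]])  # prints: var11 5 Strongly Disagree
```

The codes are ordinal, and 1 means agreement. A *low* average on a variable therefore signals agreement with that statement.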
  • 5. Steps to be followed:
    1. Select Explorer in the WEKA GUI Chooser window (Fig. 1).
    2. The following window appears:
    [Fig. 2]
    3. Select “Open File” and load the CSV file clustering data. After the file is loaded, the interface looks like this:
    [Fig. 3]
  • 6.
    4. Click “Visualize All” to view the distribution of every variable in the sample:
    [Fig. 4]
    WEKA's preprocessing removes the need to convert the data set into a standard spreadsheet format or to convert categorical attributes to binary ones. The WEKA SimpleKMeans algorithm uses the Euclidean distance measure to compute distances between instances and clusters.
    5. To perform clustering, select the “Cluster” tab in the Explorer window.
    [Fig. 5]
  • 7.
    6. In the Clusterer panel, click “Choose” and select “SimpleKMeans”.
    7. Click the text box beside the Choose button; a pop-up window appears. Set numClusters to 4 and click OK.
    [Fig. 6]
    8. Make sure that “Use training set” is selected in the Cluster mode panel, then click Start to begin the clustering process.
    [Fig. 7]
  • 8.
    9. In the Result list panel, right-click the result and select “View in separate window”. The following result is displayed:
    [Fig. 8]
    Interpretation of clustering results
    Based on the cluster centroid values shown in Fig. 8, we can state the characteristics of each cluster. Because 1 = Strongly Agree and 5 = Strongly Disagree, a low centroid value means the cluster tends to agree with a statement, and a high value means it tends to disagree. As an example, the characteristics of cluster 2, which has 264 instances, are:
    - Prefer e-mail to letters, and credit cards to cash
    - Somewhat believe that quality products are priced high
    - Don't think much before buying; TV is not a major source of entertainment
    - Consider a car more of a luxury; somewhat prefer fast food
    - Health conscious; see women as active decision makers
    - Friendly towards the products of foreign companies
    - Enjoy movies; prefer branded products and weekend trips
    Similarly, the salient features of each cluster can be obtained from the results. These would help the FMCG firm decide which segment (cluster) it should primarily target.
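The SimpleKMeans run in steps 6 to 9, and the reading of its centroids, can be sketched as follows. This is a simplified Python version of Lloyd's k-means algorithm, not WEKA's source. The seed value and all function names are assumptions.

- **Initialization:** the starting centroids are drawn at random from the data.
- **Distance:** each attribute is scaled by its observed range before distances are measured. As we understand it, this mimics the default normalization in WEKA's Euclidean distance.
- **Centroids:** these are averaged in the original units, so they can be read on the 1-5 scale.
- **Labels:** the agree/disagree thresholds of 2.5 and 3.5 in `describe` are our own illustrative choice. WEKA reports only the centroid values.

```python
import math
import random

def min_max_ranges(rows):
    """(min, max) of every column, used to put attributes on a common scale."""
    return [(min(col), max(col)) for col in zip(*rows)]

def scaled_distance(a, b, ranges):
    """Euclidean distance on range-scaled attributes (constant columns ignored)."""
    total = 0.0
    for x, y, (lo, hi) in zip(a, b, ranges):
        if hi > lo:
            total += ((x - y) / (hi - lo)) ** 2
    return math.sqrt(total)

def kmeans(rows, k, seed=10, max_iter=100):
    """Lloyd's k-means: assign each row to its nearest centroid, move each
    centroid to the mean of its rows, repeat until no row changes cluster.
    Centroids stay in the original units (here, the 1-5 agreement scale)."""
    ranges = min_max_ranges(rows)
    centroids = [list(r) for r in random.Random(seed).sample(rows, k)]
    assignment = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: scaled_distance(r, centroids[c], ranges))
               for r in rows]
        if new == assignment:
            break  # converged: no instance changed cluster
        assignment = new
        for c in range(k):
            members = [r for r, a in zip(rows, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

def describe(centroid, agree_at=2.5, disagree_at=3.5):
    """Label each centroid value; low values mean agreement (1 = Strongly Agree)."""
    return ["agree" if v <= agree_at else "disagree" if v >= disagree_at else "neutral"
            for v in centroid]

toy = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]  # made-up, two obvious groups
centroids, labels = kmeans(toy, k=2)
```

On the toy rows, the first three and the last three rows end up in different clusters. For the survey data, `kmeans(rows, k=4)` followed by `describe` on each centroid gives a textual profile similar to the cluster 2 description above. WEKA's SimpleKMeans has further options (initialization methods, missing-value handling) that this sketch omits.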
  • 9. Visualization of Clustering Results
    A more intuitive way to go through the results is to visualize them graphically:
    - Right-click the result in the Result list panel.
    - Select “Visualize cluster assignments”.
    - Set the X-axis variable to Cluster, the Y-axis variable to Instance_number and Color to var11. This gives the following output:
    [Fig. 9]
    From this graph, people in cluster 0 like watching movies a lot, while people in cluster 3 don't. Clusters 1 and 2 have mixed responses that are skewed towards liking movies. Similarly, the X-axis, Y-axis and Color variables can be changed to visualize other aspects of the result. Note that WEKA has generated an extra variable named “Cluster” (not present in the original data), which gives the cluster membership of each instance.
    The output can be saved as an ARFF file with the Save button in the window shown in Fig. 9. The saved file contains an additional Cluster attribute for each instance. Besides the values of the fifteen attributes, the output therefore also specifies each instance's cluster membership.
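The coloured cluster plot is essentially a cross-tabulation of cluster membership against one variable. The minimal Python sketch below computes that table; the function name is ours. The five (cluster, var11) pairs are read from the first five rows of the saved ARFF excerpt reproduced in this paper. That is far too small a sample to reproduce Fig. 9, but it shows the calculation.

```python
from collections import Counter

def crosstab(clusters, responses):
    """Count each response code within each cluster: the numbers behind a
    cluster plot coloured by one variable."""
    table = {}
    for cluster, response in zip(clusters, responses):
        table.setdefault(cluster, Counter())[response] += 1
    return table

# First five instances of the saved ARFF excerpt: cluster label and var11 value.
clusters = ["cluster0", "cluster3", "cluster0", "cluster0", "cluster0"]
var11 = [2, 5, 2, 1, 1]
for cluster, counts in sorted(crosstab(clusters, var11).items()):
    print(cluster, dict(sorted(counts.items())))
# prints:
# cluster0 {1: 2, 2: 2}
# cluster3 {5: 1}
```

Even in this tiny sample, cluster 0 answers 1 or 2 ("agree") to "I enjoy watching movies", while the cluster 3 instance answers 5. This matches the reading of Fig. 9.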
  • 10. A part of the saved output ARFF file:
    @relation 'Clustering Data_clustered'
    @attribute Instance_number numeric
    @attribute var01 numeric
    @attribute var02 numeric
    @attribute var03 numeric
    @attribute var04 numeric
    @attribute var05 numeric
    @attribute var06 numeric
    @attribute var07 numeric
    @attribute var08 numeric
    @attribute var09 numeric
    @attribute var10 numeric
    @attribute var11 numeric
    @attribute var12 numeric
    @attribute var13 numeric
    @attribute var14 numeric
    @attribute var15 numeric
    @attribute Cluster {cluster0, cluster1, cluster2, cluster3}
    @data
    0,1,3,1,2,3,1,3,2,3,2,2,1,1,1,1,cluster0
    1,2,3,2,3,2,2,3,2,4,1,5,2,2,2,2,cluster3
    2,3,2,3,2,3,1,3,3,2,2,2,3,2,2,3,cluster0
    3,3,2,2,2,2,2,3,2,1,2,1,2,1,1,1,cluster0
    4,2,2,2,2,2,1,3,3,2,2,1,1,3,3,2,cluster0
    5,2,2,3,3,1,2,2,2,3,2,1,2,3,3,3,cluster0
    6,1,1,2,2,2,1,2,2,2,1,2,3,3,3,1,cluster0
    7,2,1,1,2,1,2,1,1,1,1,3,3,1,1,2,cluster0
    8,2,1,1,3,2,2,2,1,2,1,2,2,2,2,3,cluster0
    9,1,2,2,3,2,1,1,1,3,2,1,1,2,2,1,cluster0
    10,2,3,3,2,1,2,1,1,2,2,2,1,1,1,2,cluster0
    11,3,2,2,2,3,2,1,1,1,3,2,2,2,2,3,cluster0
    12,2,3,2,2,3,3,2,2,2,3,2,3,1,1,2,cluster0
    ……..
    Other clustering methods in WEKA can also be used to group the data. WEKA is particularly useful for clustering large data sets, since it can generate clusters quickly even when the data is large. Given the numerous applications of clustering in business, WEKA can be very useful for clustering data in real business scenarios.
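The layout of the saved file is simple enough to generate directly. The following Python sketch writes a file with the same structure as the excerpt. Each column gets a numeric @attribute line, and a nominal Cluster attribute lists cluster0 to cluster(k-1). Each @data row ends with its cluster label. The function and its arguments are our own, not part of WEKA. The sketch also does not escape quotes inside the relation name.

```python
def to_arff(relation, names, rows, labels, k):
    """Write numeric rows plus a nominal Cluster attribute in ARFF format.
    labels[i] is the 0-based cluster index of rows[i]."""
    lines = [f"@relation '{relation}'"]
    lines += [f"@attribute {name} numeric" for name in names]
    values = ", ".join(f"cluster{i}" for i in range(k))
    lines.append(f"@attribute Cluster {{{values}}}")
    lines.append("@data")
    for row, label in zip(rows, labels):
        lines.append(",".join(str(v) for v in row) + f",cluster{label}")
    return "\n".join(lines) + "\n"

# Column names and the first two instances as they appear in the excerpt.
names = ["Instance_number"] + [f"var{i:02d}" for i in range(1, 16)]
rows = [[0, 1, 3, 1, 2, 3, 1, 3, 2, 3, 2, 2, 1, 1, 1, 1],
        [1, 2, 3, 2, 3, 2, 2, 3, 2, 4, 1, 5, 2, 2, 2, 2]]
print(to_arff("Clustering Data_clustered", names, rows, labels=[0, 3], k=4))
```

Declaring Cluster as a nominal attribute, rather than a number, tells WEKA that cluster0 to cluster3 are unordered categories. The file can then be reloaded, for example to profile or classify the segments.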
  • 11. Linear Regression using WEKA
    Regression
    Regression analysis helps us determine the nature and strength of the relationship between two variables, or between one dependent variable and a number of independent variables. In regression analysis, an estimating equation is developed: a mathematical formula that relates the known (independent) variables to the unknown (dependent) variable. Correlation analysis can then be applied to determine the degree to which the variables are related. Broadly, regression can be classified into two types:
    - Simple linear regression (one dependent variable and one independent variable)
    - Multiple regression (one dependent variable and many independent variables)
    In this term paper, I will illustrate the process of multiple regression in WEKA with an example.
    Business applications of regression
    - Pricing decisions
    - Risk analysis for investments
    - Sales/market forecasts
    - Trend line analysis
    - Total quality control
    - Development of better hiring plans
    Regression in WEKA
    Example Problem: Kristal Auto (fictional name) is a car manufacturer with a presence in all segments, from A1-segment hatchbacks to premium saloons and SUVs. It is planning to launch a crossover. To price the new crossover appropriately and competitively, Kristal wants to determine which factors determine the price of a car and to what extent each factor influences the price. To do so, it collects data for 2220 different car models. It considers 10 features that primarily determine the price of a car:
    - Displacement (cc)
    - Mileage (kmpl)
    - Boot Space (ltrs)
  • 12.
    - Length (mm)
    - Anti-lock Braking System (ABS)
    - Electronic Stability Program (ESP)
    - Anti-theft alarm
    - Airbags
    - Keyless Entry
    - Global Positioning System (GPS)
    The company then regresses the price on these independent variables to develop a regression model that can assist it in its pricing decisions. I will describe the regression process in WEKA using the data in the CSV file named regression data.
    Steps to be followed:
    1. Select Explorer from the WEKA GUI Chooser window and load the file regression data as described in the clustering example. The following screen appears:
    [Fig. 10]
  • 13.
    2. Click the Classify tab in the Explorer window, then click the Choose button in the Classifier panel and select LinearRegression under functions. The following screen appears:
    [Fig. 11]
    WEKA automatically identifies the dependent variable as Price (shown below the Test options panel). If it does not, the dependent variable can be selected there.
    3. Press the Start button. The following output is generated:
    [Fig. 12]
    The output can also be viewed in a separate window, as described in the clustering example.
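At its core, the LinearRegression step fits a model by ordinary least squares. The Python sketch below illustrates this by solving the normal equations directly. When no dependent variable is named, it takes the last column, which is also the Explorer's default class attribute.

This is an illustration only. As we understand its documentation, WEKA's LinearRegression also performs attribute selection and adds a small ridge term by default, so its coefficients can differ from plain least squares. The column names and toy data are invented, with yes/no features such as ABS coded as 0/1 (an assumption). The toy price was generated exactly as 2 + 3 × Displacement + 5 × ABS, so the fit should recover those numbers.

```python
def split_target(header, rows, target=None):
    """Separate the dependent variable from the features.
    Defaults to the last column, as the WEKA Explorer does."""
    t = header.index(target) if target else len(header) - 1
    X = [[v for j, v in enumerate(r) if j != t] for r in rows]
    y = [r[t] for r in rows]
    return X, y

def fit_ols(X, y):
    """Least squares via the normal equations (A^T A) b = A^T y, where A is X
    with a leading column of ones. Solved by Gauss-Jordan elimination with
    partial pivoting. Returns [intercept, b1, ..., bp]."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    n, p = len(A), len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(p)]
         + [sum(A[r][i] * y[r] for r in range(n))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

header = ["Displacement", "ABS", "Price"]  # made-up columns, made-up units
rows = [[1, 0, 5], [2, 0, 8], [3, 1, 16], [4, 1, 19], [5, 0, 17], [2, 1, 13]]
X, y = split_target(header, rows)
intercept, b_disp, b_abs = fit_ols(X, y)  # approximately 2, 3 and 5
```

Each coefficient is the change in predicted price per unit of that feature, with the other features held fixed. For this reason, the sign of a coefficient need not match the sign of that feature's simple correlation with price.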
  • 14.
    The output contains the regression equation, which can be used to price the new crossover. (Note that the equation may not match real-life conditions, because the data was manipulated to demonstrate the technique effectively.) The regression equation shows that Boot Space and Length have negative coefficients: other things being equal, they lower the predicted price. Displacement, ABS and GPS have positive coefficients and raise it.
    We can also visualize the classifier errors, i.e. how far the regression equation's predictions deviate from the actual prices. To do so, right-click the result in the Result list panel and select “Visualize classifier errors”.
    [Fig. 13]
    The X-axis shows the actual Price and the Y-axis the predicted Price.
    Other applications of WEKA in data mining
    WEKA can be used for various other data mining techniques, such as:
    - Classification (using decision trees)
    - Collaborative filtering (nearest neighbor)
    - Association
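Returning to the classifier-error plot for the car-price model (Fig. 13), the same comparison of actual and predicted prices can be summarised numerically. For a numeric class, WEKA's evaluation output includes, among other measures, the correlation coefficient, the mean absolute error and the root mean squared error. The minimal Python sketch below computes those three measures. The function name and prices are made up for illustration.

```python
import math

def regression_errors(actual, predicted):
    """Correlation coefficient, mean absolute error and root mean squared error
    between actual and predicted values of a numeric target."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in residuals) / n
    rmse = math.sqrt(sum(e * e for e in residuals) / n)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    r = cov / math.sqrt(var_a * var_p)
    return r, mae, rmse

# Made-up prices: three cars, predictions off by -2, +2 and -3 units.
r, mae, rmse = regression_errors([10, 20, 30], [12, 18, 33])
```

Here MAE is 7/3 and RMSE is the square root of 17/3. RMSE is never smaller than MAE, because squaring gives the largest miss (3 units) more weight. Points far from the diagonal in the error plot are exactly what drives RMSE up.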