IT for Business Intelligence

Data Mining Techniques: Classification and Regression Using WEKA

A. Kranthikumar (10BM60001)

Classification via decision trees using WEKA
Problem:
A bank is introducing a new financial product and wants to predict whether each new
customer is likely to buy it. The bank already holds information on its existing
clients, including their interest in buying the new product.

Classification is a statistical technique that assigns a new client to one of the
existing groups (classes). It first builds a model from the existing data, for which the
group is already known (the training data), and then classifies the new data using
that model.

Steps to do classification in WEKA
Step 1: Create a data file in ARFF or CSV format; WEKA can read both. Here we use
a CSV file, Bank.csv.
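To make the relationship between the two formats concrete, here is a minimal Python sketch that converts CSV text into ARFF text. The column names (age, income, buy) and the three rows are hypothetical, since the actual columns of Bank.csv are not listed here, and the helper name csv_to_arff is only illustrative. It treats a column as numeric when every value parses as a number and as nominal otherwise.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text (header row first) into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = [f"@relation {relation}", ""]
    for col, name in enumerate(header):
        values = [row[col] for row in data]
        if all(is_number(v) for v in values):
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attribute: list the distinct values in order of appearance.
            distinct = list(dict.fromkeys(values))
            lines.append(f"@attribute {name} {{{','.join(distinct)}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines)


# Hypothetical sample data, not the real contents of Bank.csv.
sample_csv = "age,income,buy\n25,30000,no\n47,52000,yes\n36,41000,yes\n"
arff_text = csv_to_arff(sample_csv, "bank")
print(arff_text)
```

The ARFF header declares each attribute and its type explicitly, which a CSV file leaves for the reader to infer.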

Step 2: Open the WEKA application. This shows the start-up window (the WEKA GUI Chooser).

Now click the Explorer button. This opens the Explorer window.

Step 3: Load the data into WEKA

To do that, click the “Open file...” button and browse for the Bank.csv file. WEKA then
lists all the attributes of the data set.

Step 4: View the data

      In the Selected attribute panel you can see the name and type of the
      selected attribute, together with a summary of its values.
      You can also view the frequency distribution of all the attributes at once
      by clicking the “Visualize All” button.

These views show the range of values for each attribute, along with summary
statistics such as the mean and the frequency of each value. For example, age in
our data ranges from 18 to 67, with an average of 42.5.
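The same kind of summary can be computed by hand. The following Python sketch uses six hypothetical ages, chosen only so that they reproduce the range 18 to 67 and the mean 42.5 quoted above, plus a hypothetical nominal column. The equal-width binning is an assumption about how a histogram might be built, not a description of WEKA's internals.

```python
from collections import Counter
from statistics import mean

# Hypothetical values, not taken from Bank.csv.
ages = [18, 30, 42, 43, 55, 67]
buy = ["yes", "no", "yes", "yes", "no", "yes"]

# Range and mean of a numeric attribute.
age_summary = {"min": min(ages), "max": max(ages), "mean": mean(ages)}

# Frequency of each value of a nominal attribute.
buy_counts = Counter(buy)

# Frequency distribution of a numeric attribute using equal-width bins.
num_bins = 3
width = (max(ages) - min(ages)) / num_bins
age_bins = [0] * num_bins
for age in ages:
    index = min(int((age - min(ages)) / width), num_bins - 1)
    age_bins[index] += 1

print(age_summary, dict(buy_counts), age_bins)
```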

Step 5: Classify the data

             To do this, select the Classify tab. Then click the Choose button
             and select the J48 algorithm, which is listed under the “trees” node.

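J48 is WEKA's implementation of the C4.5 decision tree learner, which chooses the attribute to split on using a criterion derived from information gain. As a rough illustration of that idea only (C4.5 itself uses the normalized gain ratio and handles numeric attributes, both omitted here), the Python sketch below computes plain information gain on a hypothetical four-row data set. The attribute names married and car are invented for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical customers: "married" separates the classes perfectly, "car" not at all.
customers = [
    {"married": "yes", "car": "yes", "buy": "yes"},
    {"married": "yes", "car": "no", "buy": "yes"},
    {"married": "no", "car": "yes", "buy": "no"},
    {"married": "no", "car": "no", "buy": "no"},
]
gain_married = information_gain(customers, "married", "buy")
gain_car = information_gain(customers, "car", "buy")
print(gain_married, gain_car)
```

A learner of this kind would split on married first, because it removes all uncertainty about the class, while car removes none.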

Step 6: Run the classification algorithm

             Select the dependent variable (the attribute to be predicted) and
             click Start.
             The Classifier output panel then shows the tree in ASCII form,
             which is difficult to read.
             To view the output as a tree diagram, right-click the trees.J48 entry
             in the result list and select the “Visualize tree” option. To enlarge
             the diagram, right-click inside the tree window and select the
             full-screen option.
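Reading the tree amounts to following one branch per test until a leaf is reached. The Python sketch below walks a small hand-written tree; the attributes, thresholds, and labels are hypothetical and are not the tree WEKA produces for Bank.csv. It assumes numeric tests of the form "attribute <= value", which is the form J48 prints for numeric attributes.

```python
# Hypothetical tree: internal nodes test "attribute <= threshold", leaves carry a label.
tree = {
    "attribute": "income",
    "threshold": 40000,
    "left": {"label": "no"},  # income <= 40000
    "right": {  # income > 40000
        "attribute": "age",
        "threshold": 30,
        "left": {"label": "no"},  # age <= 30
        "right": {"label": "yes"},  # age > 30
    },
}


def classify(node, instance):
    """Follow the branches that match `instance` until a leaf is reached."""
    while "label" not in node:
        if instance[node["attribute"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]


prediction = classify(tree, {"age": 45, "income": 52000})
print(prediction)
```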




Step 7: Analyze the model created from the existing data

      From the Classifier output we can see that the classification accuracy of the
      model is 89%.
      This means that the model classified 89% of the evaluated instances correctly.
      If we use the same model to predict the buying decision of a new customer, and
      new customers resemble the existing ones, we can expect the prediction to be
      correct about 89% of the time.
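Accuracy is the share of correctly classified instances, which can be read off a confusion matrix as the diagonal total divided by the grand total. The counts in this Python sketch are hypothetical, chosen only so that they give the 89% figure; they are not copied from the WEKA output.

```python
# Hypothetical confusion matrix: rows are actual classes, columns are predicted classes.
#                 predicted yes, predicted no
confusion = [
    [48, 6],  # actual yes
    [5, 41],  # actual no
]

correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total
print(f"accuracy = {accuracy:.2f}")
```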

Step 8: Classify the new customer data

      Create the new customer data in ARFF or CSV format with the same attributes
      as the training data.
      Then select the “Supplied test set” radio button and click “Set...” to
      browse for the new data set.
      Click Start, which builds the tree again and applies it to the new data.
      Save the classification result as an ARFF file. This file contains a copy of
      the new instances along with an additional column for the predicted value.

Regression Using WEKA
Problem: The goal is to find out how CPU performance is related to attributes
such as machine cycle time, minimum main memory, and cache memory.

Regression is a statistical tool that estimates how the dependent variable
(CPU performance) is related to the independent attributes.

Steps to do Regression in WEKA
Step 1: Create the data file and open WEKA in the same way as for
classification.

Step 2: Load the regression data file CPU.arff into WEKA.

       Click “Open file...” and browse for the file.

Step 3: Run the regression

       Click the Classify tab, click Choose, and select “LinearRegression” under
       the “functions” node.
       Click Start. The Classifier output panel then shows the fitted
       regression equation.
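To show where such an equation comes from, here is a minimal Python sketch of ordinary least squares with a single predictor. WEKA fits several attributes at once, so this is a simplification, and the four data points are hypothetical values constructed to lie exactly on the line 2x + 10 rather than rows from CPU.arff.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: cache size versus performance, exactly on performance = 2 * cache + 10.
cache = [0, 8, 16, 32]
performance = [10, 26, 42, 74]
slope, intercept = fit_line(cache, performance)
print(f"performance = {slope:.1f} * cache + {intercept:.1f}")
```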
Interpretation of the output:
   The output suggests that CPU performance depends most strongly on CHMAX,
   followed by CACHE.
   The high correlation coefficient of 0.912 in the output indicates that the
   dependent variable is strongly associated with the independent variables.
   Given the attribute values of a new machine, we can also use the regression
   equation to predict its CPU performance.
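The last two points can be sketched in Python: applying a regression equation to new attribute values, and measuring how well predictions track actual values with the Pearson correlation coefficient. The coefficients, machines, and actual values below are all hypothetical; only the attribute names CHMAX and CACHE come from the data set, and the numbers are not the ones WEKA reports.

```python
from math import sqrt

# Hypothetical equation: performance = -10 + 1.5 * CHMAX + 0.5 * CACHE
coefficients = {"CHMAX": 1.5, "CACHE": 0.5}
intercept = -10.0


def predict(machine):
    """Evaluate the regression equation for one machine."""
    return intercept + sum(w * machine[name] for name, w in coefficients.items())


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical machines and their (hypothetical) measured performance.
machines = [
    {"CHMAX": 16, "CACHE": 8},
    {"CHMAX": 32, "CACHE": 32},
    {"CHMAX": 64, "CACHE": 64},
]
actual = [20, 50, 120]
predicted = [predict(m) for m in machines]
r = pearson(actual, predicted)
print(predicted, round(r, 3))
```

A coefficient close to 1 means the predicted values rise and fall almost in step with the actual ones, which is how the 0.912 above should be read.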
