4. Sample data from Worksheet When a very large amount of data is involved, statisticians prefer to work with a sample that represents the entire dataset. Such a representative sample, however, can be difficult to obtain. The entire dataset we want information about is called the population. A sample is the part of the population that we actually examine in order to draw conclusions. A good sample is a true representation of the data: as far as possible, the cases chosen for the sample should be like the cases that are not chosen. A poor sample design can produce misleading conclusions, and various methods and techniques have been developed to ensure a representative sample. XLMiner provides sampling facilities. http://dataminingtools.net
5. Sample data from Worksheet In XLMiner, sampling can be done in two ways: Simple random sampling: a random sample of x records is chosen from the data such that every record in the data has an equal chance of being chosen. Stratified sampling: the data is divided into strata of similar items; each stratum is then sampled using the simple random approach, and the results are combined to give the final sample.
6. Sample data from Worksheet- Simple Random Sampling Select the variables to be present in the sample. Here “Simple random sampling” is selected. We can specify the seed value (the value used to initialize the random selection), or the wizard will supply a default. Set the size of the sampled set. If sampling with replacement is selected, duplicate copies of records may appear in the sample.
7. Sample data from Worksheet- Simple Random Sampling output
8. Sample data from Worksheet- Simple Random Sampling output with replacement Duplicate copies of records exist in the sample.
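The difference between sampling without and with replacement can be sketched outside XLMiner in a few lines of Python. This is only an illustration using the standard `random` module, not XLMiner's own procedure; the function name `simple_random_sample`, the seed value and the example records are assumptions made for the sketch.

```python
import random

def simple_random_sample(records, size, seed=12345, with_replacement=False):
    """Draw `size` records so that every record has an equal chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    if with_replacement:
        # The same record may be drawn more than once.
        return rng.choices(records, k=size)
    # Each record can appear at most once in the sample.
    return rng.sample(records, size)

# Hypothetical worksheet: record IDs 1..100.
records = list(range(1, 101))
without_dupes = simple_random_sample(records, 10)
maybe_dupes = simple_random_sample(records, 10, with_replacement=True)
```

Re-running with the same seed returns the same sample, which is presumably why the wizard exposes a seed field.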
9. Sample data from Worksheet- Stratified Sample (proportionate)
10. Sample data from Worksheet- Stratified Sample (proportionate, output) As selected, the percentage of records in each stratum of the sample set is the same as in the input set.
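Proportionate stratified sampling can be sketched as follows: group the records by stratum, then draw the same fraction from each group with simple random sampling. This is a minimal illustration, not XLMiner's implementation; the function name, the `stratum_of` callback, the seed and the example data are all assumptions.

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(records, stratum_of, fraction, seed=12345):
    """Sample the same fraction from every stratum, then combine the results."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = round(len(members) * fraction)  # stratum keeps its share of the sample
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical data: (record id, stratum label) with strata of 60, 30 and 10 records.
data = [(i, "A") for i in range(60)] + [(i, "B") for i in range(60, 90)] \
     + [(i, "C") for i in range(90, 100)]
sample = proportionate_stratified_sample(data, lambda r: r[1], fraction=0.1)
counts = {label: sum(1 for r in sample if r[1] == label) for label in "ABC"}
```

With these inputs the sample keeps the 60/30/10 split of the input set, scaled down to ten records.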
11. Sample data from Worksheet- Stratified Sample (specify number)
12. Sample data from Worksheet- Stratified Sample (specify number) All strata have equal sizes, as specified by the user (here, 10 records each).
13. Sample data from Worksheet- Stratified Sample (size of smallest stratum)
14. Sample data from Worksheet- Stratified Sample (size of smallest stratum, output) All strata have a size equal to that of the smallest stratum.
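The two equal-size options (a user-specified number per stratum, or the size of the smallest stratum) differ only in how the per-stratum count is chosen. A rough Python sketch, with the function name `equal_size_stratified_sample`, its parameters and the example data assumed for illustration rather than taken from XLMiner:

```python
import random
from collections import defaultdict

def equal_size_stratified_sample(records, stratum_of, per_stratum=None, seed=12345):
    """Draw the same number of records from every stratum.

    If `per_stratum` is None, the size of the smallest stratum is used.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    smallest = min(len(members) for members in strata.values())
    k = smallest if per_stratum is None else per_stratum
    if k > smallest:
        raise ValueError("requested size exceeds the smallest stratum")
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], k))
    return sample

# Hypothetical data: strata of 60, 30 and 15 records.
data = [(i, "A") for i in range(60)] + [(i, "B") for i in range(60, 90)] \
     + [(i, "C") for i in range(90, 105)]
ten_each = equal_size_stratified_sample(data, lambda r: r[1], per_stratum=10)
smallest_each = equal_size_stratified_sample(data, lambda r: r[1])
```

Here `ten_each` holds 10 records per stratum, while `smallest_each` holds 15 per stratum because the smallest stratum has 15 records.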
15. Missing Data Handling This utility lets the user process the data before any mining method is applied to it: it detects missing values and handles them the way the user wants. XLMiner considers a cell to be missing data if it is empty or contains an invalid formula. XLMiner can also be told to treat a cell as missing data if it contains a certain value specified by the user. The user can specify how XLMiner should correct these missing values, and a treatment can be assigned to every variable. Records with missing data can be deleted entirely, or the missing values can be replaced. XLMiner provides options for the replacement value, e.g. the mean, median, or mode, or a value specified by the user. The available options depend on the type of variable.
17. Missing Data Handling Data Set Select the action to handle the missing data in individual columns and click on “Apply this option to selected variable”.
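The treatments described for missing data (delete the record, or replace the value with the mean, median, mode or a user-supplied value) can be illustrated with a small Python sketch. It is not XLMiner's implementation: the function `handle_missing`, the treatment names, and the use of `None` to mark a missing cell are assumptions made for the example.

```python
import statistics

def handle_missing(rows, column, treatment, user_value=None):
    """Return new rows with missing values (None) in `column` treated."""
    present = [row[column] for row in rows if row[column] is not None]
    if treatment == "delete":
        # Drop every record whose value in this column is missing.
        return [dict(row) for row in rows if row[column] is not None]
    if treatment == "mean":
        fill = statistics.mean(present)
    elif treatment == "median":
        fill = statistics.median(present)
    elif treatment == "mode":
        fill = statistics.mode(present)
    elif treatment == "value":
        fill = user_value
    else:
        raise ValueError(f"unknown treatment: {treatment}")
    # Replace only the missing cells; the input rows are left untouched.
    return [{**row, column: fill} if row[column] is None else dict(row)
            for row in rows]

# Hypothetical worksheet with one missing age.
rows = [{"id": 1, "age": 20}, {"id": 2, "age": None},
        {"id": 3, "age": 40}, {"id": 4, "age": 40}]
deleted = handle_missing(rows, "age", "delete")
by_median = handle_missing(rows, "age", "median")
by_value = handle_missing(rows, "age", "value", user_value=0)
```

Mean and median only make sense for numeric variables, while mode also works for categorical ones, which matches the note that the available options depend on the type of variable.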
19. Transform Categorical Data Sometimes our data sets contain variables that take non-numeric values, which makes it difficult to apply standard procedures. XLMiner therefore provides a tool that transforms non-numeric data into numeric data. There are two ways to transform categorical data: Create dummies: suppose the variable has 4 distinct values, A, B, C and D. Then 3 new columns, VAL1, VAL2 and VAL3, are created, each holding either 1 or 0. If a row contains the value A, VAL1 is 1 and the rest are 0; likewise VAL2 marks B and VAL3 marks C. If all three are 0, the row has the value D. Create category scores: if the non-numeric variable holds 4 distinct values as above, the values (ordered alphabetically) are numbered from 1 to 4, and a new column is created that contains the number corresponding to each row's value.
20. Transform Categorical Data- Dummies Select the variable that contains non-numeric data and needs to be transformed.
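Both transformations of a categorical variable are easy to sketch in Python. The functions below follow the description given for four values A, B, C and D (three 0/1 columns with the last value encoded as all zeros, and alphabetical scores starting at 1); the function names are hypothetical, and XLMiner's actual column naming and output layout may differ.

```python
def create_dummies(values):
    """One 0/1 column per category except the last, which is all zeros."""
    categories = sorted(set(values))
    kept = categories[:-1]  # the last category is implied by all zeros
    return [[1 if value == category else 0 for category in kept]
            for value in values]

def category_scores(values):
    """Number the categories 1..n in alphabetical order."""
    categories = sorted(set(values))
    score = {category: i for i, category in enumerate(categories, start=1)}
    return [score[value] for value in values]

column = ["A", "B", "C", "D", "A"]
dummies = create_dummies(column)
# → [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0]]
scores = category_scores(column)
# → [1, 2, 3, 4, 1]
```

Dummies avoid implying an order among the categories, whereas category scores keep a single column at the cost of introducing an artificial ordering.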
23. Thank you For more, visit: http://dataminingtools.net
24. Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net