Introduction to XLMiner™ Data Utilities. XLMiner and Microsoft Office are registered trademarks of their respective owners.
Brief description of the features of XLMiner: Data Utilities. XLMiner puts a host of data utilities at the user's disposal: Sample from Worksheet/Database, Stratified Sampling, Missing Data Handling, Bin Continuous Data, and Transform Categorical Data. http://dataminingtools.net
Sample data from Worksheet: When huge amounts of data are involved, statisticians prefer to work with a sample that represents the entire database. Such a representative sample is, however, difficult to obtain. The entire dataset we want information about is called the population; a sample is the part of the population that we actually examine to draw conclusions. A good sample should be a true representation of the data: as far as possible, the cases chosen for the sample should be like the cases that are not chosen. A poor sample design can produce misleading conclusions, so various methods and techniques have been developed to ensure a true sample. XLMiner provides sampling facilities for this purpose.
Sample data from Worksheet: In XLMiner, sampling can be done in two ways:
Simple Random Sampling: A random sample of x records is chosen from the data such that every record in the data has an equal chance of being chosen.
Stratified Sampling: The data is divided into strata of similar items. Each stratum is then sampled using the simple random approach, and the results are combined to give the final sample.
Sample data from Worksheet - Simple Random Sampling:
Select the variables to be present in the sample.
Here, "Simple Random Sampling" is selected.
We can specify the seed value (the value used for random selection), or the wizard will supply one by default.
Set the size of the sampled set.
If sampling with replacement is selected, duplicate copies of records may be used.
Sample data from Worksheet - Simple Random Sampling output
Sample data from Worksheet - Simple Random Sampling output with replacement: Duplicate copies of records exist in the sample.
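The simple random sampling options described above (seed, sample size, sampling with or without replacement) can be sketched outside XLMiner as follows. This is only an illustration using Python's standard library, not XLMiner's own implementation; the function name, the record IDs, and the default seed value are assumptions made for the example.

```python
import random


def simple_random_sample(records, size, seed=12345, with_replacement=False):
    """Draw `size` records so that every record has an equal chance of selection.

    `seed` is an arbitrary illustrative default, not a documented XLMiner value.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    if with_replacement:
        # Each draw is independent, so the same record may appear more than once.
        return [rng.choice(records) for _ in range(size)]
    # Without replacement, each record can appear at most once.
    return rng.sample(records, size)


records = list(range(1, 101))  # hypothetical record IDs 1..100
print(simple_random_sample(records, 10))
print(simple_random_sample(records, 10, with_replacement=True))
```

Running the function twice with the same seed returns the same sample, which is the practical reason for exposing the seed as an option.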
Sample data from Worksheet - Stratified Sample (proportionate)
Sample data from Worksheet - Stratified Sample (proportionate, output): As selected, the percentage of records in each stratum of the sample set is the same as in the input set.
Sample data from Worksheet - Stratified Sample (specify number)
Sample data from Worksheet - Stratified Sample (specify number, output): All strata have equal sizes as specified by the user (here, 10 records each).
Sample data from Worksheet - Stratified Sample (size of smallest stratum)
Sample data from Worksheet - Stratified Sample (size of smallest stratum, output): All strata have a size equal to the size of the smallest stratum.
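The three stratified options shown in these slides (proportionate, a specified number per stratum, and the size of the smallest stratum) can be sketched in plain Python as below. This is an illustration under stated assumptions, not XLMiner's algorithm: the function name, the mode labels, the `fraction` parameter, and the sample data are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, key, mode, fraction=None, n=None, seed=12345):
    """Sample each stratum with simple random sampling, then combine the results.

    mode: "proportionate" (keep `fraction` of every stratum),
          "fixed" (take `n` records per stratum), or
          "smallest" (take as many records as the smallest stratum holds).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)  # group records by stratum label

    if mode == "smallest":
        n = min(len(members) for members in strata.values())
    elif mode == "fixed" and n is None:
        raise ValueError("mode='fixed' requires n")
    elif mode == "proportionate" and fraction is None:
        raise ValueError("mode='proportionate' requires fraction")

    sample = []
    for label in sorted(strata):
        members = strata[label]
        if mode == "proportionate":
            k = round(fraction * len(members))
        else:
            k = min(n, len(members))  # a stratum cannot give more than it has
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical data: (record id, stratum label)
data = [(i, "A") for i in range(60)] + [(i, "B") for i in range(30)] + [(i, "C") for i in range(8)]
for mode, kwargs in [("proportionate", {"fraction": 0.5}), ("fixed", {"n": 5}), ("smallest", {})]:
    picked = stratified_sample(data, key=lambda r: r[1], mode=mode, **kwargs)
    print(mode, dict(Counter(label for _, label in picked)))
```

Because the proportionate mode rounds each stratum's share to a whole record, the stratum percentages in the sample match the input only approximately when stratum sizes are small.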
Missing Data Handling: This utility lets the user process the data before any mining method is applied to it. It detects the missing values in the data and handles them the way the user wants. XLMiner™ considers a cell to be missing data if it is empty or contains an invalid formula. XLMiner™ can also be told to treat a cell as missing data if it contains a certain value specified by the user. The user can specify how XLMiner™ should correct these missing values, and a treatment can be assigned to every variable. Records with missing data can either be deleted entirely, or the missing values can be replaced. XLMiner™ provides options for how to replace the missing data, e.g. by the mean, the median, the mode, or a value specified by the user. The available options depend on the type of the variable.
Missing Data Handling
Missing Data Handling - Data Set: Select the action to handle the missing data in individual columns and click on "Apply this option to selected variable".
Missing Data Handling - Output: Changed records are highlighted.
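The per-variable treatments described for missing data (delete the record, or replace with the mean, median, mode, or a user-specified value) can be sketched as follows. This is a minimal illustration, not XLMiner's implementation: the function name, the treatment labels, and the use of `None` to stand for an empty or invalid cell are assumptions, and deleting records before computing replacement statistics is a design choice of this sketch.

```python
import statistics


def handle_missing(rows, treatments, missing_codes=()):
    """Apply one treatment per column to a list of row dicts.

    treatments maps a column name to "delete", "mean", "median", "mode",
    or ("value", x) for a user-specified replacement.
    None stands for an empty cell; `missing_codes` lists extra values
    that should also count as missing.
    """
    def is_missing(value):
        return value is None or value in missing_codes

    # Delete records first, so replacement statistics use only the kept rows.
    delete_cols = [col for col, t in treatments.items() if t == "delete"]
    kept = [dict(r) for r in rows if not any(is_missing(r[c]) for c in delete_cols)]

    stat_funcs = {"mean": statistics.mean, "median": statistics.median, "mode": statistics.mode}
    for col, treatment in treatments.items():
        if treatment == "delete":
            continue
        present = [r[col] for r in kept if not is_missing(r[col])]
        if isinstance(treatment, tuple):
            fill = treatment[1]  # ("value", x): user-specified replacement
        else:
            fill = stat_funcs[treatment](present)
        for r in kept:
            if is_missing(r[col]):
                r[col] = fill
    return kept


# Hypothetical data: -1 is a user-specified missing-value code for column "x".
rows = [{"x": 1, "y": "a"}, {"x": None, "y": "a"}, {"x": 3, "y": None}, {"x": -1, "y": "b"}]
print(handle_missing(rows, {"x": "mean", "y": "mode"}, missing_codes=(-1,)))
```

The mean and median only make sense for numeric columns, while the mode also works for text, which mirrors the remark that the available options depend on the type of the variable.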
Transform Categorical Data: Sometimes our data sets contain variables that take non-numeric values, which makes it difficult to apply standard procedures. XLMiner therefore provides a tool that can be used to transform non-numeric data into numeric data. There are two ways to transform categorical data:
Create dummies: Suppose the variable has 4 distinct values, A, B, C and D. Then 3 new columns, VAL1, VAL2 and VAL3, are created, each holding either 1 or 0. If a row contains the value A, VAL1 has the value 1 and the rest have 0. If all three have 0, the row has the value D.
Create category scores: If the non-numeric variable holds 4 distinct values as above, each value (ordered alphabetically) is numbered from 1 to 4, and a new column is created that contains the number corresponding to the value of the non-numeric variable.
Transform Categorical Data - Dummies: Select the variable that contains non-numeric data and needs to be transformed.
Transform Categorical Data - Category Scores
Transform Categorical Data - Category Scores (output)
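The two transformations described above can be sketched in a few lines of Python. This is an illustration of the encoding rules as the slides state them (one fewer dummy column than there are distinct values, with the alphabetically last value encoded as all zeros, and alphabetical scores starting at 1), not XLMiner's code; the function names are hypothetical.

```python
def create_dummies(values):
    """Return one row of 0/1 indicators per value.

    With k distinct categories, k - 1 indicator columns are created;
    the alphabetically last category is the one encoded as all zeros.
    """
    categories = sorted(set(values))
    indicator_categories = categories[:-1]
    return [[1 if v == c else 0 for c in indicator_categories] for v in values]


def create_category_scores(values):
    """Number the distinct categories 1..k in alphabetical order."""
    score = {c: i for i, c in enumerate(sorted(set(values)), start=1)}
    return [score[v] for v in values]


column = ["A", "B", "C", "D", "A"]
print(create_dummies(column))          # columns correspond to VAL1, VAL2, VAL3
print(create_category_scores(column))
```

Category scores impose an order (A < B < C < D) that the original labels may not have, so dummies are usually the safer choice when the categories are unordered.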
Thank you. For more, visit: http://dataminingtools.net
