Introduction to XLMiner™ Data Utilities
XLMiner and Microsoft Office are registered trademarks of their respective owners.
Brief description of the features of XLMiner: Data Utilities
XLMiner provides the user with a host of data utilities:
- Sample from Worksheet/Database
- Stratified Sampling
- Missing Data Handling
- Bin Continuous Data
- Transform Categorical Data
http://dataminingtools.net
Sample data from Worksheet
When huge amounts of data are involved, statisticians prefer to work with a sample that represents the entire database. However, such a representative sample is very difficult to obtain.
The entire dataset we want information about is called the population. A sample is the part of the population that we actually examine to draw conclusions.
A good sample should be a true representation of the data: as far as possible, the cases chosen for the sample should be like the cases that are not chosen. A poor sample design can produce misleading conclusions. Various methods and techniques have been developed to ensure a representative sample, and XLMiner provides sampling facilities.
Sample data from Worksheet
In XLMiner, sampling can be done in two ways:
- Simple random sampling: a random sample of x records is chosen from the data such that every record in the data has an equal chance of being chosen.
- Stratified sampling: the data is divided into strata of similar items. Each stratum is then sampled using the simple random approach, and the results are combined to give the final sample.
Sample data from Worksheet: Simple Random Sampling
- Select the variables to be present in the sample.
- Here, "Simple random sampling" is selected.
- Specify the seed value (the value used for random selection), or let the wizard set it by default.
- Set the size of the sampled set.
- If sampling with replacement is selected, duplicate copies of records may be used.
Sample data from Worksheet: Simple Random Sampling output
Sample data from Worksheet: Simple Random Sampling output with replacement
Duplicate copies of records exist in the sample.
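The simple random sampling options described above (seed, sample size, with or without replacement) can be illustrated outside XLMiner. The following is a minimal Python sketch, not XLMiner's own implementation; the function name, the seed value 12345, and the record IDs are assumptions made for illustration.

```python
import random

def simple_random_sample(records, size, seed=12345, with_replacement=False):
    """Draw `size` records so that every record has an equal chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    if with_replacement:
        # Each draw is independent, so the same record may appear more than once.
        return [rng.choice(records) for _ in range(size)]
    # Without replacement, no record is selected twice.
    return rng.sample(records, size)

records = list(range(1, 101))  # hypothetical record IDs 1..100
without_dupes = simple_random_sample(records, 10)
with_dupes = simple_random_sample(records, 10, with_replacement=True)
```

Running the function twice with the same seed returns the same sample, which is why a seed is worth recording alongside any sampled data set.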
Sample data from Worksheet: Stratified Sample (proportionate)
Sample data from Worksheet: Stratified Sample (proportionate, output)
As selected, the percentage of records in each stratum of the sample is the same as in the input set.
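Proportionate stratified sampling can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not XLMiner's code: the function name, the `stratum_of` callback, the rounding rule for stratum sizes, and the example data are all hypothetical.

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(records, stratum_of, fraction, seed=12345):
    """Sample the same fraction from every stratum, then combine the results."""
    rng = random.Random(seed)
    # Group the records into strata.
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = round(len(members) * fraction)  # assumed rounding rule
        # Simple random sampling (without replacement) inside each stratum.
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical data: 60 records in stratum A, 30 in B, 10 in C.
records = ([("A", i) for i in range(60)]
           + [("B", i) for i in range(30)]
           + [("C", i) for i in range(10)])
sample = proportionate_stratified_sample(records, lambda r: r[0], 0.2)
```

With a fraction of 0.2, the sample keeps the 6:3:1 ratio of the input strata.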
Sample data from Worksheet: Stratified Sample (specify number)
Sample data from Worksheet: Stratified Sample (specify number, output)
All strata have equal sizes, as specified by the user (here, 10 records each).
Sample data from Worksheet: Stratified Sample (size of smallest stratum)
Sample data from Worksheet: Stratified Sample (size of smallest stratum, output)
All strata have a size equal to that of the smallest stratum.
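The two equal-size options (a user-specified number per stratum, or the size of the smallest stratum) differ only in how the per-stratum size is chosen. The sketch below is a hypothetical Python illustration, not XLMiner's implementation; it assumes sampling without replacement, so it rejects a requested size larger than the smallest stratum.

```python
import random
from collections import defaultdict

def equal_size_stratified_sample(records, stratum_of, per_stratum=None, seed=12345):
    """Take the same number of records from every stratum.

    If `per_stratum` is None, the size of the smallest stratum is used.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    smallest = min(len(members) for members in strata.values())
    if per_stratum is None:
        per_stratum = smallest
    elif per_stratum > smallest:
        # Assumption of this sketch: no stratum can supply more than it holds.
        raise ValueError("requested size exceeds the smallest stratum")
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], per_stratum))
    return sample

# Hypothetical data: 60 records in stratum A, 30 in B, 15 in C.
records = ([("A", i) for i in range(60)]
           + [("B", i) for i in range(30)]
           + [("C", i) for i in range(15)])
fixed = equal_size_stratified_sample(records, lambda r: r[0], per_stratum=10)
smallest = equal_size_stratified_sample(records, lambda r: r[0])
```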
Missing Data Handling
This utility lets the user process the data before any mining method is applied to it: it detects the missing values in the data and handles them the way the user wants.
XLMiner considers a cell to be missing data if it is empty or contains an invalid formula. XLMiner can also be told to treat a cell as missing data if it contains a certain value specified by the user.
The user can specify how XLMiner should correct these missing values, and a treatment can be assigned to every variable. Records with missing data can either be deleted entirely, or the missing values can be replaced.
XLMiner provides options for how to replace the missing data, e.g. by the mean, the median, the mode, or a value specified by the user. The available options depend on the type of variable.
Missing Data Handling: Data Set
Select the action to handle the missing data in individual columns and click on "Apply this option to selected variable".
Missing Data Handling: Output
Changed records are highlighted.
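The per-variable treatments described above (delete the record, or replace by mean, median, mode, or a user-specified value) can be sketched as follows. This is a hypothetical Python illustration rather than XLMiner's behaviour: `None` and the empty string stand in for empty or invalid cells, and the function names, treatment labels, the missing code -999, and the example records are all assumptions.

```python
import statistics

def is_missing(value, missing_code=None):
    """Treat empty cells, and optionally a user-chosen code, as missing."""
    if value is None or value == "":
        return True
    return missing_code is not None and value == missing_code

def handle_missing(records, treatments, missing_code=None):
    """Apply one treatment per variable.

    `treatments` maps a variable name to "delete", "mean", "median", "mode",
    or ("value", x) for a user-specified replacement.
    """
    # Compute each replacement from the non-missing values of that variable.
    replacements = {}
    for name, treatment in treatments.items():
        present = [r[name] for r in records if not is_missing(r[name], missing_code)]
        if treatment == "mean":
            replacements[name] = statistics.mean(present)
        elif treatment == "median":
            replacements[name] = statistics.median(present)
        elif treatment == "mode":
            replacements[name] = statistics.mode(present)
        elif isinstance(treatment, tuple) and treatment[0] == "value":
            replacements[name] = treatment[1]
        elif treatment != "delete":
            raise ValueError(f"unknown treatment: {treatment!r}")

    cleaned = []
    for record in records:
        row = dict(record)  # copy so the input records stay unchanged
        keep = True
        for name, treatment in treatments.items():
            if is_missing(row[name], missing_code):
                if treatment == "delete":
                    keep = False  # drop the whole record
                    break
                row[name] = replacements[name]
        if keep:
            cleaned.append(row)
    return cleaned

records = [
    {"age": 30, "city": "Pune", "income": 50},
    {"age": None, "city": "Pune", "income": 70},
    {"age": 40, "city": "", "income": -999},
    {"age": 50, "city": "Delhi", "income": None},
]
cleaned = handle_missing(
    records,
    {"age": "mean", "city": "mode", "income": "delete"},
    missing_code=-999,
)
```

In this example the mean and median apply only to the numeric variable, while the mode also works for the text variable, which mirrors the point that the available options depend on the type of variable.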
Transform Categorical Data
Sometimes a data set contains variables that take non-numeric values, which makes it difficult to apply standard procedures. XLMiner therefore provides a tool to transform non-numeric data into numeric data. There are two ways to transform categorical data:
- Create dummies: suppose the variable has 4 distinct values, A, B, C and D. Then 3 new columns, VAL1, VAL2 and VAL3, are created with values of either 1 or 0. If a row contains the value A, VAL1 is 1 and the rest are 0. If all three are 0, the row has the value D.
- Create category scores: if the non-numeric variable holds 4 distinct values as above, each value (ordered alphabetically) is numbered from 1 to 4, and a new column is created that contains the number corresponding to each row's value.
Transform Categorical Data: Dummies
Select the variable that contains non-numeric data and needs to be transformed.
Transform Categorical Data: Category Scores
Transform Categorical Data: Category Scores (output)
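Both transformations are easy to reproduce by hand. The following Python sketch follows the description in the slides (one fewer dummy column than distinct values, and alphabetical scores starting at 1); the function names and the example values are hypothetical, and this is not XLMiner's own code.

```python
def create_dummies(values):
    """Return one 0/1 row per value, with a column for every category but the last.

    A row of all zeros stands for the last category in alphabetical order.
    """
    categories = sorted(set(values))
    indicator_categories = categories[:-1]
    return [[1 if v == c else 0 for c in indicator_categories] for v in values]

def create_category_scores(values):
    """Number the categories 1..n in alphabetical order and score each value."""
    categories = sorted(set(values))
    score = {c: i for i, c in enumerate(categories, start=1)}
    return [score[v] for v in values]

values = ["A", "C", "D", "B", "A"]
dummies = create_dummies(values)         # columns correspond to VAL1, VAL2, VAL3
scores = create_category_scores(values)
```

Category scores impose an order on the values (A < B < C < D), so they suit ordered categories better, whereas dummies make no such assumption.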
Thank you
For more, visit: http://dataminingtools.net

Visit more self-help tutorials
Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free and self-guided, and does not include any additional support. Visit us at www.dataminingtools.net