Data Reduction Strategies


AnjaliSoorej

Application of Data Reduction Strategies


DATA REDUCTION STRATEGIES
DATA CUBE AGGREGATION
ATTRIBUTE SUBSET SELECTION

Why data reduction?
• Huge amounts of data are created every day.
• Big data platforms have been developed to store and process it.
• Older algorithms perform poorly at this scale.
• Most data mining algorithms are implemented column-wise.
• Together, these factors have pushed the need for data reduction procedures.

What is data reduction?
Data reduction is a process that reduces the volume of the
original data and represents it in a much smaller form.
• It maintains the integrity of the data while reducing its volume.
• The time required for data reduction should not outweigh the time
saved by mining the reduced data set.
• Mining the reduced data should produce the same, or nearly the same, results as mining the original data.
• Data reduction increases the efficiency of data mining.

Data reduction strategies
1. Data cube aggregation
2. Attribute subset selection
3. Dimensionality reduction
4. Numerosity reduction
5. Discretization and concept hierarchy generation

Data Cube Aggregation
This technique aggregates (combines) data into a simpler,
summarized form, so that the summary can be used
directly as the result of the analysis.

Data Cube Aggregation
The data gives the gross profit, in dollars, earned from
selling laptops in each state, with a separate table for
each country.

Country: USA
States       Gross Profit ($)
Arizona      500
Texas        320
Illinois     430

Country: India
States       Gross Profit ($)
Kerala       245
Tamil Nadu   380
Goa          950

Country: Canada
States       Gross Profit ($)
Alberta      420
Manitoba     200
Ontario      300

Aggregated by country
Country      Gross Profit ($)
USA          1250
India        1575
Canada       920
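The roll-up from state level to country level can be sketched in a few lines of Python. This is only an illustration using the figures from the slide; the names `STATE_PROFIT` and `roll_up` are made up for this example.

```python
from collections import defaultdict

# (country, state, gross profit in dollars), taken from the tables above.
STATE_PROFIT = [
    ("USA", "Arizona", 500),
    ("USA", "Texas", 320),
    ("USA", "Illinois", 430),
    ("India", "Kerala", 245),
    ("India", "Tamil Nadu", 380),
    ("India", "Goa", 950),
    ("Canada", "Alberta", 420),
    ("Canada", "Manitoba", 200),
    ("Canada", "Ontario", 300),
]


def roll_up(rows):
    """Aggregate state-level profit into one total per country."""
    totals = defaultdict(int)
    for country, _state, profit in rows:
        totals[country] += profit
    return dict(totals)


print(roll_up(STATE_PROFIT))  # {'USA': 1250, 'India': 1575, 'Canada': 920}
```

Nine state rows shrink to three country rows, and the totals match the aggregated table.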

Attribute Subset Selection
Starting from a large number of attributes, a minimal
attribute set is obtained by eliminating the irrelevant
attributes, whose removal does not much affect the
data. Patterns mined from the reduced data are
easier to understand.

Methods of Attribute Subset Selection are:
1. Stepwise forward selection: starts with an empty set and, at each step, adds the
most relevant of the remaining attributes, ignoring the rest.
2. Stepwise backward elimination: starts with the full set and, at each step, removes
the least relevant attribute, keeping the rest.
3. Combined forward selection and backward elimination: at each step, selects the
best attribute and removes the worst of the remaining attributes.
4. Decision tree induction: builds a flowchart-like structure that chooses the best
attribute to partition the data at each node. Attributes that do not appear in the tree are treated as irrelevant.

Example
From a given data set, we need to count the
male, female and transgender individuals who are
eligible to vote.
Initial Attribute Set = { Name, Age, Gender, Address, Phone }

Forward Selection
• Initial attribute set = { Name, Age, Gender, Address, Phone }
• Initial reduced set => { }
• => { Age }
• => { Age, Gender }
• Reduced attribute set => { Age, Gender }
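A minimal sketch of greedy forward selection for this example follows. The slides do not say how relevance is measured, so everything here is an assumption: the records in `TRAIN` and `TEST` are invented, Age is pre-bucketed into "adult" (18 or older) and "minor", and the hypothetical `score` function counts how many held-out records a simple majority-vote lookup labels correctly.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["Name", "Age", "Gender", "Address", "Phone"]


def make(rows):
    """Turn (name, age group, gender, address, phone, label) tuples into dicts."""
    return [dict(zip(ATTRIBUTES + ["Label"], r)) for r in rows]


# Hypothetical records, invented for illustration only.
TRAIN = make([
    ("Asha", "adult", "Female", "Kochi", "111", "Female"),
    ("Bala", "adult", "Male", "Goa", "112", "Male"),
    ("Chitra", "adult", "Female", "Kochi", "113", "Female"),
    ("Dev", "adult", "Male", "Goa", "114", "Male"),
    ("Esha", "adult", "Transgender", "Kochi", "115", "Transgender"),
    ("Farah", "adult", "Transgender", "Goa", "116", "Transgender"),
    ("Gita", "adult", "Female", "Kochi", "117", "Female"),
    ("Hari", "minor", "Male", "Goa", "118", "Not a voter"),
    ("Indu", "minor", "Female", "Kochi", "119", "Not a voter"),
])
TEST = make([
    ("Jaya", "adult", "Female", "Goa", "211", "Female"),
    ("Kiran", "adult", "Male", "Kochi", "212", "Male"),
    ("Latha", "adult", "Transgender", "Kochi", "213", "Transgender"),
    ("Mira", "minor", "Female", "Goa", "214", "Not a voter"),
    ("Nikhil", "minor", "Male", "Kochi", "215", "Not a voter"),
    ("Oviya", "minor", "Female", "Goa", "216", "Not a voter"),
])


def score(attrs, train=TRAIN, test=TEST):
    """Count test records labelled correctly by a majority-vote lookup on attrs."""
    groups = defaultdict(Counter)
    for row in train:
        groups[tuple(row[a] for a in attrs)][row["Label"]] += 1
    # Value combinations never seen in training fall back to the overall majority.
    fallback = Counter(row["Label"] for row in train).most_common(1)[0][0]
    correct = 0
    for row in test:
        key = tuple(row[a] for a in attrs)
        guess = groups[key].most_common(1)[0][0] if key in groups else fallback
        correct += guess == row["Label"]
    return correct


def forward_selection(candidates):
    """Start empty; keep adding the attribute that raises the score the most."""
    selected, best = [], score([])
    while True:
        remaining = [a for a in candidates if a not in selected]
        if not remaining:
            break
        top = max(remaining, key=lambda a: score(selected + [a]))
        top_score = score(selected + [top])
        if top_score <= best:  # no attribute improves the score, so stop
            break
        selected.append(top)
        best = top_score
    return selected


print(forward_selection(ATTRIBUTES))  # ['Age', 'Gender']
```

With this toy data the search adds Age first, then Gender, and stops because Name, Address and Phone do not improve the score.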

Backward Elimination
• Initial attribute set => { Name, Age, Gender, Address, Phone }
• Initial reduced set => { Name, Age, Gender, Address, Phone }
• => { Age, Gender, Address, Phone }
• => { Age, Gender, Phone }
• => { Age, Gender }
• Reduced attribute set => { Age, Gender }
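Backward elimination can be sketched the same way. As before, this rests on assumptions that are not in the slides: the records are invented, Age is pre-bucketed into "adult" and "minor", and the hypothetical `score` function counts correctly labelled held-out records. The order in which attributes are dropped depends on that scoring choice, so it may differ from the order listed above even though the final set is the same.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["Name", "Age", "Gender", "Address", "Phone"]


def make(rows):
    """Turn (name, age group, gender, address, phone, label) tuples into dicts."""
    return [dict(zip(ATTRIBUTES + ["Label"], r)) for r in rows]


# Hypothetical records, invented for illustration only.
TRAIN = make([
    ("Asha", "adult", "Female", "Kochi", "111", "Female"),
    ("Bala", "adult", "Male", "Goa", "112", "Male"),
    ("Chitra", "adult", "Female", "Kochi", "113", "Female"),
    ("Dev", "adult", "Male", "Goa", "114", "Male"),
    ("Esha", "adult", "Transgender", "Kochi", "115", "Transgender"),
    ("Farah", "adult", "Transgender", "Goa", "116", "Transgender"),
    ("Gita", "adult", "Female", "Kochi", "117", "Female"),
    ("Hari", "minor", "Male", "Goa", "118", "Not a voter"),
    ("Indu", "minor", "Female", "Kochi", "119", "Not a voter"),
])
TEST = make([
    ("Jaya", "adult", "Female", "Goa", "211", "Female"),
    ("Kiran", "adult", "Male", "Kochi", "212", "Male"),
    ("Latha", "adult", "Transgender", "Kochi", "213", "Transgender"),
    ("Mira", "minor", "Female", "Goa", "214", "Not a voter"),
    ("Nikhil", "minor", "Male", "Kochi", "215", "Not a voter"),
    ("Oviya", "minor", "Female", "Goa", "216", "Not a voter"),
])


def score(attrs, train=TRAIN, test=TEST):
    """Count test records labelled correctly by a majority-vote lookup on attrs."""
    groups = defaultdict(Counter)
    for row in train:
        groups[tuple(row[a] for a in attrs)][row["Label"]] += 1
    # Value combinations never seen in training fall back to the overall majority.
    fallback = Counter(row["Label"] for row in train).most_common(1)[0][0]
    correct = 0
    for row in test:
        key = tuple(row[a] for a in attrs)
        guess = groups[key].most_common(1)[0][0] if key in groups else fallback
        correct += guess == row["Label"]
    return correct


def backward_elimination(candidates):
    """Start with every attribute; drop one at a time while the score holds up."""
    selected = list(candidates)
    best = score(selected)
    while len(selected) > 1:
        # Pick the attribute whose removal leaves the highest score.
        drop = max(selected, key=lambda a: score([x for x in selected if x != a]))
        reduced = [x for x in selected if x != drop]
        reduced_score = score(reduced)
        if reduced_score < best:  # every removal now hurts, so stop
            break
        selected, best = reduced, reduced_score
    return selected


print(backward_elimination(ATTRIBUTES))  # ['Age', 'Gender']
```

Identifier-like attributes such as Name and Phone are dropped early because their values never repeat in unseen records, so they cannot help label them.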

Decision Tree Induction
Initial attribute set = { Name, Age, Gender, Address, Phone }
Age
  < 18  -> Not a voter
  >= 18 -> Gender
             -> Male
             -> Female
             -> Transgender
Reduced attribute set = { Age, Gender }
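The tree above can also be written down as a small data structure, which makes it easy to see why only Age and Gender survive. This sketch does not perform tree induction; it simply encodes the finished tree from the slide. The names `TREE`, `classify` and `attributes_used` and the sample record are hypothetical.

```python
def age_group(record):
    """Map a numeric age to the branch labels used in the tree."""
    return ">=18" if record["Age"] >= 18 else "<18"


# Internal nodes are dicts; leaves are plain strings.
TREE = {
    "attribute": "Age",
    "branch_of": age_group,
    "children": {
        "<18": "Not a voter",
        ">=18": {
            "attribute": "Gender",
            "branch_of": lambda record: record["Gender"],
            "children": {
                "Male": "Male",
                "Female": "Female",
                "Transgender": "Transgender",
            },
        },
    },
}


def classify(node, record):
    """Follow the branches from the root until a leaf is reached."""
    while isinstance(node, dict):
        node = node["children"][node["branch_of"](record)]
    return node


def attributes_used(node):
    """Collect every attribute tested at an internal node of the tree."""
    if not isinstance(node, dict):
        return set()
    used = {node["attribute"]}
    for child in node["children"].values():
        used |= attributes_used(child)
    return used


record = {"Name": "Asha", "Age": 34, "Gender": "Female",
          "Address": "Kochi", "Phone": "111"}
print(classify(TREE, record))         # Female
print(sorted(attributes_used(TREE)))  # ['Age', 'Gender']
```

Name, Address and Phone are never tested on any path, so they are left out of the reduced attribute set.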

Thank You
Ananthakrishnan P.G.
Anjali Soorej
Ann Mary Sajan

