SlideShare una empresa de Scribd logo
1 de 14
Data Processing
What is the need for Data Processing? To get the required information from huge, incomplete, noisy and inconsistent set of data it is necessary to use data processing.
Steps in Data Processing Data Cleaning Data Integration Data Transformation Data reduction Data Summarization
What is Data Cleaning? Data cleaning is a procedure to “clean” the data by filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies
What is Data Integration? Integrating multiple databases, data cubes, or files, this is called data integration.
What is Data Transformation? Data transformation operations, such as normalization and aggregation, are additional data preprocessing procedures that would contribute toward the success of the mining process.
What is Data Reduction? Data reduction obtains a reduced representation of the data set that is much smaller in volume, yet produces the same (or almost the same) analytical results.
What is Data Summarization? It is the processes of representing the collected data in an accurate and compact way without losing any information, it also involves getting a information from collected data. Ex: Display the data as a graph and get the mean, median, mode etc.
How to Clean Data? Handling Missing values Ignore the tuple Fill in the missing value manually Use a global constant to fill in the missing value Use the attribute mean to fill in the missing value Use the attribute mean for all samples belonging to the same class as the given tuple Use the most probable value to fill in the missing value
How to Clean Data? Handle Noisy Data Binning: Binning methods smooth a sorted data value by consulting its “neighborhood”. Regression: Data can be smoothed by fitting the data to a function, such as with regression.  Clustering: Outliers may be detected by clustering, where similar values are organized into groups, or “clusters.”
Data Integration Data Integration combines data from multiple sources into a coherent data store, as in data warehousing. These sources may include multiple databases, data cubes, or flat files. Issues that arises during data integration like Schema integration and object matching Redundancy is another important issue.
Data Transformation Data transformation can be achieved in following ways Smoothing: which works to remove noise from the data Aggregation: where summary or aggregation operations are applied to the data. For example, the daily sales data may be aggregated so as to compute weekly and annuual total scores. Generalization of the data: where low-level or “primitive” (raw) data are replaced by higher-level concepts through the use of concept hierarchies. For example, categorical attributes, like street, can be generalized to higher-level concepts, like city or country. Normalization: where the attribute data are scaled so as to fall within a small specified range, such as −1.0 to 1.0, or 0.0 to 1.0. Attribute construction : this is where new attributes are constructed and added from the given set of attributes to help the mining process.
Data Reduction techniques These are the techniques that can be applied to obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data. Data cube aggregation Attribute subset selection Dimensionality reduction Numerosity reduction Discretization and concept hierarchy generation
Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net

Más contenido relacionado

La actualidad más candente

Data preprocessing
Data preprocessingData preprocessing
Data preprocessingkayathri02
 
Data pre processing
Data pre processingData pre processing
Data pre processingpommurajopt
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingankur bhalla
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingksamyMCA
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingSlideshare
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data miningkavitha muneeshwaran
 
Data preprocessing in Data Mining
Data preprocessing in Data MiningData preprocessing in Data Mining
Data preprocessing in Data MiningDHIVYADEVAKI
 
03 preprocessing
03 preprocessing03 preprocessing
03 preprocessingpurnimatm
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingHarry Potter
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingAmuthamca
 
1.6.data preprocessing
1.6.data preprocessing1.6.data preprocessing
1.6.data preprocessingKrish_ver2
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingHarry Potter
 

La actualidad más candente (18)

Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data pre processing
Data pre processingData pre processing
Data pre processing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocess
Data preprocessData preprocess
Data preprocess
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing ng
Data preprocessing   ngData preprocessing   ng
Data preprocessing ng
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Data Mining: Data Preprocessing
Data Mining: Data PreprocessingData Mining: Data Preprocessing
Data Mining: Data Preprocessing
 
Data Integration and Transformation in Data mining
Data Integration and Transformation in Data miningData Integration and Transformation in Data mining
Data Integration and Transformation in Data mining
 
Data preprocessing in Data Mining
Data preprocessing in Data MiningData preprocessing in Data Mining
Data preprocessing in Data Mining
 
03 preprocessing
03 preprocessing03 preprocessing
03 preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
1.6.data preprocessing
1.6.data preprocessing1.6.data preprocessing
1.6.data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 

Similar a Data MIning: Data processing

Similar a Data MIning: Data processing (20)

Data preprocessing ng
Data preprocessing   ngData preprocessing   ng
Data preprocessing ng
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
1234
12341234
1234
 
Preprocessing.ppt
Preprocessing.pptPreprocessing.ppt
Preprocessing.ppt
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data1
Data1Data1
Data1
 
Data1
Data1Data1
Data1
 
Preprocess
PreprocessPreprocess
Preprocess
 
Intro to Data warehousing lecture 17
Intro to Data warehousing   lecture 17Intro to Data warehousing   lecture 17
Intro to Data warehousing lecture 17
 
Datapreprocessingppt
DatapreprocessingpptDatapreprocessingppt
Datapreprocessingppt
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Datapreprocess
DatapreprocessDatapreprocess
Datapreprocess
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 

Más de Datamining Tools

Data Mining: Text and web mining
Data Mining: Text and web miningData Mining: Text and web mining
Data Mining: Text and web miningDatamining Tools
 
Data Mining: Outlier analysis
Data Mining: Outlier analysisData Mining: Outlier analysis
Data Mining: Outlier analysisDatamining Tools
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataDatamining Tools
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsDatamining Tools
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisDatamining Tools
 
Data Mining: Data warehouse and olap technology
Data Mining: Data warehouse and olap technologyData Mining: Data warehouse and olap technology
Data Mining: Data warehouse and olap technologyDatamining Tools
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysisDatamining Tools
 
Data mining: Classification and Prediction
Data mining: Classification and PredictionData mining: Classification and Prediction
Data mining: Classification and PredictionDatamining Tools
 
Data Mining: Data mining classification and analysis
Data Mining: Data mining classification and analysisData Mining: Data mining classification and analysis
Data Mining: Data mining classification and analysisDatamining Tools
 
Data Mining: Data mining and key definitions
Data Mining: Data mining and key definitionsData Mining: Data mining and key definitions
Data Mining: Data mining and key definitionsDatamining Tools
 
Data Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalizationData Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalizationDatamining Tools
 
Data Mining: Applying data mining
Data Mining: Applying data miningData Mining: Applying data mining
Data Mining: Applying data miningDatamining Tools
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data miningDatamining Tools
 
AI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceAI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceDatamining Tools
 

Más de Datamining Tools (20)

Data Mining: Text and web mining
Data Mining: Text and web miningData Mining: Text and web mining
Data Mining: Text and web mining
 
Data Mining: Outlier analysis
Data Mining: Outlier analysisData Mining: Outlier analysis
Data Mining: Outlier analysis
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence data
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
 
Data Mining: Data warehouse and olap technology
Data Mining: Data warehouse and olap technologyData Mining: Data warehouse and olap technology
Data Mining: Data warehouse and olap technology
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 
Data mining: Classification and Prediction
Data mining: Classification and PredictionData mining: Classification and Prediction
Data mining: Classification and Prediction
 
Data Mining: Data mining classification and analysis
Data Mining: Data mining classification and analysisData Mining: Data mining classification and analysis
Data Mining: Data mining classification and analysis
 
Data Mining: Data mining and key definitions
Data Mining: Data mining and key definitionsData Mining: Data mining and key definitions
Data Mining: Data mining and key definitions
 
Data Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalizationData Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalization
 
Data Mining: Applying data mining
Data Mining: Applying data miningData Mining: Applying data mining
Data Mining: Applying data mining
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
 
AI: Planning and AI
AI: Planning and AIAI: Planning and AI
AI: Planning and AI
 
AI: Logic in AI 2
AI: Logic in AI 2AI: Logic in AI 2
AI: Logic in AI 2
 
AI: Logic in AI
AI: Logic in AIAI: Logic in AI
AI: Logic in AI
 
AI: Learning in AI 2
AI: Learning in AI  2AI: Learning in AI  2
AI: Learning in AI 2
 
AI: Learning in AI
AI: Learning in AI AI: Learning in AI
AI: Learning in AI
 
AI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceAI: Introduction to artificial intelligence
AI: Introduction to artificial intelligence
 
AI: Belief Networks
AI: Belief NetworksAI: Belief Networks
AI: Belief Networks
 

Último

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Último (20)

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Data MIning: Data processing

  • 2. What is the need for Data Processing? To get the required information from huge, incomplete, noisy and inconsistent set of data it is necessary to use data processing.
  • 3. Steps in Data Processing Data Cleaning Data Integration Data Transformation Data reduction Data Summarization
  • 4. What is Data Cleaning? Data cleaning is a procedure to “clean” the data by filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies
  • 5. What is Data Integration? Integrating multiple databases, data cubes, or files, this is called data integration.
  • 6. What is Data Transformation? Data transformation operations, such as normalization and aggregation, are additional data preprocessing procedures that would contribute toward the success of the mining process.
  • 7. What is Data Reduction? Data reduction obtains a reduced representation of the data set that is much smaller in volume, yet produces the same (or almost the same) analytical results.
  • 8. What is Data Summarization? It is the processes of representing the collected data in an accurate and compact way without losing any information, it also involves getting a information from collected data. Ex: Display the data as a graph and get the mean, median, mode etc.
  • 9. How to Clean Data? Handling Missing values Ignore the tuple Fill in the missing value manually Use a global constant to fill in the missing value Use the attribute mean to fill in the missing value Use the attribute mean for all samples belonging to the same class as the given tuple Use the most probable value to fill in the missing value
  • 10. How to Clean Data? Handle Noisy Data Binning: Binning methods smooth a sorted data value by consulting its “neighborhood”. Regression: Data can be smoothed by fitting the data to a function, such as with regression.  Clustering: Outliers may be detected by clustering, where similar values are organized into groups, or “clusters.”
  • 11. Data Integration Data Integration combines data from multiple sources into a coherent data store, as in data warehousing. These sources may include multiple databases, data cubes, or flat files. Issues that arises during data integration like Schema integration and object matching Redundancy is another important issue.
  • 12. Data Transformation Data transformation can be achieved in following ways Smoothing: which works to remove noise from the data Aggregation: where summary or aggregation operations are applied to the data. For example, the daily sales data may be aggregated so as to compute weekly and annuual total scores. Generalization of the data: where low-level or “primitive” (raw) data are replaced by higher-level concepts through the use of concept hierarchies. For example, categorical attributes, like street, can be generalized to higher-level concepts, like city or country. Normalization: where the attribute data are scaled so as to fall within a small specified range, such as −1.0 to 1.0, or 0.0 to 1.0. Attribute construction : this is where new attributes are constructed and added from the given set of attributes to help the mining process.
  • 13. Data Reduction techniques These are the techniques that can be applied to obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data. Data cube aggregation Attribute subset selection Dimensionality reduction Numerosity reduction Discretization and concept hierarchy generation
  • 14. Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net