Data Mining
Steps and Functionalities
Data Mining: A KDD Process
• Data mining is the core of the knowledge discovery in databases (KDD) process.
[Figure: KDD process flow. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection & Transformation → Task-relevant Data → Data Mining → Pattern Evaluation]
Steps of a KDD Process
• Data Cleaning
  • Handles noisy, inconsistent, and incomplete data
  • Missing values
  • Noisy data
    • Binning, clustering, etc.
  • Inconsistencies
    • Tools, functional dependencies
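As a rough illustration of two of the cleaning techniques named above, the sketch below fills missing values with the mean of the observed values and smooths noisy data by bin means. The function names, the equal-frequency binning scheme, and the sample prices are assumptions made for this example, not part of the original slides.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, cut them into equal-frequency bins of `bin_size`,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


# Hypothetical price data, used only for illustration.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
print(fill_missing_with_mean([10, None, 20]))
```

Mean imputation and bin means are only one choice each; a median or a domain-specific default may suit skewed data better.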
• Data Integration
  • Schema integration
    • Entity identification problem
  • Redundancy
    • Correlation analysis
• Data Selection
  • Select only the task-relevant data
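The correlation analysis listed under data integration can be sketched as follows: two attributes from different sources are flagged as redundant when their Pearson correlation is close to +1 or -1. The 0.95 cutoff, the function names, and the sample columns are assumptions for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def is_redundant(xs, ys, threshold=0.95):
    """Treat two attributes as redundant when |r| reaches the (assumed) threshold."""
    return abs(pearson(xs, ys)) >= threshold


# Hypothetical columns coming from two integrated sources.
price_usd = [10, 20, 30, 40]
price_cents = [1000, 2000, 3000, 4000]   # same information, different unit
units_sold = [5, 3, 6, 4]

print(is_redundant(price_usd, price_cents))
print(is_redundant(price_usd, units_sold))
```

The sketch assumes numeric attributes with non-zero variance; categorical attributes would need a different test.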
• Data Transformation
  • Transform or consolidate data
  • Smoothing, normalization, feature construction
  • Data reduction (compression)
• Data Mining
  • Intelligent methods are applied to extract patterns
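Normalization, one of the transformations named above, is commonly done by min-max scaling or by z-scores. The sketch below shows both; the function names and the sample incomes are assumptions for illustration, and the input is assumed to contain at least two distinct values.

```python
from statistics import mean, pstdev


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max].
    Assumes the values are not all identical (otherwise division by zero)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score_normalize(values):
    """Center on the mean and scale by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical income values, used only for illustration.
incomes = [20000, 30000, 40000]
print(min_max_normalize(incomes))
print(z_score_normalize(incomes))
```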
• Pattern Evaluation
  • Interestingness measures
• Knowledge Presentation
  • Visualization
Data Mining Functionalities
• Descriptive
  • Characterize general properties of the data
• Predictive
  • Perform inference on the data to make predictions
• Mining
  • Several kinds of patterns can be mined in parallel
  • Patterns can be mined at various granularities
Data Mining Functionalities
• Concept/class description
• Association analysis
• Classification and prediction
• Cluster analysis
• Outlier analysis
• Evolution analysis
Concept/Class Description
• Data can be associated with classes or concepts
  • Computers, printers
  • BigSpenders vs. BudgetSpenders
• Class/concept description
  • Classes and concepts can be summarized in concise and precise terms
  • Data characterization
  • Data discrimination
Data Characterization
• Summarization of the general characteristics of a target class
  • Data are collected and aggregated
  • OLAP roll-up operation
  • Attribute-oriented induction
• Results: charts, cubes, rules
• Example
  • Characteristics of customers
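The roll-up idea can be sketched as a simple aggregation: the same measure is summed first at a fine level (city) and then at a coarser level of the hierarchy (country). The records, the two-level location hierarchy, and the `roll_up` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (city, country, amount).
sales = [
    ("Chennai", "India", 120),
    ("Delhi", "India", 80),
    ("Toronto", "Canada", 50),
    ("Vancouver", "Canada", 70),
]


def roll_up(records, level_index, measure_index):
    """Sum the measure over all records that share the same value
    at the chosen level of the hierarchy."""
    totals = defaultdict(int)
    for record in records:
        totals[record[level_index]] += record[measure_index]
    return dict(totals)


by_city = roll_up(sales, level_index=0, measure_index=2)
by_country = roll_up(sales, level_index=1, measure_index=2)  # rolled up one level
print(by_city)
print(by_country)
```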
Data Discrimination
• Compare a target class with one or more contrasting classes
  • Classes may be user-specified
• Examples:
  • Products whose sales increased vs. decreased
  • Regular shoppers vs. occasional shoppers
• Output includes comparative measures
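One simple comparative measure is the share of each attribute value in the target class next to its share in the contrasting class. The sketch below assumes customer records stored as dictionaries with a hypothetical `age_group` attribute; the data and function names are illustrative only.

```python
from collections import Counter


def value_shares(records, attribute):
    """Fraction of records taking each value of the attribute."""
    counts = Counter(record[attribute] for record in records)
    total = len(records)
    return {value: count / total for value, count in counts.items()}


def discriminate(target, contrast, attribute):
    """Map each attribute value to (share in target class, share in contrasting class)."""
    target_shares = value_shares(target, attribute)
    contrast_shares = value_shares(contrast, attribute)
    values = sorted(set(target_shares) | set(contrast_shares))
    return {v: (target_shares.get(v, 0.0), contrast_shares.get(v, 0.0)) for v in values}


# Hypothetical customer records for the two classes.
regular = [{"age_group": "20-29"}, {"age_group": "30-39"},
           {"age_group": "30-39"}, {"age_group": "30-39"}]
occasional = [{"age_group": "20-29"}, {"age_group": "20-29"},
              {"age_group": "20-29"}, {"age_group": "30-39"}]

comparison = discriminate(regular, occasional, "age_group")
print(comparison)
```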
Association Analysis
• Discovery of association rules
  • Form: X ⇒ Y
• Multi-dimensional
  • age(X, “20…29”) ∧ income(X, “20K…25K”) ⇒ buys(X, “Laptop”)
• Single-dimensional
  • buys(X, “Laptop”) ⇒ buys(X, “Software”)
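Association rules are usually scored by support (how often X and Y occur together) and confidence (how often Y occurs when X does). Neither measure is spelled out on the slide, so the sketch below is an assumption about how the single-dimensional rule buys(X, "Laptop") ⇒ buys(X, "Software") might be evaluated; the transactions are made up.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"Laptop", "Software"},
    {"Laptop", "Software", "Mouse"},
    {"Laptop"},
    {"Printer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X and Y together) / support(X); assumes X occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"Laptop", "Software"}, transactions)
rule_confidence = confidence({"Laptop"}, {"Software"}, transactions)
print(rule_support, rule_confidence)
```

A rule is typically kept only if both values exceed user-chosen minimum thresholds.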
Classification and Prediction
• Classification
  • Finds models that describe and differentiate classes or concepts
  • The model predicts the class of objects whose label is unknown
  • Models are derived from training data
  • Model forms: rules, decision trees, neural networks (NN), formulae
  • Preceded by relevance analysis (to eliminate irrelevant attributes)
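To make the idea of learning a model from training data concrete, here is a minimal sketch of a one-level decision tree (a decision stump) on a single numeric attribute. It is a stand-in chosen for brevity, not the method of the slides; the income values (in thousands) and the helper names are hypothetical, and exactly two class labels are assumed.

```python
def train_stump(samples):
    """Learn (threshold, label_below, label_above) from (value, label) pairs.
    Assumes exactly two class labels and at least two distinct values."""
    labels = sorted({label for _, label in samples})
    values = sorted({value for value, _ in samples})
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in candidates:
        for below, above in ((labels[0], labels[1]), (labels[1], labels[0])):
            errors = sum(
                1 for value, label in samples
                if (below if value <= threshold else above) != label
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return threshold, below, above


def predict(model, value):
    """Apply the learned rule to a new value."""
    threshold, below, above = model
    return below if value <= threshold else above


# Hypothetical training data: (income in thousands, class label).
training = [(20, "BudgetSpender"), (25, "BudgetSpender"), (30, "BudgetSpender"),
            (60, "BigSpender"), (75, "BigSpender"), (90, "BigSpender")]
model = train_stump(training)
print(model, predict(model, 28), predict(model, 80))
```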
Classification and Prediction
• Prediction
  • The derived model is used for prediction
  • Data value prediction
  • Class label prediction (classification)
  • Trend identification
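Data value prediction can be illustrated with a least-squares line fitted to past observations and then evaluated at a new point. Linear regression is an assumed choice here, and the monthly sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_value(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: sales observed in months 1 to 4.
months = [1, 2, 3, 4]
sales = [12, 14, 16, 18]
line = fit_line(months, sales)
print(line, predict_value(line, 5))
```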
Cluster Analysis
• Unsupervised
  • Class labels are missing in the training set
• Maximize intra-class similarity
• Minimize inter-class similarity
• Hierarchy of classes
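A small k-means sketch shows how unlabeled points can be grouped so that members of a cluster are close to each other and far from other clusters. k-means is one assumed algorithm among many; the one-dimensional points and the starting centroids are hypothetical and fixed so the run is deterministic.

```python
def k_means_1d(points, centroids, max_iter=100):
    """Cluster 1-D points around the given starting centroids.
    Returns (final centroids, clusters as lists of points)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabeled data and starting centroids.
points = [1, 2, 3, 10, 11, 12]
final_centroids, final_clusters = k_means_1d(points, centroids=[1, 2])
print(final_centroids, final_clusters)
```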
Outlier Analysis
• Objects that do not comply with the general behavior of the data
• Noise vs. rare events
  • Fraud detection
• Statistical tests
• Deviation-based methods
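A very simple statistical test flags values whose z-score exceeds a cutoff. The cutoff of 2.0, the function name, and the transaction amounts below are assumptions; whether a flagged value is noise or a rare event such as fraud still has to be judged separately.

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` population standard
    deviations from the mean. Assumes the values are not all identical."""
    mu = mean(values)
    sigma = pstdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical card transaction amounts with one unusual charge.
amounts = [20, 22, 19, 21, 23, 20, 22, 500]
flagged = z_score_outliers(amounts)
print(flagged)
```

Because an extreme value inflates both the mean and the standard deviation, robust alternatives such as median-based deviations are often preferred on small samples.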
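Evolution Analysis
• Trend detection
• Time-series data
• Involves other functionalities (characterization, discrimination, association, classification, clustering) applied to time-related data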
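Trend detection on time-series data can be sketched with a moving average that smooths short-term fluctuation before the first and last smoothed values are compared. The window size, function names, and monthly sales series are assumptions for illustration.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def trend_direction(series, window=3):
    """Compare the first and last smoothed values to label the overall trend."""
    smoothed = moving_average(series, window)
    change = smoothed[-1] - smoothed[0]
    if change > 0:
        return "upward"
    if change < 0:
        return "downward"
    return "flat"


# Hypothetical monthly sales series.
monthly_sales = [10, 12, 11, 13, 15, 14, 16]
print(moving_average(monthly_sales, 3))
print(trend_direction(monthly_sales))
```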
