Data Mining: Concepts and Techniques
By R. Deepa (IT), Batch: 2016-2019
Department of CS & IT,
Nadar Saraswathi College of Arts and Science, Theni.
CHAPTER : 1
INTRODUCTION
➢ Evolution of Data Mining technology
➢ What is Data Mining?
➢ Data Mining Tasks
➢ Data Mining: On what kind of data?
➢ Are all the patterns interesting?
➢ Data Mining: A KDD Process
➢ Classification of Data Mining systems
➢ Major Issues in Data Mining
➢ Summary
EVOLUTION OF DATA MINING TECHNOLOGY
Evolution Step | Business Question | Enabling Technologies
Data Collection (1960s) | “What was my total revenue in the last five years?” | Computers, tapes, disks
Data Access (1980s) | “What were unit sales in New England last March?” | Relational Databases (RDBMS), Structured Query Language (SQL), ODBC
Data Warehousing & Decision Support (1990s) | “What were unit sales in New England last March? Drill down to Boston.” | On-line analytic processing (OLAP), Multidimensional Databases, Data Warehouses
Data Mining (Emerging Today) | “What’s likely to happen to Boston unit sales next month? Why?” | Advanced algorithms, Multiprocessor Computers, Massive Databases
R. Deepa, IT. Data Mining: Concepts and Techniques
WHAT IS DATA MINING ?
Data Mining:
Data Mining refers to extracting or “Mining” knowledge
from large amounts of data.
• Extraction of interesting (non-trivial, implicit, previously
unknown, and potentially useful) information or patterns from data
in large databases.
Alternative Names:
Knowledge Discovery from Data (KDD),
Knowledge extraction,
Data / pattern analysis,
Data archeology,
Data dredging,
Information harvesting,
Business intelligence, etc.
DECISIONS IN DATA MINING:
Databases to be mined
Relational, transactional, object-oriented, object-relational, spatial, time-series, text, legacy, multimedia, heterogeneous, WWW, etc.
Knowledge to be mined
Association, classification, clustering, etc.
Techniques utilized
Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural networks, etc.
Applications adapted
Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining, Weblog analysis, etc.
DATA MINING TASKS
➢ Prediction Tasks:
Use some variables to predict unknown or future values of other variables.
➢ Description Tasks:
Find human-interpretable patterns that describe the data.
Common Data Mining Tasks:
• Classification (predictive)
• Clustering (descriptive)
• Association Rule Discovery (descriptive)
• Sequential Pattern Discovery (descriptive)
• Regression (predictive)
• Deviation Detection (predictive)
DATA MINING - A KDD PROCESS
[Figure: the KDD process. Databases and flat files -> Cleaning and Integration -> Data Warehouse -> Selection and Transformation -> Data Mining -> Patterns]
7 STEPS IN KDD PROCESS
1. Data Cleaning:
to remove noise and inconsistent data
2. Data Integration:
where multiple data sources may be combined
3. Data Selection:
where data relevant to the analysis task are retrieved from the database
4. Data Transformation:
where data are transformed and consolidated into forms appropriate for mining, e.g., by performing summary or aggregation operations
5. Data Mining:
an essential process where intelligent methods are applied to extract data patterns
6. Pattern Evaluation:
to identify the truly interesting patterns representing knowledge, based on interestingness measures
7. Knowledge Presentation:
where visualization and knowledge representation techniques are used to present mined knowledge to users
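The seven steps can be read as a pipeline in which each step feeds the next. The following is a minimal Python sketch, not a real mining system: the two sources (store_a, store_b), their records, and all function names are hypothetical, and the "mining" step is only a ranking by total units.

```python
from collections import Counter

# Hypothetical raw sources; None marks a missing value (noise).
store_a = [{"city": "Boston", "item": "PC", "units": 3},
           {"city": "Boston", "item": "PC", "units": None}]
store_b = [{"city": "boston", "item": "Printer", "units": 1},
           {"city": "Salem", "item": "PC", "units": 2}]

def clean(rows):
    # 1. Data cleaning: drop records with missing values, normalize city case.
    return [{**r, "city": r["city"].title()} for r in rows
            if r["units"] is not None]

def integrate(*sources):
    # 2. Data integration: combine multiple sources into one collection.
    return [r for src in sources for r in src]

def select(rows, city):
    # 3. Data selection: keep only the data relevant to the analysis task.
    return [r for r in rows if r["city"] == city]

def transform(rows):
    # 4. Data transformation: aggregate units per item.
    totals = Counter()
    for r in rows:
        totals[r["item"]] += r["units"]
    return totals

def mine(totals):
    # 5. Data mining (stand-in): rank items by total units sold.
    return totals.most_common()

def evaluate(patterns, min_units):
    # 6. Pattern evaluation: keep patterns that pass a simple threshold.
    return [(item, n) for item, n in patterns if n >= min_units]

def present(patterns):
    # 7. Knowledge presentation: render the result for the user.
    return [f"{item}: {n} units" for item, n in patterns]

combined = integrate(clean(store_a), clean(store_b))
report = present(evaluate(mine(transform(select(combined, "Boston"))), min_units=2))
print(report)
```

In practice the steps are often iterated rather than run once, for example returning to selection after an unhelpful evaluation.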
ARCHITECTURE OF A TYPICAL DATA MINING SYSTEM
[Figure: layered architecture, top to bottom: Graphical user interface; Pattern evaluation; Data mining engine; Database or Data warehouse server; Data cleaning, data integration and filtering; Databases and Data Warehouse. A Knowledge-Base is shown as a separate component.]
DATA MINING - ON WHAT KIND OF DATA?
1. Relational Databases
2. Data Warehouses
3. Transactional Databases
4. Advanced Data and Information Systems
• Object-Oriented and Object-Relational Databases
• Spatial and Spatiotemporal Databases
• Heterogeneous and Legacy Databases
• Text Databases
• Multimedia Databases
• Data Streams
• WWW
DATA MINING FUNCTIONALITIES
➢ Concept/Class Description: Characterization and Discrimination
Generalize, summarize and contrast data characteristics, e.g., dry vs. wet regions.
➢ Data Characterization is a summarization of the general characteristics or features of a target class of data.
➢ Data Discrimination is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.
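As an illustration of the dry vs. wet regions example, the sketch below summarizes a target class (characterization) and then compares it with a contrasting class (discrimination). The records, attribute names, and the choice of the mean as the summary are assumptions made for the example.

```python
from statistics import mean

# Hypothetical records: (class, annual rainfall in mm, mean temperature in C)
regions = [
    ("dry", 200, 30), ("dry", 300, 28),
    ("wet", 1500, 22), ("wet", 1700, 24),
]

def characterize(records, target):
    # Characterization: summarize the general features of the target class.
    rows = [r for r in records if r[0] == target]
    return {"rainfall_mm": mean(r[1] for r in rows),
            "temp_c": mean(r[2] for r in rows)}

def discriminate(records, target, contrast):
    # Discrimination: compare the target summary with a contrasting class.
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {feature: t[feature] - c[feature] for feature in t}

dry_profile = characterize(regions, "dry")
dry_vs_wet = discriminate(regions, "dry", "wet")
print(dry_profile)
print(dry_vs_wet)
```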
➢ Association (Correlations and Causality)
* Multi-dimensional vs. single-dimensional association
* age(X, “20..29”) ^ income(X, “20..29K”) => buys(X, “PC”) [Support = 2%, Confidence = 60%]
* contains(T, “Computer”) => contains(T, “Software”) [Support = 1%, Confidence = 75%]
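The bracketed figures are the support and confidence of each rule. For a rule A => B, support is the fraction of transactions that contain both A and B, and confidence is the fraction of transactions containing A that also contain B. A minimal sketch over a hypothetical set of four transactions (the data and function names are illustrative only):

```python
# Hypothetical market-basket transactions.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # support(A and B) / support(A), i.e. an estimate of P(B | A).
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
print(rule_support, rule_confidence)
```

Here the rule computer => software holds in 2 of 4 transactions (support 0.5) and in 2 of the 3 transactions that contain a computer (confidence about 0.67).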
➢ Classification and Prediction
* Finding models (functions) that describe and distinguish classes or concepts for future prediction
* E.g., classify countries based on climate, or classify cars based on gas mileage.
Presentation: decision tree, classification rule, neural network
Prediction: predict some unknown or missing numerical values
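To make the gas mileage example concrete, the sketch below learns a one-level decision tree (a decision stump) of the form "high if mpg > threshold, else low". The training data and labels are hypothetical, and a single threshold is a deliberate simplification of a full decision tree.

```python
# Hypothetical training data: (miles per gallon, class label).
cars = [(15, "low"), (18, "low"), (22, "low"),
        (30, "high"), (35, "high"), (40, "high")]

def fit_stump(data):
    # Try the midpoint between each pair of consecutive sorted values
    # and keep the threshold with the fewest training errors.
    values = sorted(x for x, _ in data)
    best_threshold, best_errors = None, None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        errors = sum((label == "high") != (x > threshold) for x, label in data)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def predict(threshold, mpg):
    # The learned model is a single classification rule.
    return "high" if mpg > threshold else "low"

threshold = fit_stump(cars)
print(threshold, predict(threshold, 28), predict(threshold, 20))
```

The fitted model can be presented as a classification rule: IF mpg > 26 THEN class = high.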
➢ Cluster Analysis
Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
Clustering is based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity.
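One common way to apply this principle is k-means, which groups points so that each is close to its own cluster center. The slides do not name an algorithm, so k-means is an assumption here, as are the house positions and the starting centroids.

```python
def kmeans_1d(points, centroids, iterations=10):
    # Lloyd's algorithm in one dimension, starting from the given centroids.
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical house positions along a street, in km.
houses = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids, clusters = kmeans_1d(houses, centroids=[1.0, 2.0])
print(centroids, clusters)
```

No class labels are supplied; the two groups emerge from the distances alone.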
➢ Outlier Analysis
Outlier: a data object that does not comply with the general behavior of the data
It can be considered noise or an exception, but it is quite useful in fraud detection and rare events analysis.
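A simple statistical way to flag such objects is the z-score: how many standard deviations a value lies from the mean. The sketch below assumes hypothetical transaction amounts and a threshold of 2.0; both are illustrative choices, not values from the slides.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    # Flag values lying more than `threshold` standard deviations from the mean.
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts; the last one is unusual.
amounts = [20, 22, 19, 21, 20, 23, 18, 500]
flagged = zscore_outliers(amounts)
print(flagged)
```

In a fraud detection setting the flagged value is the object of interest, not noise to be discarded.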
➢ Trend and Evolution Analysis
Trend and deviation: regression analysis
Sequential pattern mining, periodicity analysis
Similarity-based analysis
➢ Other pattern-directed or statistical analyses
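For the regression-based trend analysis mentioned above, a least-squares line is the simplest case. The sketch fits one to hypothetical monthly unit sales and extrapolates one month ahead; the data are invented for illustration.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical unit sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
unit_sales = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, unit_sales)
forecast = slope * 6 + intercept  # trend extrapolated to month 6
print(slope, intercept, forecast)
```

A deviation can then be read as the gap between an observed value and the fitted trend.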
ARE ALL THE “DISCOVERED” PATTERNS INTERESTING?
➢ A data mining system/query may generate thousands of patterns; not all of them are interesting.
➢ Suggested approach: human-oriented, query-based, focused mining
➢ Interestingness measures: a pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm
➢ Objective vs. subjective interestingness measures:
Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty, actionability, etc.
CAN WE FIND ALL AND ONLY INTERESTING PATTERNS?
❖ Find all the interesting patterns: Completeness
▪ Can a data mining system find all the interesting patterns?
▪ Association vs. Classification vs. Clustering
❖ Search for only interesting patterns: Optimization
▪ Can a data mining system find only the interesting patterns?
▪ Approaches
✓ First generate all the patterns and then filter out the uninteresting ones.
✓ Generate only the interesting patterns: mining query optimization
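The two approaches can be compared on frequent itemsets, taking minimum support as the interestingness measure. This is an assumption for the sketch: the slides do not fix a measure, and the second function uses Apriori-style pruning as one possible way to generate only the interesting patterns. The transactions are hypothetical.

```python
from itertools import combinations

# Hypothetical transactions and threshold.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
]
MIN_SUPPORT = 0.5
items = sorted(set().union(*transactions))

def support(itemset):
    # Fraction of transactions that contain the whole itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def generate_then_filter():
    # Approach 1: enumerate every itemset, then drop the uninteresting ones.
    candidates = [frozenset(c) for k in range(1, len(items) + 1)
                  for c in combinations(items, k)]
    kept = {c for c in candidates if support(c) >= MIN_SUPPORT}
    return kept, len(candidates)

def generate_only_interesting():
    # Approach 2: grow itemsets level by level, extending only frequent ones.
    level = [frozenset([i]) for i in items]
    kept, examined = set(), 0
    while level:
        examined += len(level)
        frequent = {c for c in level if support(c) >= MIN_SUPPORT}
        kept |= frequent
        size = len(next(iter(level))) + 1
        level = list({a | b for a in frequent for b in frequent
                      if len(a | b) == size})
    return kept, examined

all_kept, all_examined = generate_then_filter()
pruned_kept, pruned_examined = generate_only_interesting()
print(len(all_kept), all_examined, pruned_examined)
```

Both approaches return the same five itemsets, but the pruned search examines 8 candidates instead of 15, and the gap widens quickly as the number of items grows.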
DATA MINING: CONFLUENCE OF MULTIPLE DISCIPLINES
[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines]
DATA MINING: CLASSIFICATION SCHEMES
❖ General Functionality
▪ Descriptive data mining
▪ Predictive data mining
❖ Different views, different classifications
▪ Kinds of databases to be mined
▪ Kinds of knowledge to be discovered
▪ Kinds of techniques utilized
▪ Kinds of applications adapted
MAJOR ISSUES IN DATA MINING
➢ Mining methodology and user interaction
▪ Mining different kinds of knowledge in databases
▪ Interactive mining of knowledge at multiple levels of abstraction
▪ Incorporation of background knowledge
▪ Data mining query languages and ad-hoc data mining
▪ Expression and visualization of data mining results
▪ Handling noise and incomplete data
▪ Pattern evaluation: the interestingness problem
➢ Performance and scalability
▪ Efficiency and scalability of data mining algorithms
▪ Parallel, distributed and incremental mining methods
➢ Issues relating to the diversity of data types
▪ Handling relational and complex types of data
▪ Mining information from heterogeneous databases and global information systems (WWW)
➢ Issues related to applications and social impacts
▪ Application of discovered knowledge
• Domain-specific data mining tools
• Intelligent query answering
• Process control and decision making
▪ Integration of the discovered knowledge with existing knowledge: a knowledge fusion problem
▪ Protection of data security, integrity, and privacy
SUMMARY
➢ Data Mining: discovering interesting patterns from large amounts of data
➢ A natural evolution of database technology, in great demand, with wide applications
➢ A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
➢ Mining can be performed in a variety of information repositories
➢ Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
➢ Classification of data mining systems
➢ Major issues in data mining
THANK YOU!

Data Mining : Concepts and Techniques

  • 1. By R. Deepa (IT), Batch: 2016-2019 Department of (CS&IT), Nadar Saraswathi College of Arts Science, Theni. Concepts and Techniques
  • 2. CHAPTER : 1 INTRODUCTION  Evolution of Data Mining technology  What is Data Mining?  Data Mining Tasks  Data Mining : On what kind of data?  Are all the patterns interesting?  Data Mining : A KDD Process  Classification of Data Mining systems  Major Issues in Data mining  Summary
  • 3. EVOLUTION OF DATA MINING TECHNOLOGY Evolution Step Business Question Enabling Technologies Data Collection (1960s) “What was my total revenue in the last five years? ” Computers, tapes,disks Data Access (1980s) “What were unit sales in New England last March? “ Relational Databases (RDBMS), Structured Query Language(SQL), ODBC Data Warehousing & Decision Support (1990s) “What were unit sales in New England last March? Drill down to Boston.” On-line analytic processing(OLAP), Multidimensional Database, Data Warehouses Data Mining (Emerging Today) “What ’s likely to happen to Boston unit sales next month? Why? “ Advanced algorithms, Multiprocessor Computers, Massive Databases R.Deepa ITData Mining: Concepts and techniques
  • 4. WHAT IS DATA MINING ? Data Mining: Data Mining refers to extracting or “Mining” knowledge from large amounts of data. • Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases. Alternatives Names: Knowledge Discovery from Data(KDD), Knowledge extraction, Data / pattern analysis, Data archeology, Data dredging, Information harvesting, Business intelligence, etc.., . R.Deepa ITData Mining: Concepts and techniques
  • 5. DECISIONS IN DATA MINING : Databases to be mined Relational, transactional, object-oriented, object- relational, spatial, time-series, text, legacy, multi-media, heterogeneous, WWW, etc. Knowledge to be mined Association, classification, clustering , etc. Techniques utilized Database-oriented, Data warehouse(OLAP), Machine learning, Statistics, Visualization, Neural Networks, etc. Applications adapted Retail, Telecommunication, Banking, Fraud analysis, DNA mining, Stock market analysis, Web mining, Weblog analysis, etc. R.Deepa ITData Mining: Concepts and techniques
  • 6. DATA MINING TASKS  Prediction Tasks: Use some variables to predict unknown or future values of other variables.  Description Tasks: Find human-interpretable patterns that describe the data. Common Data Mining Tasks: • Classification(predictive) • Clustering(descriptive) • Association Rule Discovery(descriptive) • Sequential Pattern Discovery(descriptive) • Regression (predictive) • Deviation Detection(predictive) R.Deepa ITData Mining: Concepts and techniques
  • 7. DATA MINING - A KDD PROCESS ++ DATABASES DATA MINING DATA WAREHOUSE PATTERNS FLAT FILES Cleaning and Integration Selection and Transformation R.Deepa IT
  • 8. 7 STEPS IN KDD PROCESS 1.Data Cleaning: to remove noise and inconsistent data 2.Data Integration: where multiple data sources may be combined 3.Data Selection: where data relevant to the analysis task are retrieved from the database 4.Data Transformation: where data are transformed and consolidated into forms appropriate for mining performing summary or aggregation operations. 5.Data Mining: an essential process where intelligent methods are applied to extract data patterns 6.Pattern Evaluation: to identify the truly interesting patterns representing knowledge based on interestingness measures. 6.Knowledge Presentation: where visualization and knowledge representation techniques are used to present mined knowledge to users . R.Deepa ITData Mining: Concepts and techniques
  • 9. ARCHITECTURE OF A TYPICAL DATA MINING SYSTEM Graphical user interface Pattern evaluation Data mining engine Database or Data warehouse Server Data Warehouse Knowledge-Base DataBases Data Cleaning & Data Integration Filtering R.Deepa IT
  • 10. DATA MINING - ON WHAT KIND OF DATA? 1.Relational Databases 2.Data Warehouses 3.Transactional Databases 4.Advanced Data and Information System • Object-Oriented and Object-Relational Databases • Spatial and Spatiotemporal Databases • Heterogeneous and Legacy Databases • Text Databases • Multimedia Databases • Data Streams • WWW R.Deepa ITData Mining: Concepts and techniques
  • 11. DATA MINING FUNCTIONALITIES  Concept /Class Description: Characterization and Discrimination Generalize, summarize and contrast data characteristics, eg., dry vs. wet regions .  Data Characterization is a summarization of the general characteristics or features in a target class of data.  Data Discrimination is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes. R.Deepa ITData Mining: Concepts and techniques
  • 12.  Association (Correlations and Causality) * Multi- dimensional vs. single dimensional association * age( X, ”20….29”)^income (X,”20…..29K”)  buys( X, “PC”)[Support=2%, Confidence= 60%] * contains( T, “Computer”)contains ( X,”Software”) [1% and 75%] R.Deepa ITData Mining: Concepts and techniques
  • 13.  Classification and Prediction * Finding models (function) that describe and distinguish classes or concepts for future prediction * Eg., classify countries based on climate, or classify cars based on gas mileage. Presentation: Decision tree, Classification rule, Neural network Prediction: predict some unknown or missing numerical values R.Deepa ITData Mining: Concepts and techniques
  • 14. • Cluster Analysis
      Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns.
      Clustering is based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity.
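One common way to pursue this principle is k-means, which the slide does not name; it is used here only as a familiar example. The sketch below is a bare-bones version that takes the first k points as initial centers (a simplification), and the house coordinates are hypothetical.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Plain k-means; the first k points serve as the initial centers."""
    centers = list(points[:k])
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical house locations (x, y) forming two visible groups.
houses = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = k_means(houses, k=2)
print(centers)
print(clusters)
```

No class labels are supplied anywhere: the two groups emerge purely from the distances between points, which is what distinguishes clustering from classification.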
  • 15. • Outlier Analysis
      Outlier: a data object that does not comply with the general behavior of the data.
      It can be considered noise or an exception, but it is quite useful in fraud detection and rare-event analysis.
      • Trend and Evolution Analysis
      Trend and deviation: regression analysis
      Sequential pattern mining, periodicity analysis
      Similarity-based analysis
      • Other pattern-directed or statistical analyses
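Both ideas have simple statistical versions. The sketch below flags outliers with a z-score test and fits a straight-line trend by least squares; these are stand-in techniques chosen for brevity, and the threshold of 2.0, the sample data, and the function names are assumptions.

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

def linear_trend(ys):
    """Least-squares slope and intercept of ys against time steps 0, 1, 2, ...

    Assumes at least two observations.
    """
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Hypothetical card transaction amounts; one is far from the rest.
amounts = [20, 22, 19, 21, 20, 23, 18, 500]
print(z_score_outliers(amounts))

# Hypothetical monthly unit sales with a steady upward trend.
sales = [10, 12, 14, 16]
print(linear_trend(sales))
```

In a fraud-detection setting the flagged value would be the interesting result rather than noise to discard, which is the point the slide makes about outliers.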
  • 16. ARE ALL THE “DISCOVERED” PATTERNS INTERESTING?
      • A data mining system or query may generate thousands of patterns, and not all of them are interesting.
      • Suggested approach: human-oriented, query-based, focused mining.
      • Interestingness measures: a pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm.
      • Objective vs. subjective interestingness measures:
          Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
          Subjective: based on the user’s beliefs about the data, e.g., unexpectedness, novelty, actionability, etc.
  • 17. CAN WE FIND ALL AND ONLY INTERESTING PATTERNS?
      • Find all the interesting patterns: Completeness
          • Can a data mining system find all the interesting patterns?
          • Association vs. Classification vs. Clustering
      • Search for only interesting patterns: Optimization
          • Can a data mining system find only the interesting patterns?
          • Approaches
              • First generate all the patterns and then filter out the uninteresting ones.
              • Generate only the interesting patterns: mining query optimization
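The two approaches can be compared on frequent itemsets, taking "interesting" to mean "support at least min_support" (an assumption for this example). The second function uses level-wise pruning in the spirit of Apriori, which is one known way to avoid generating uninteresting candidates; the slide itself does not name a specific method, and the data and function names are hypothetical.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

def generate_then_filter(transactions, min_support):
    """Approach 1: enumerate every itemset, then discard the infrequent ones."""
    items = sorted(set().union(*transactions))
    candidates = [c for n in range(1, len(items) + 1)
                  for c in combinations(items, n)]
    kept = {c for c in candidates if support(transactions, c) >= min_support}
    return kept, len(candidates)

def generate_with_pruning(transactions, min_support):
    """Approach 2: grow itemsets level by level, extending only frequent ones."""
    items = sorted(set().union(*transactions))
    level = [(i,) for i in items]
    kept, examined = set(), 0
    while level:
        examined += len(level)
        frequent = [c for c in level if support(transactions, c) >= min_support]
        kept.update(frequent)
        # Extend each frequent itemset with items that sort after its last item.
        level = [c + (i,) for c in frequent for i in items if i > c[-1]]
    return kept, examined

transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
]

all_kept, all_examined = generate_then_filter(transactions, 0.5)
pruned_kept, pruned_examined = generate_with_pruning(transactions, 0.5)
print(all_kept == pruned_kept, all_examined, pruned_examined)
```

Both approaches return the same patterns, so completeness is preserved, while the pruned search examines fewer candidates. The saving is tiny on three items but grows quickly, because the first approach enumerates every subset of the item set.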
  • 18. DATA MINING: CONFLUENCE OF MULTIPLE DISCIPLINES
      [Figure: Data Mining at the center, surrounded by the contributing disciplines: Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines.]
  • 19. DATA MINING: CLASSIFICATION SCHEMES
      • General functionality
          • Descriptive data mining
          • Predictive data mining
      • Different views, different classifications
          • Kinds of databases to be mined
          • Kinds of knowledge to be discovered
          • Kinds of techniques utilized
          • Kinds of applications adapted
  • 20. MAJOR ISSUES IN DATA MINING
      • Mining methodology and user interaction
          • Mining different kinds of knowledge in databases
          • Interactive mining of knowledge at multiple levels of abstraction
          • Incorporation of background knowledge
          • Data mining query languages and ad hoc data mining
          • Expression and visualization of data mining results
          • Handling noise and incomplete data
          • Pattern evaluation: the interestingness problem
      • Performance and scalability
          • Efficiency and scalability of data mining algorithms
          • Parallel, distributed, and incremental mining methods
  • 21. • Issues relating to the diversity of data types
          • Handling relational and complex types of data
          • Mining information from heterogeneous databases and global information systems (WWW)
      • Issues related to applications and social impacts
          • Application of discovered knowledge
              • Domain-specific data mining tools
              • Intelligent query answering
              • Process control and decision making
          • Integration of the discovered knowledge with existing knowledge: a knowledge fusion problem
          • Protection of data security, integrity, and privacy
  • 22. SUMMARY
      • Data mining: discovering interesting patterns from large amounts of data
      • A natural evolution of database technology, in great demand, with wide applications
      • A KDD process includes data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation
      • Mining can be performed in a variety of information repositories
      • Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
      • Classification of data mining systems
      • Major issues in data mining