Data Mining
www.StudsPlanet.com
Agenda
• What is Data Mining?
• Data Mining Tasks
• Challenges in Data Mining
What is Data Mining?
• Data mining is an integral part of knowledge discovery in databases (KDD), the overall process of converting raw data into useful information. This process consists of a series of transformation steps, from data preprocessing to postprocessing of data mining results.
Process of Knowledge Discovery in Databases (KDD)
[Figure: Input Data -> Data Preprocessing -> Data Mining -> Postprocessing -> Information]
• Data Preprocessing: normalization, data subsetting
• Postprocessing: filtering patterns, visualization, pattern interpretation
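The two preprocessing steps named in the figure can be sketched as follows. This is only an illustrative sketch: min-max scaling is one common way to normalize, and the customer records, the function names `min_max_normalize` and `subset`, and the 0.5 cut-off are all assumptions made for the example, not part of the original slides.

```python
def min_max_normalize(values):
    """Rescale numeric values to the range [0, 1] (one common normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def subset(records, predicate):
    """Data subsetting: keep only the records that satisfy the predicate."""
    return [r for r in records if predicate(r)]


# Hypothetical input data: (customer_id, yearly_income)
records = [(1, 20000), (2, 50000), (3, 80000), (4, 35000)]

# Preprocessing step 1: normalize the income attribute.
incomes = min_max_normalize([income for _, income in records])
normalized = [(cid, inc) for (cid, _), inc in zip(records, incomes)]

# Preprocessing step 2: keep the upper half of the normalized income range.
high_income = subset(normalized, lambda r: r[1] >= 0.5)
print(high_income)
```

The output of such a step would then be handed to the data mining stage itself.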
Data Mining Tasks
• Data mining tasks are generally divided into two categories:
  1. Predictive tasks
  2. Descriptive tasks
Predictive Tasks
• Objective: Predict the value of a specific attribute (the target or dependent variable) based on the values of other attributes (the explanatory variables).
• Example: Judge whether a patient has a specific disease based on his or her medical test results.
Descriptive Tasks
• Objective: Derive patterns (correlations, trends, trajectories) that summarize the underlying relationships in the data.
• Example: Identifying web pages that are accessed together (a human-interpretable pattern).
Data Mining Tasks [contd.]
• Classification [Predictive]
• Clustering [Descriptive]
• Association Rule Discovery [Descriptive]
• Sequential Pattern Discovery [Descriptive]
• Regression [Predictive]
• Deviation Detection [Predictive]
Classification: Definition
• Given a collection of records:
  - Each record contains a set of attributes, one of which is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
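The train/test procedure can be sketched in a few lines. This is a minimal sketch under stated assumptions: the records are invented, the 75/25 split is arbitrary, and a 1-nearest-neighbor rule stands in for "a model", since the slides do not name a specific classifier.

```python
import random


def nearest_neighbor_predict(train, point):
    """Return the class of the training record closest to `point`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda rec: sq_dist(rec[0], point))
    return label


def accuracy(train, test):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(
        1 for attrs, label in test
        if nearest_neighbor_predict(train, attrs) == label
    )
    return correct / len(test)


# Hypothetical records: ((attribute1, attribute2), class)
records = [
    ((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((1.1, 0.9), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.2), "B"), ((4.8, 5.1), "B"), ((5.2, 4.9), "B"), ((5.1, 5.0), "B"),
]

random.seed(0)            # fixed seed so the split is reproducible
random.shuffle(records)
split = int(0.75 * len(records))
train, test = records[:split], records[split:]   # build on train, validate on test

print(accuracy(train, test))
```

Because the two hypothetical classes are far apart, every held-out record lands next to training records of its own class. Real data is rarely this clean, which is why accuracy is measured on records the model has not seen.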
Classification: Example
• Direct Marketing
  - Goal: Reduce the cost of mailing by targeting a set of consumers likely to buy a new product.
  - Approach:
    - Use the data for a similar product introduced before.
    - We know which customers decided to buy and which decided otherwise. This {buy, don’t buy} decision forms the class attribute.
    - Collect various demographic, lifestyle, and company-interaction related information about all such customers, such as type of business, where they stay, and how much they earn.
    - Use this information as input attributes to learn a classifier model. (from Berry & Linoff, 1997)
Clustering: Definition
• Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that:
  - Data points in one cluster are more similar to one another.
  - Data points in separate clusters are less similar to one another.
Clustering: Example
• Document Clustering:
  - Goal: Find groups of documents that are similar to each other based on the important terms appearing in them.
  - Approach: Identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.
  - Gain: Information retrieval can use the clusters to relate a new document or search term to clustered documents.
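A term-frequency similarity measure of the kind described in the approach might look like the sketch below. Cosine similarity is assumed here as one reasonable choice; the slides do not say which measure was used, and the three short documents are invented for illustration.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count how often each whitespace-separated term occurs in a document."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a if t in tf_b)
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents
docs = {
    "d1": "stocks fall as markets react to interest rates",
    "d2": "markets rally as interest rates hold steady",
    "d3": "home team wins the final match of the season",
}
tfs = {name: term_frequencies(text) for name, text in docs.items()}

sim_12 = cosine_similarity(tfs["d1"], tfs["d2"])  # two finance stories
sim_13 = cosine_similarity(tfs["d1"], tfs["d3"])  # finance vs. sports
print(sim_12, sim_13)
```

A clustering algorithm would then group documents whose pairwise similarity is high. Here `d1` and `d2` share four terms, while `d1` and `d3` share none. In practice, very common words such as "as" would be filtered out before the comparison.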
Illustrating Document Clustering

Category        Total Articles   Correctly Placed
Financial       555              364
Foreign         341              260
National        273              36
Metro           943              746
Sports          738              573
Entertainment   354              278

• Clustering points: 3204 articles of the Los Angeles Times.
• Similarity measure: how many words are common in these documents (after some word filtering). (Introduction to Data Mining, 2007)
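The table gives raw counts only. The share of correctly placed articles, per category and overall, follows directly from them, as this short sketch shows. The variable names are illustrative, and the numbers are copied from the table.

```python
# (total articles, correctly placed) per category, copied from the table
results = {
    "Financial": (555, 364),
    "Foreign": (341, 260),
    "National": (273, 36),
    "Metro": (943, 746),
    "Sports": (738, 573),
    "Entertainment": (354, 278),
}

total = sum(t for t, _ in results.values())
correct = sum(c for _, c in results.values())

# Share of correctly placed articles per category, then overall.
for category, (t, c) in results.items():
    print(f"{category:<14}{c / t:.1%}")
print(f"{'Overall':<14}{correct / total:.1%}")
```

The totals add up to the 3204 articles stated on the slide, of which 2257 (about 70%) were placed correctly. The National category stands out, with only 36 of 273 articles (about 13%) placed correctly.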
Association Rule Discovery: Definition
• Given a set of records, each of which contains some number of items from a given collection, find dependency rules that predict the occurrence of an item based on the occurrences of other items.
• Apriori principle: if an itemset is frequent, then all of its subsets are also frequent.

TID   Items
1     Bread, Coke, Milk
2     Beer, Bread
3     Beer, Coke, Diaper, Milk
4     Beer, Bread, Diaper, Milk
5     Coke, Diaper, Milk

Rules discovered:
Milk -> Coke
Diaper, Milk -> Beer
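The two rules can be checked against the five transactions by computing their support and confidence, the standard measures for association rules. The slides do not state these values, so the sketch below simply derives them from the table. The helper names `count`, `support` and `confidence` are illustrative.

```python
# The five transactions from the table, one set of items per TID.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)


for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    print(sorted(lhs), "->", sorted(rhs),
          "support:", support(lhs | rhs),
          "confidence:", round(confidence(lhs, rhs), 3))
```

Milk -> Coke holds in 3 of the 4 transactions that contain Milk. Diaper, Milk -> Beer holds in 2 of the 3 transactions that contain both Diaper and Milk. The Apriori principle is also visible: {Milk} appears in 4 transactions, so it is at least as frequent as {Milk, Coke}, which appears in 3.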
Other Mining Tasks in a Nutshell
• Sequential Pattern Discovery: in point-of-sale transaction sequences,
  - Computer bookstore: (Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies, Tcl_Tk)
• Regression: e.g., using neural networks
• Deviation Detection: detect deviations from normal behavior, e.g., credit card fraud.
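Deviation detection can be illustrated with a simple z-score rule. This is only a sketch of the idea: the slides do not name a method, the transaction amounts are invented, and the threshold of 2 standard deviations is an arbitrary assumption.

```python
import statistics


def find_deviations(amounts, threshold=2.0):
    """Return the amounts lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)
    if stdev == 0:
        # All amounts are identical, so nothing deviates.
        return []
    return [a for a in amounts if abs(a - mean) / stdev > threshold]


# Hypothetical card transactions; the last one is far outside the usual range.
amounts = [25.0, 30.0, 27.5, 22.0, 31.0, 28.0, 26.5, 950.0]
print(find_deviations(amounts))
```

A single extreme value inflates both the mean and the standard deviation, so production fraud systems typically rely on more robust statistics or learned models. The sketch shows only the basic idea of flagging behavior that departs from the norm.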
Challenges of Data Mining
• Scalability
• Dimensionality
• Complex and Heterogeneous Data
• Data Quality
• Data Ownership and Distribution
• Privacy Preservation
• Streaming Data
References
• Tan, P., Steinbach, M., & Kumar, V. Introduction to Data Mining. Addison-Wesley, 2006.
