SlideShare una empresa de Scribd logo
1 de 3
Descargar para leer sin conexión
Data Mining Lecture 1
Why Data Mining?
Negative Points
 All sounds complicated. Why should I learn
about Data Mining?
 What's wrong with just relational
databases? Why would I want to go through
these extra complicated steps?
 Isn't it expensive? It sounds like it takes a
lot of skills, programming, computational
time and storage space. Where's the
benefit?
Positive Points
 Data Mining is not just a cute academic
exercise
 It has very profitable real world uses
 Practically all large companies and
manygovernments perform data mining as
part of their planning and analysis
Data Explosion
Data explosion problem
 Automated data collection tools and
mature database technology lead to
tremendous amounts of data stored in
databases, data warehouses and other
information repositories We are drowning
in data, but starving for knowledge!
Solution: Data Mining
 Extraction of interesting knowledge (rules,
regularities, patterns, constraints) from
large datasets
What is Data Mining?
Data mining (knowledge discovery in databases):
 Extraction of interesting (non-trivial,
implicit, previously unknown
and potentially useful) information or patterns from
data in large
databases
Alternative names and their “inside stories”:
 Data mining: a misnomer?
 Knowledge discovery (mining) in databases
(KDD), knowledge extraction, data/pattern
analysis, data archaeology, data dredging,
information harvesting, business
intelligence, etc.
What is not data mining?
 (Deductive) query processing. ● Expert
systems or small ML/statistical programs
Potential Applications
Database analysis and decision support
 Market analysis and management
o target marketing, customer relation
management, market basket
analysis, cross selling, market
segmentation
 Risk analysis and management
o Forecasting, customer retention,
improved underwriting, quality
control, competitive analysis
 Fraud detection and management
Other Applications
 Text mining (news group, email,
documents) and Web analysis
 Intelligent query answering
Market Analysis
Where are the data sources for analysis?
 Credit card transactions, loyalty cards,
discount coupons, customer complaint
calls, plus (public) lifestyle studies
Target marketing
 Find clusters or “model” customers who
share the same characteristics: interest,
income level, spending habits, etc.
Determine customer purchasing patterns over time
 Conversion of single to a joint bank
account: marriage, etc.
Cross-market analysis
 Associations/correlations between product
sales
 Prediction based on the association
information
Corporations
Finance planning and asset evaluation
 cash flow analysis and prediction
 contingent claim analysis to evaluate assets
 cross-sectional and time series analysis
(financialratio, trend analysis, etc.)
Resource planning
 summarize and compare the resources and
spending
Competition
 monitor competitors and market directions
 group customers into classes and a class-
based pricing procedure
 set pricing strategy in a highly competitive
market
Market Management
Customer profiling
 data mining can tell you what types of
customers buy what products (clustering or
classification)
Identifying customer requirements
 identifying the best products for different
customers
 use prediction to find what factors will
attract new customers
Provides summary information
 various multidimensional summary reports
 statistical summary information (data
central tendency and variation)
Fraud Detection
Applications
 widely used in health care, retail, credit
card services, telecommunications (phone
card fraud), etc.
Approach
 use historical data to build models of
fraudulent behaviour and use data mining
to help identify similar instances
Examples
 auto insurance: detect a group of people
who stage accidents to collect on insurance
 money laundering: detect suspicious
money transactions (US Treasury's Financial
Crimes Enforcement Network)
 medical insurance: detect professional
patients and ring of doctors and ring of
references
Detecting inappropriate medical treatment
 Australian Health Insurance Commission
identifies that in many cases blanket
screening tests were requested (save
Australian €1m/yr)
Detecting telephone fraud
 Telephone call model: destination of the
call, duration, time of day or week. Analyse
patterns that deviate from an expected
norm
 British Telecom identified discrete groups
of callers with frequent intra-group calls,
especially mobile phones, and broke a
multimillion dollar fraud
Retail
 Analysts estimate that 38% of retail shrink is
due to dishonest employees
Other Applications
Sports
 IBM Advanced Scout analysed NBA game
statistics (shots blocked, assists, and fouls)
to gain competitive advantage for New York
Knicks and Miami Heat
Astronomy
 JPL and the Palomar Observatory
discovered 22 quasars with the help of data
mining
Internet Web Surf-Aid
 IBM Surf-Aid applies data mining algorithms
to Web access logs for market-related
pages to discover customer preference and
behaviour pages, analysing effectiveness of
Web marketing, improving Web site
organisation, etc.
KDD Process
Learning the application domain
 relevant prior knowledge and goals of
application
Creating a target data set: data selection
Data cleaning and pre-processing (may take 60% of
effort!)
Data reduction and transformation
 Find useful features,
dimensionality/variable reduction, invariant
representation
Choosing functions of data mining
 summarisation, classification, regression,
association, clustering
Choosing the mining algorithms
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation
 visualisation, transformation, removing
redundant patterns, etc.
Use of discovered knowledge
Business Intelligence
Architecture of DM System
DM Software Tools
DM Software Tools
 We can find several Data Mining tools
 Commercial and free (open source)
software tools
 NeuroOnline, ARMiner, Tminer, Bayda,
Cluto, YALE,Rainbow, MDR, C5, Clementine,
etc.
RapidMiner
 Free version
 Commercial version

Más contenido relacionado

La actualidad más candente

Customer Segmentation Project
Customer Segmentation ProjectCustomer Segmentation Project
Customer Segmentation ProjectAditya Ekawade
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data miningDevakumar Jain
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.pptneelamoberoi1030
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecturepcherukumalla
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingPrithwis Mukerjee
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentationDavid Raj Kanthi
 
Security issues in big data
Security issues in big data Security issues in big data
Security issues in big data Shallote Dsouza
 
Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learningSushant Shrivastava
 
Lecture 03 - The Data Warehouse and Design
Lecture 03 - The Data Warehouse and Design Lecture 03 - The Data Warehouse and Design
Lecture 03 - The Data Warehouse and Design phanleson
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEZalpa Rathod
 

La actualidad más candente (20)

Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Customer Segmentation Project
Customer Segmentation ProjectCustomer Segmentation Project
Customer Segmentation Project
 
Introduction data mining
Introduction data miningIntroduction data mining
Introduction data mining
 
Customer Segmentation
Customer SegmentationCustomer Segmentation
Customer Segmentation
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Introduction
IntroductionIntroduction
Introduction
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in Datawarehousing
 
Data warehousing unit 1
Data warehousing unit 1Data warehousing unit 1
Data warehousing unit 1
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentation
 
Security issues in big data
Security issues in big data Security issues in big data
Security issues in big data
 
Data Mining Technique - SEMMA
Data Mining Technique - SEMMAData Mining Technique - SEMMA
Data Mining Technique - SEMMA
 
Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learning
 
Lecture 03 - The Data Warehouse and Design
Lecture 03 - The Data Warehouse and Design Lecture 03 - The Data Warehouse and Design
Lecture 03 - The Data Warehouse and Design
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
 

Similar a Data mining 1 - Introduction (cheat sheet - printable)

Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data miningRohit Kumar
 
Data mining final year project in ludhiana
Data mining final year project in ludhianaData mining final year project in ludhiana
Data mining final year project in ludhianadeepikakaler1
 
Data mining final year project in jalandhar
Data mining final year project in jalandharData mining final year project in jalandhar
Data mining final year project in jalandhardeepikakaler1
 
Introduction To Data Mining
Introduction To Data MiningIntroduction To Data Mining
Introduction To Data Miningdataminers.ir
 
Introduction To Data Mining
Introduction To Data Mining   Introduction To Data Mining
Introduction To Data Mining Phi Jack
 
datamining management slyabbus and ppt.pptx
datamining management slyabbus and ppt.pptxdatamining management slyabbus and ppt.pptx
datamining management slyabbus and ppt.pptxshyam1985
 
Information Technology Data Mining
Information Technology Data MiningInformation Technology Data Mining
Information Technology Data Miningsamiksha sharma
 
6 weeks summer training in data mining,ludhiana
6 weeks summer training in data mining,ludhiana6 weeks summer training in data mining,ludhiana
6 weeks summer training in data mining,ludhianadeepikakaler1
 
6months industrial training in data mining,ludhiana
6months industrial training in data mining,ludhiana6months industrial training in data mining,ludhiana
6months industrial training in data mining,ludhianadeepikakaler1
 
6 weeks summer training in data mining,jalandhar
6 weeks summer training in data mining,jalandhar6 weeks summer training in data mining,jalandhar
6 weeks summer training in data mining,jalandhardeepikakaler1
 
6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhar6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhardeepikakaler1
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryYoung Alista
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryHarry Potter
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryJames Wong
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryFraboni Ec
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryLuis Goldster
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryTony Nguyen
 

Similar a Data mining 1 - Introduction (cheat sheet - printable) (20)

Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data mining
 
Data mining final year project in ludhiana
Data mining final year project in ludhianaData mining final year project in ludhiana
Data mining final year project in ludhiana
 
Data mining final year project in jalandhar
Data mining final year project in jalandharData mining final year project in jalandhar
Data mining final year project in jalandhar
 
Introduction To Data Mining
Introduction To Data MiningIntroduction To Data Mining
Introduction To Data Mining
 
Introduction To Data Mining
Introduction To Data Mining   Introduction To Data Mining
Introduction To Data Mining
 
datamining.ppt
datamining.pptdatamining.ppt
datamining.ppt
 
datamining.ppt
datamining.pptdatamining.ppt
datamining.ppt
 
datamining.ppt
datamining.pptdatamining.ppt
datamining.ppt
 
datamining management slyabbus and ppt.pptx
datamining management slyabbus and ppt.pptxdatamining management slyabbus and ppt.pptx
datamining management slyabbus and ppt.pptx
 
Information Technology Data Mining
Information Technology Data MiningInformation Technology Data Mining
Information Technology Data Mining
 
6 weeks summer training in data mining,ludhiana
6 weeks summer training in data mining,ludhiana6 weeks summer training in data mining,ludhiana
6 weeks summer training in data mining,ludhiana
 
6months industrial training in data mining,ludhiana
6months industrial training in data mining,ludhiana6months industrial training in data mining,ludhiana
6months industrial training in data mining,ludhiana
 
6 weeks summer training in data mining,jalandhar
6 weeks summer training in data mining,jalandhar6 weeks summer training in data mining,jalandhar
6 weeks summer training in data mining,jalandhar
 
6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhar6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhar
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 

Último

5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...CaraSkikne1
 
Work Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sashaWork Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sashasashalaycock03
 
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...Nguyen Thanh Tu Collection
 
A gentle introduction to Artificial Intelligence
A gentle introduction to Artificial IntelligenceA gentle introduction to Artificial Intelligence
A gentle introduction to Artificial IntelligenceApostolos Syropoulos
 
Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...
Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...
Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...Dr. Asif Anas
 
How to Create a Toggle Button in Odoo 17
How to Create a Toggle Button in Odoo 17How to Create a Toggle Button in Odoo 17
How to Create a Toggle Button in Odoo 17Celine George
 
3.26.24 Race, the Draft, and the Vietnam War.pptx
3.26.24 Race, the Draft, and the Vietnam War.pptx3.26.24 Race, the Draft, and the Vietnam War.pptx
3.26.24 Race, the Draft, and the Vietnam War.pptxmary850239
 
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesHow to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesCeline George
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfYu Kanazawa / Osaka University
 
Optical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptxOptical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptxPurva Nikam
 
HED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfHED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfMohonDas
 
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxAUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxiammrhaywood
 
Education and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxEducation and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxraviapr7
 
How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17Celine George
 
Riddhi Kevadiya. WILLIAM SHAKESPEARE....
Riddhi Kevadiya. WILLIAM SHAKESPEARE....Riddhi Kevadiya. WILLIAM SHAKESPEARE....
Riddhi Kevadiya. WILLIAM SHAKESPEARE....Riddhi Kevadiya
 
Prescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxPrescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxraviapr7
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxDr. Asif Anas
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsEugene Lysak
 
ARTICULAR DISC OF TEMPOROMANDIBULAR JOINT
ARTICULAR DISC OF TEMPOROMANDIBULAR JOINTARTICULAR DISC OF TEMPOROMANDIBULAR JOINT
ARTICULAR DISC OF TEMPOROMANDIBULAR JOINTDR. SNEHA NAIR
 

Último (20)

5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...
 
Work Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sashaWork Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sasha
 
Prelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quizPrelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quiz
 
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
 
A gentle introduction to Artificial Intelligence
A gentle introduction to Artificial IntelligenceA gentle introduction to Artificial Intelligence
A gentle introduction to Artificial Intelligence
 
Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...
Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...
Unveiling the Intricacies of Leishmania donovani: Structure, Life Cycle, Path...
 
How to Create a Toggle Button in Odoo 17
How to Create a Toggle Button in Odoo 17How to Create a Toggle Button in Odoo 17
How to Create a Toggle Button in Odoo 17
 
3.26.24 Race, the Draft, and the Vietnam War.pptx
3.26.24 Race, the Draft, and the Vietnam War.pptx3.26.24 Race, the Draft, and the Vietnam War.pptx
3.26.24 Race, the Draft, and the Vietnam War.pptx
 
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesHow to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 Sales
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
 
Optical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptxOptical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptx
 
HED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfHED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdf
 
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxAUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
 
Education and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxEducation and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptx
 
How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17
 
Riddhi Kevadiya. WILLIAM SHAKESPEARE....
Riddhi Kevadiya. WILLIAM SHAKESPEARE....Riddhi Kevadiya. WILLIAM SHAKESPEARE....
Riddhi Kevadiya. WILLIAM SHAKESPEARE....
 
Prescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxPrescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptx
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptx
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George Wells
 
ARTICULAR DISC OF TEMPOROMANDIBULAR JOINT
ARTICULAR DISC OF TEMPOROMANDIBULAR JOINTARTICULAR DISC OF TEMPOROMANDIBULAR JOINT
ARTICULAR DISC OF TEMPOROMANDIBULAR JOINT
 

Data mining 1 - Introduction (cheat sheet - printable)

  • 1. Data Mining Lecture 1 Why Data Mining? Negative Points  All sounds complicated. Why should I learn about Data Mining?  What's wrong with just relational databases? Why would I want to go through these extra complicated steps?  Isn't it expensive? It sounds like it takes a lot of skills, programming, computational time and storage space. Where's the benefit? Positive Points  Data Mining is not just a cute academic exercise  It has very profitable real world uses  Practically all large companies and manygovernments perform data mining as part of their planning and analysis Data Explosion Data explosion problem  Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories We are drowning in data, but starving for knowledge! Solution: Data Mining  Extraction of interesting knowledge (rules, regularities, patterns, constraints) from large datasets What is Data Mining? Data mining (knowledge discovery in databases):  Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases Alternative names and their “inside stories”:  Data mining: a misnomer?  Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archaeology, data dredging, information harvesting, business intelligence, etc. What is not data mining?  (Deductive) query processing. ● Expert systems or small ML/statistical programs Potential Applications Database analysis and decision support  Market analysis and management o target marketing, customer relation management, market basket analysis, cross selling, market segmentation  Risk analysis and management o Forecasting, customer retention, improved underwriting, quality control, competitive analysis  Fraud detection and management Other Applications  Text mining (news group, email, documents) and Web analysis  Intelligent query answering Market Analysis Where are the data sources for analysis?  Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies Target marketing  Find clusters or “model” customers who share the same characteristics: interest, income level, spending habits, etc. Determine customer purchasing patterns over time  Conversion of single to a joint bank account: marriage, etc. Cross-market analysis  Associations/correlations between product sales  Prediction based on the association information
  • 2. Corporations Finance planning and asset evaluation  cash flow analysis and prediction  contingent claim analysis to evaluate assets  cross-sectional and time series analysis (financialratio, trend analysis, etc.) Resource planning  summarize and compare the resources and spending Competition  monitor competitors and market directions  group customers into classes and a class- based pricing procedure  set pricing strategy in a highly competitive market Market Management Customer profiling  data mining can tell you what types of customers buy what products (clustering or classification) Identifying customer requirements  identifying the best products for different customers  use prediction to find what factors will attract new customers Provides summary information  various multidimensional summary reports  statistical summary information (data central tendency and variation) Fraud Detection Applications  widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc. Approach  use historical data to build models of fraudulent behaviour and use data mining to help identify similar instances Examples  auto insurance: detect a group of people who stage accidents to collect on insurance  money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)  medical insurance: detect professional patients and ring of doctors and ring of references Detecting inappropriate medical treatment  Australian Health Insurance Commission identifies that in many cases blanket screening tests were requested (save Australian €1m/yr) Detecting telephone fraud  Telephone call model: destination of the call, duration, time of day or week. Analyse patterns that deviate from an expected norm  British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion dollar fraud Retail  Analysts estimate that 38% of retail shrink is due to dishonest employees Other Applications Sports  IBM Advanced Scout analysed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for New York Knicks and Miami Heat Astronomy  JPL and the Palomar Observatory discovered 22 quasars with the help of data mining Internet Web Surf-Aid  IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preference and behaviour pages, analysing effectiveness of Web marketing, improving Web site organisation, etc.
  • 3. KDD Process Learning the application domain  relevant prior knowledge and goals of application Creating a target data set: data selection Data cleaning and pre-processing (may take 60% of effort!) Data reduction and transformation  Find useful features, dimensionality/variable reduction, invariant representation Choosing functions of data mining  summarisation, classification, regression, association, clustering Choosing the mining algorithms Data mining: search for patterns of interest Pattern evaluation and knowledge presentation  visualisation, transformation, removing redundant patterns, etc. Use of discovered knowledge Business Intelligence Architecture of DM System DM Software Tools DM Software Tools  We can find several Data Mining tools  Commercial and free (open source) software tools  NeuroOnline, ARMiner, Tminer, Bayda, Cluto, YALE,Rainbow, MDR, C5, Clementine, etc. RapidMiner  Free version  Commercial version