2. What is data mining?
Why data mining?
Data mining as a necessity
Evolution of database technology
Origin of data mining
Data mining: a KDD process
Applications
Management areas
Examples
Techniques
3. Extraction of implicit, previously unknown and
potentially useful information from data
Exploration and analysis, by automatic or semi-automatic
means, of large quantities of data
in order to discover meaningful patterns
4. Lots of data is being collected and
warehoused
Web data, e-commerce
Purchases at department stores and grocery
stores
Bank/credit card transactions
Computers have become cheaper and more
powerful
Competitive pressure is strong
Provide better, customized services for an
edge (e.g. in customer relationship
management)
6. DATA EXPLOSION PROBLEM
Automated data collection tools and mature
database technology lead to tremendous
amounts of data stored in databases, data
warehouses and other information
repositories
We are drowning in data, but starving for
knowledge!
Solution: data mining
Extraction of interesting knowledge (rules,
regularities, patterns, constraints) from data
in large databases
7. 1960s:
Data collection, database creation, IMS and
network DBMS.
1970s:
Relational data model, relational DBMS
implementation
1980s:
RDBMS, advanced data models (extended
relational, OO, deductive, etc.) and application-oriented
DBMS (spatial, scientific, engineering, etc.)
1990s-2000s:
Data mining and data warehousing, multimedia
databases, and web databases
8. The term “data mining” was introduced in the
1990s. The roots of data mining can be traced along
three family lines:
Classical statistics
Artificial intelligence
Machine learning
9. Statistics is the foundation of most technologies on
which data mining is built, e.g. regression analysis and
standard deviation. These are used to study data
and data relationships.
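To make the statistical foundations concrete, here is a minimal Python sketch of the two tools named above: the sample standard deviation and an ordinary least-squares fit of a straight line (simple linear regression). The spend/sales figures and the `least_squares` helper are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

def least_squares(xs, ys):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance term / variance term (the 1/n factors cancel)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar  # the fitted line passes through the means
    return a, b

# Hypothetical data: advertising spend vs. sales (illustrative numbers only)
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

print(stdev(sales))                  # sample standard deviation, sqrt(10) ≈ 3.162
print(least_squares(spend, sales))   # (1.0, 2.0): intercept 1, slope 2
```

Regression describes how one variable tends to change with another, while the standard deviation summarizes how widely a single variable is spread around its mean.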
Artificial intelligence, which is built upon heuristics as
opposed to statistics, attempts to apply human-thought-like
processing to statistical problems. For example, some AI
concepts were adopted in the query-optimization modules of RDBMS products.
Machine learning is the union of statistics and AI.
DATA MINING therefore uses AI and statistical approaches
together. It builds on fundamental statistical concepts and
adds more advanced AI algorithms, blending AI heuristics
with advanced statistical analysis to study data and find
previously hidden trends or patterns within a company's data.
11. Database analysis and decision support
Market analysis and management
Target marketing, customer relationship
management, market basket analysis, cross-selling,
market segmentation
Risk analysis and management
Forecasting, customer retention, improved
underwriting, quality control, competitive
analysis
Fraud detection and management
Other applications
Text mining (newsgroups, email, documents) and
web analysis
Intelligent query answering
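Market segmentation is often approached as a clustering problem. The sketch below uses plain k-means (Lloyd's algorithm), one common choice among many, to split customers into two groups. The customer features (annual visits, average basket value), the data, and the `kmeans` helper are all hypothetical.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; initial centroids are simply the first k points."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical customers: (annual store visits, average basket value)
customers = [(2, 20), (50, 80), (3, 25), (48, 90), (4, 22), (52, 85)]
centroids, clusters = kmeans(customers, k=2)
print(clusters)  # [[(2, 20), (3, 25), (4, 22)], [(50, 80), (48, 90), (52, 85)]]
```

Starting from the first k points keeps the example deterministic; practical implementations usually use randomized or k-means++ initialization and stop once the assignments no longer change.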
12. Sports
IBM Advanced Scout analyzed NBA game statistics to
gain a competitive advantage for the New York Knicks and
Miami Heat
Astronomy
JPL and the Palomar Observatory discovered 22
quasars with the help of data mining.
Internet Web Surf-Aid
IBM Surf-Aid applies data mining algorithms to web
access logs for market-related pages to discover
customer preferences and behaviour, analyze the
effectiveness of web marketing, and improve website
organization.
13. Cross-market analysis
Associations/correlations between product sales
Prediction based on association information
Customer profiling
Data mining can tell you what types of customers
buy what products.
Identifying customer requirements
Identifying the best products for different
customers.
Using prediction to find what factors will attract new
customers.
Provides summary information
Various multidimensional summary reports
Statistical summary information
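Associations between product sales, as listed under cross-market analysis, are commonly measured with support and confidence. A minimal sketch, assuming each transaction is represented as a set of item names; the baskets and the `support`/`confidence` helpers are hypothetical:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A | C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
print(support(baskets, {"bread", "milk"}))       # 0.5 (2 of 4 baskets)
print(confidence(baskets, {"bread"}, {"milk"}))  # 2/3: milk appears in 2 of the 3 bread baskets
```

Confidence is undefined when the antecedent never occurs (division by zero). Algorithms such as Apriori first keep only itemsets above a minimum support and then derive rules from them.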
14. Financial planning and asset evaluation
i. Cash flow analysis and prediction
ii. Contingent claim analysis to evaluate assets
iii. Cross-sectional and time series analysis
Resource planning
i. Summarize and compare the resources and
spending
Competition
i. Monitor competition and market directions
ii. Set pricing strategy in the market
iii. Group customers into classes
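Cash flow prediction is a time-series task. As a very simple illustration, not a method any particular tool is known to use, a moving-average forecast predicts the next period's value as the mean of the most recent observations. The monthly figures and the `moving_average_forecast` function are hypothetical:

```python
def moving_average_forecast(series, window):
    """Predict the next value as the mean of the last `window` observations."""
    if not 0 < window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window

# Hypothetical monthly net cash flow (in thousands)
cash_flow = [100, 120, 110, 130, 125, 135]
print(moving_average_forecast(cash_flow, window=3))  # (130 + 125 + 135) / 3 = 130.0
```

A larger window smooths out more month-to-month noise but reacts more slowly to genuine trends. Richer time-series models add trend and seasonal terms.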
15. Applications
Widely used in health care, retail, credit card
services, telecommunications, etc.
Approach
Use historical data to build models of fraudulent
behaviour and use data mining to help identify
similar instances.
Examples
Auto insurance: detect groups of people who
stage accidents to collect on insurance
Money laundering: detect suspicious money
transactions
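The approach above (learn from historical cases, then flag similar new instances) can be sketched with a k-nearest-neighbour classifier. This is one illustrative technique chosen here, not one the slides prescribe. The features (amount in hundreds, hour of day), the labelled history, and `knn_label` are hypothetical:

```python
import math
from collections import Counter

def knn_label(history, query, k=3):
    """Label `query` by majority vote among the k most similar historical records.

    history: list of (features, label) pairs, e.g. ((amount, hour), "fraud").
    """
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled transactions: (amount in hundreds, hour of day)
history = [
    ((0.5, 14), "legit"),
    ((1.2, 10), "legit"),
    ((0.8, 18), "legit"),
    ((9.5, 3), "fraud"),
    ((8.7, 2), "fraud"),
    ((9.9, 4), "fraud"),
]
print(knn_label(history, (9.0, 3)))   # fraud
print(knn_label(history, (1.0, 12)))  # legit
```

Euclidean distance is sensitive to feature scale, so real features would normally be standardized first. An odd k avoids ties when there are only two labels.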