Data mining-basic

Gufran Ahmad
Contents
 What is Data Mining?
 Data Mining / KDD process
 Different aspects of Data Mining
 Why Data Mining?
 Data Mining in Business
 Examples
 How to do Data Mining?
What is Data Mining?
 Extracting interesting (significant, unusual,
hidden, useful) patterns or
knowledge from large databases (gigabytes,
terabytes or petabytes of data)
 Exploring and analyzing detailed business
transactions, i.e., "digging through tons of
data" to uncover patterns and relationships
that are contained within the business activity
and history
 Examples: better understanding of customer
behavior and preferences, predicting future
markets, business decision making, trend analysis
Knowledge Discovery in
Databases (KDD) Process
Different Aspects of Data
Mining
 Different names for the same thing: Data
Mining / Knowledge Discovery in
Databases (KDD) / Pattern Analysis / Business
Intelligence
 Applying intelligence to
business data gives Business Intelligence
 Investigating prototypes/models gives
Pattern Analysis
Why Data Mining?
 Exponential growth of business data:
mega- -> giga- -> tera- -> peta- bytes
 Humans cannot analyze such large volumes of data unaided
 Without mining, it is hard to predict/forecast what may
happen in the future
 Faster queries (OLAP), because the workload is
mainly reading data
Why Data Mining?
 Flexibility to access multiple,
heterogeneous databases and data stores
 Multidimensional computation becomes easier,
alongside the heavier algorithmic
computations of data mining
 Example areas: Business, Sciences,
Social and Technical databases
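The "multidimensional computation" point can be made concrete with a small roll-up over a fact table. This is only a minimal sketch: the `sales` rows, the dimension names and the `roll_up` helper are hypothetical and stand in for what an OLAP engine would do at scale.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales amount)
sales = [
    ("North", "Shirts", "Q1", 120),
    ("North", "Shirts", "Q2", 150),
    ("North", "Shoes", "Q1", 80),
    ("South", "Shirts", "Q1", 200),
    ("South", "Shoes", "Q2", 90),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(rows, keep):
    """Sum the sales measure, grouping only by the dimensions named in `keep`."""
    idx = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # the last column is the measure
    return dict(totals)


by_region = roll_up(sales, ["region"])
by_region_product = roll_up(sales, ["region", "product"])
grand_total = roll_up(sales, [])  # every dimension aggregated away
print(by_region)
print(by_region_product)
print(grand_total)
```

Each call corresponds to viewing the same data at a different level of detail; a real cube would precompute or index these aggregates rather than rescan the rows.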
Data Mining in Business
Examples of What People
are doing with Data
Mining:
Fraud/Non-Compliance
Anomaly detection
 Isolate the factors that lead
to fraud, waste and abuse
 Target auditing and
investigative efforts more
effectively
Credit/Risk Scoring
Intrusion detection
Parts failure prediction
Recruiting/Attracting customers
Maximizing profitability (cross-selling,
identifying profitable customers)
Service Delivery and
Customer Retention
 Build profiles of which
customers are likely to
use which services
Web Mining
How to mine the data?
 Applying KDD processes, using DM algorithms
and Data Mining tools
 Multidimensional / cube modeling of multidimensional
data sets (treating every table as a dimension)
 Association and correlation analysis of frequent
patterns (finding relations among data)
 Classification and label prediction (assigning new data to
known groups according to its attributes/characteristics):
supervised learning
 Cluster analysis (discovering groups in data according to
likeness/similarity): unsupervised learning
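The difference between the supervised and unsupervised items in this list can be sketched in a few lines. The example below is an illustration under stated assumptions, not the method of any particular tool: classification is shown with a nearest-centroid rule, clustering with a basic k-means loop, and all data points, labels and function names are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Mean point of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def fit_centroids(labeled):
    """Supervised step: compute one centroid per known label."""
    groups = defaultdict(list)
    for point, label in labeled:
        groups[label].append(point)
    return {label: centroid(pts) for label, pts in groups.items()}


def classify(point, centroids):
    """Assign a new point to the label whose centroid is nearest."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))


def kmeans(points, k, iterations=10):
    """Unsupervised step: group unlabeled points around k centers."""
    centers = list(points[:k])  # naive initialisation: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # keep the old center if a cluster ends up empty
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


# Classification: labels are known in advance.
labeled = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"),
           ((8.0, 8.0), "high"), ((9.0, 9.5), "high")]
model = fit_centroids(labeled)
predicted = classify((2.0, 1.5), model)

# Clustering: no labels, the groups are discovered from similarity alone.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.2, 0.8)]
clusters = kmeans(points, k=2)
print(predicted, clusters)
```

In the first half the group names come from the training data; in the second half the algorithm only returns unnamed groups, which an analyst then has to interpret.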
How to mine the data?
 Outlier analysis (discovering abnormal data
within data sets)
 Time-dependent sequential analysis (finding
time-varying series and their trends)
 Graph and network analysis (looking for
visible or hidden nodes and links in
networks/graphs)
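Two of the techniques above are easy to illustrate with the standard library. The sketch below assumes a z-score rule for outlier analysis and a simple moving average for the trend of a time series; both are common choices but only one option among many, and the sample values and the threshold are hypothetical.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def moving_average(values, window):
    """Smooth a series so that its overall tendency is easier to see."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Hypothetical daily transaction counts with one abnormal day.
daily = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
outliers = find_outliers(daily)
trend = moving_average([1, 2, 3, 4, 5], window=3)
print(outliers, trend)
```

A z-score rule is sensitive to the very outliers it is looking for, so on small or heavily skewed data a median-based rule may be a better fit.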
Data Mining Applications
 Data mining in Customer Relationship Management (CRM)
applications can contribute significantly to the bottom line. Rather than randomly
contacting a prospect or customer through a call center or sending
mail, a company can concentrate its efforts on prospects that are
predicted to have a high likelihood of responding to an offer.
 More sophisticated methods may be used to optimize resources across
campaigns so that one may predict which channel and which offer an
individual is most likely to respond to, across all potential offers.
 Additionally, sophisticated applications could be used to automate the
mailing. Once the results from data mining (potential
prospect/customer and channel/offer) are determined, such an
application can automatically send either an e-mail or
regular mail. Finally, in cases where many people will take an action
without an offer, uplift modeling can be used to determine which people
will show the greatest increase in response if given an offer.
 Data Clustering can also be used to automatically discover the
segments or groups within a customer data set.
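The idea behind uplift modeling can be sketched as the difference between the response rate of people who received an offer and the response rate of comparable people who did not. The example below is a deliberately simplified two-group comparison per segment, not a full uplift model; the campaign log, the segment names and the `uplift_by_segment` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical campaign log: (segment, received_offer, responded)
log = [
    ("loyal", True, True), ("loyal", True, True), ("loyal", True, True), ("loyal", True, False),
    ("loyal", False, True), ("loyal", False, True), ("loyal", False, True), ("loyal", False, False),
    ("new", True, True), ("new", True, True), ("new", True, False), ("new", True, False),
    ("new", False, False), ("new", False, False), ("new", False, False), ("new", False, False),
]


def uplift_by_segment(records):
    """Uplift = response rate with the offer minus response rate without it."""
    # per segment: [treated count, treated responses, control count, control responses]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for segment, treated, responded in records:
        offset = 0 if treated else 2
        counts[segment][offset] += 1
        counts[segment][offset + 1] += int(responded)
    result = {}
    for segment, (t_n, t_resp, c_n, c_resp) in counts.items():
        if t_n and c_n:  # both groups are needed for a comparison
            result[segment] = t_resp / t_n - c_resp / c_n
    return result


uplift = uplift_by_segment(log)
print(uplift)
```

In this made-up log the "loyal" segment responds at the same rate with or without the offer, so the offer adds nothing there, while the "new" segment responds only when contacted.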
Data Mining Applications
 Businesses employing data mining may see a return on investment, but
they also recognize that the number of predictive models can quickly
become very large. Rather than one model to predict how many
customers will churn, a business could build a separate model for each
region and customer type. Then, instead of sending an offer to all people
that are likely to churn, it may want to send offers to only some of those customers.
Finally, it may also want to determine which customers are going to be
profitable over a window of time and send the offers only to those that
are likely to be profitable. In order to maintain this quantity of models,
businesses need to manage model versions and move to automated data
mining.
 Data mining can also be helpful to human-resources departments in
identifying the characteristics of their most successful employees.
Information obtained, such as universities attended by highly successful
employees, can help HR focus recruiting efforts accordingly. Additionally,
Strategic Enterprise Management applications help a company translate
corporate-level goals, such as profit and margin share targets, into
operational decisions, such as production plans and workforce levels.
Data Mining Applications
 Another example of data mining, often called market basket analysis, relates to its use in retail
sales. If a clothing store records the purchases of customers, a data-mining system could identify
those customers who favor silk shirts over cotton ones. Although some relationships may be
difficult to explain, taking advantage of them is easier. The example deals with association
rules within transaction-based data. Not all data are transaction-based, and logical or inexact rules
may also be present within a database. In a manufacturing application, an inexact rule may state
that 73% of products which have a specific defect or problem will develop a secondary problem
within the next six months.
 Market basket analysis has also been used to identify the purchase patterns of the Alpha Consumer.
Alpha Consumers are people who play a key role in connecting with the concept behind a product,
then adopting that product, and finally validating it for the rest of society. Analyzing the data
collected on this type of user has allowed companies to predict future buying trends and forecast
supply demands.
 Data mining is a highly effective tool in the catalog marketing industry. Catalogers have a rich
history of transactions for millions of customers dating back several years. Data mining
tools can identify patterns among customers and help identify the customers most likely to respond
to upcoming mailing campaigns.
 For an integrated-circuit production line, an example of data mining is described in the paper
"Mining IC Test Data to Optimize VLSI Testing." The paper describes the application of data mining and
decision analysis to the problem of die-level functional test. The experiments it reports
demonstrate that a system can mine historical die-test data to create a
probabilistic model of patterns of die failure, which is then used to decide in real time which die to
test next and when to stop testing. This system has been shown, based on experiments with
historical test data, to have the potential to improve profits on mature IC products.
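The association rules behind the market basket example are usually described with two numbers: support (how often the items appear together) and confidence (how often the right-hand item appears given the left-hand one). An inexact rule such as the "73% of products" statement above reads like a confidence value. The sketch below counts item pairs only, on a hypothetical set of baskets; production algorithms such as Apriori extend the same idea to larger item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical clothing-store transactions.
baskets = [
    {"silk shirt", "tie"},
    {"silk shirt", "tie", "belt"},
    {"cotton shirt", "belt"},
    {"silk shirt", "belt"},
    {"cotton shirt", "tie"},
]


def pair_rules(transactions, min_support=0.4):
    """Return {(lhs, rhs): (support, confidence)} for item pairs meeting min_support."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be interesting
        # each frequent pair yields two candidate rules, one per direction
        for lhs, rhs in ((a, b), (b, a)):
            rules[(lhs, rhs)] = (support, count / item_counts[lhs])
    return rules


rules = pair_rules(baskets)
for (lhs, rhs), (support, confidence) in sorted(rules.items()):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Raising `min_support` prunes rare combinations early, which is what keeps this kind of analysis tractable on large transaction histories.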