DataMining Techniq

Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.

ASIET Kalady
Data mining
Techniques
Assignment
RespaPeter
10/8/2013
Data mining
Data mining is the analysis of (often large) observational data sets to find unsuspected
relationships and to summarize the data in novel ways that are both understandable and useful to
the data owner.
Data mining Techniques
In additiontousinga particulardata miningtool, internal auditors can choose from a variety of
data miningtechniques.The mostcommonlyusedtechniquesincludeartificial neuralnetworks,decision
trees, and the nearest-neighbor method. Each of these techniques analyzes data in different ways:
 Artificial neural networks are non-linear, predictive models that learn through training.
Although they are powerful predictive modeling techniques, some of the power comes at
the expense of ease of use and deployment.
One area where auditors can easily use them is when reviewing records to
identify fraud and fraud-like actions. Because of their complexity, they are better
employed in situations where they can be used and reused, such as reviewing credit card
transactions every month to check for anomalies.
 Decision trees are tree-shaped structures that represent decision sets. These decisions
generate rules, which then are used to classify data. Decision trees are the favored
technique for building understandable models. Auditors can use them to assess, for
example, whether the organization is using an appropriate cost-effective marketing
strategy that is based on the assigned value of the customer, such as profit.
 The nearest-neighbor method classifies dataset records based on similar data in a
historical dataset. Auditors can use this approach to define a document that is interesting
to them and ask the system to search for similar items.
The most commonly used techniques include artificial neural networks, decision trees,
and the nearest-neighbor method. Each of these techniques analyzes data in different ways:
 Association
Association (or relation) is probably the better known and most familiar and
straightforward data mining technique. Here, it make a simple correlation between two or
more items, often of the same type to identify patterns. For example, when tracking
people's buying habits, you might identify that a customer always buys cream when they
buy strawberries, and therefore suggest that the next time that they buy strawberries they
might also want to buy cream.
 Classification
It use to build up an idea of the type of customer, item, or object by describing
multiple attributes to identify a particular class. For example, you can easily classify cars
into different types (sedan, 4x4, convertible) by identifying different attributes (number
of seats, car shape, driven wheels). Given a new car, you might apply it into a particular
class by comparing the attributes with our known definition. You can apply the same
principles to customers, for example by classifying them by age and social group.
 Clustering
Clustering is the task of segmenting a diverse group into a number of similar sub groups
or clusters.The distingiush clustering from classification is that clustering is not rely
on predefined classes.
The records are grouped together on the basis of self similarity.clustering is often done as
a prelude to some other form of datamining.
 Prediction
Prediction is a wide topic and runs from predicting the failure of components or
machinery, to identifying fraud and even the prediction of company profits. Used in combination
with the other data mining techniques, prediction involves analyzing trends, classification,
pattern matching, and relation. By analyzing past events or instances, you can make a prediction
about an event.
Using the credit card authorization, for example, you might combine decision tree analysis of
individual past transactions with classification and historical pattern matches to identify whether
a transaction is fraudulent. Making a match between the purchase of flights to the US and
transactions in the US, it is likely that the transaction is valid.
 Sequential patterns
Oftern used over longer-term data, sequential patterns are a useful method for identifying
trends, or regular occurrences of similar events. For example, with customer data you can
identify that customers buy a particular collection of products together at different times of the
year. In a shopping basket application, you can use this information to automatically suggest that
certain items be added to a basket based on their frequency and past purchasing history.
 Decision trees
Related to most of the other techniques (primarily classification and prediction), the
decision tree can be used either as a part of the selection criteria, or to support the use and
selection of specific data within the overall structure. Within the decision tree, you start with a
simple question that has two (or sometimes more) answers. Each answer leads to a further
question to help classify or identify the data so that it can be categorized, or so that a prediction
can be made based on each answer.
Each of these approaches brings different advantages and disadvantages that need to be
considered prior to their use. Neural networks, which are difficult to implement, require all input
and resultant output to be expressed numerically, thus needing some sort of interpretation
depending on the nature of the data-mining exercise
The decisiontree techniqueisthe mostcommonlyused methodology, because it is simple and
straightforwardtoimplement.Finally,the nearest-neighbormethodreliesmore onlinkingsimilar items
and, therefore, works better for extrapolation rather than predictive enquiries.
A goodway to applyadvanceddataminingtechniquesis to have a flexible and interactive data
mining tool that is fully integrated with a database or data warehouse. Using a tool that operates
outside of the database or data warehouse is not as efficient
Regardlessof the technique used,the real value behind data mining is modeling — the process
of buildingamodel basedonuser-specifiedcriteriafromalreadycaptureddata.Once a model is built, it
can be used in similar situations where an answer is not known.
For example,an organization looking to acquire new customers can create a model of its ideal
customer that is based on existing data captured from people who previously purchased the product.
The model then is used to query data on prospective customers to see if they match the profile.
Modeling also can be used in audit departments to predict the number of auditors required to
undertake an audit plan based on previous attempts and similar work.

Recomendados

DatabaseDatabase
DatabaseRespa Peter
503 vistas9 diapositivas
Data mining an introductionData mining an introduction
Data mining an introductionDr-Dipali Meher
7.9K vistas38 diapositivas
Unit iUnit i
Unit iAishwaryaLakshmiA
25 vistas29 diapositivas
Data MiningData Mining
Data MiningsolairajAnandappan
144 vistas78 diapositivas
Data miningData mining
Data miningDaminda Herath
744 vistas63 diapositivas

Más contenido relacionado

La actualidad más candente

DatabaseDatabase
Databasesumit621
804 vistas18 diapositivas
GhhhGhhh
Ghhhagammya
1.4K vistas63 diapositivas

La actualidad más candente(17)

Classification and prediction in data miningClassification and prediction in data mining
Classification and prediction in data mining
Er. Nawaraj Bhandari11K vistas
DatabaseDatabase
Database
sumit621804 vistas
GhhhGhhh
Ghhh
agammya1.4K vistas
Data PreprocessingData Preprocessing
Data Preprocessing
VijayasankariS57 vistas
Part1Part1
Part1
sumit6211.8K vistas
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
Valerii Klymchuk2.8K vistas
Data mininng trendsData mininng trends
Data mininng trends
VijayasankariS48 vistas
5 data preparation and processing25 data preparation and processing2
5 data preparation and processing2
Mahmoud Alfarra174 vistas
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalities
Rajendran 366 vistas
Data modelData model
Data model
Syed Zaid Irshad323 vistas
Cssu dw dmCssu dw dm
Cssu dw dm
sumit6211.1K vistas
12341234
1234
Komal Patil796 vistas

Destacado

Why Datamining?Why Datamining?
Why Datamining?Aamir Soomro
77 vistas6 diapositivas
Biometric TechnologyBiometric Technology
Biometric TechnologyAdoitya Kaila
2.9K vistas15 diapositivas
biometric technologybiometric technology
biometric technologyAnmol Bagga
18.4K vistas33 diapositivas
3d internet3d internet
3d internetSubhashree Malla
57K vistas20 diapositivas

Destacado(6)

Why Datamining?Why Datamining?
Why Datamining?
Aamir Soomro77 vistas
Biometric TechnologyBiometric Technology
Biometric Technology
Adoitya Kaila2.9K vistas
biometric technologybiometric technology
biometric technology
Anmol Bagga18.4K vistas
3d internet3d internet
3d internet
Subhashree Malla57K vistas
BiometricsBiometrics
Biometrics
National Institute of Technology Durgapur2.3K vistas

Similar a DataMining Techniq

Data MiningData Mining
Data MiningGary Stefan
1.4K vistas13 diapositivas
Data Science.pdfData Science.pdf
Data Science.pdfSatishkumar722293
4 vistas5 diapositivas
Data Science.pdfData Science.pdf
Data Science.pdfSatishkumar722293
6 vistas5 diapositivas
Data Science.pdfData Science.pdf
Data Science.pdfSatishkumar722293
5 vistas5 diapositivas
Data mining-basicData mining-basic
Data mining-basicgufranresearcher
109 vistas14 diapositivas

Similar a DataMining Techniq(20)

Data MiningData Mining
Data Mining
Gary Stefan1.4K vistas
Data Science.pdfData Science.pdf
Data Science.pdf
Satishkumar7222934 vistas
Data Science.pdfData Science.pdf
Data Science.pdf
Satishkumar7222936 vistas
Data Science.pdfData Science.pdf
Data Science.pdf
Satishkumar7222935 vistas
Data mining-basicData mining-basic
Data mining-basic
gufranresearcher109 vistas
Data MiningData Mining
Data Mining
vihangshah12465 vistas
Data MiningData Mining
Data Mining
SOMASUNDARAM T30 vistas
Unit 4 Advanced Data AnalyticsUnit 4 Advanced Data Analytics
Unit 4 Advanced Data Analytics
KLE Society's SCP Arts, Science and DDS Commerce College, Mahalingpur211 vistas
Data MiningData Mining
Data Mining
AbinayaJawahar7 vistas
Seminar PresentationSeminar Presentation
Seminar Presentation
Vaibhav Dhattarwal348 vistas
Chapter 1.pdfChapter 1.pdf
Chapter 1.pdf
DrGnaneswariG3 vistas
Data miningData mining
Data mining
Murniana Shazwen332 vistas
Data miningData mining
Data mining
Murniana Shazwen430 vistas
Mining internal sources of dataMining internal sources of data
Mining internal sources of data
nomanbhutta1K vistas
Data Mining for Big Data-Murat YazıcıData Mining for Big Data-Murat Yazıcı
Data Mining for Big Data-Murat Yazıcı
Murat YAZICI, M.Sc.367 vistas
Data miningData mining
Data mining
pradeepa n1.4K vistas

Más de Respa Peter

Tpes of SoftwaresTpes of Softwares
Tpes of SoftwaresRespa Peter
115 vistas22 diapositivas
Types of sql injection attacksTypes of sql injection attacks
Types of sql injection attacksRespa Peter
20K vistas10 diapositivas
 software failures software failures
software failuresRespa Peter
2.4K vistas7 diapositivas
Cloud computingCloud computing
Cloud computingRespa Peter
378 vistas4 diapositivas

Más de Respa Peter(13)

Tpes of SoftwaresTpes of Softwares
Tpes of Softwares
Respa Peter115 vistas
Types of sql injection attacksTypes of sql injection attacks
Types of sql injection attacks
Respa Peter20K vistas
 software failures software failures
software failures
Respa Peter2.4K vistas
Cloud computingCloud computing
Cloud computing
Respa Peter378 vistas
Managing software developmentManaging software development
Managing software development
Respa Peter1K vistas
Data miningData mining
Data mining
Respa Peter331 vistas
KnimeKnime
Knime
Respa Peter2.3K vistas
Genetic algorithmGenetic algorithm
Genetic algorithm
Respa Peter4K vistas
Matrix multiplicationdesignMatrix multiplicationdesign
Matrix multiplicationdesign
Respa Peter1.3K vistas
Matrix chain multiplicationMatrix chain multiplication
Matrix chain multiplication
Respa Peter33.8K vistas
Open shortest path first (ospf)Open shortest path first (ospf)
Open shortest path first (ospf)
Respa Peter13.3K vistas

Último(20)

GSoC 2024GSoC 2024
GSoC 2024
DeveloperStudentClub1049 vistas
Plastic waste.pdfPlastic waste.pdf
Plastic waste.pdf
alqaseedae81 vistas
Scope of Biochemistry.pptxScope of Biochemistry.pptx
Scope of Biochemistry.pptx
shoba shoba110 vistas
Chemistry of sex hormones.pptxChemistry of sex hormones.pptx
Chemistry of sex hormones.pptx
RAJ K. MAURYA97 vistas
Azure DevOps Pipeline setup for Mule APIs #36Azure DevOps Pipeline setup for Mule APIs #36
Azure DevOps Pipeline setup for Mule APIs #36
MysoreMuleSoftMeetup75 vistas
Class 10 English notes 23-24.pptxClass 10 English notes 23-24.pptx
Class 10 English notes 23-24.pptx
Tariq KHAN63 vistas
ICANNICANN
ICANN
RajaulKarim2057 vistas
discussion post.pdfdiscussion post.pdf
discussion post.pdf
jessemercerail70 vistas
Women from Hackney’s History: Stoke Newington by Sue DoeWomen from Hackney’s History: Stoke Newington by Sue Doe
Women from Hackney’s History: Stoke Newington by Sue Doe
History of Stoke Newington103 vistas
231112 (WR) v1  ChatGPT OEB 2023.pdf231112 (WR) v1  ChatGPT OEB 2023.pdf
231112 (WR) v1 ChatGPT OEB 2023.pdf
WilfredRubens.com100 vistas
Use of Probiotics in Aquaculture.pptxUse of Probiotics in Aquaculture.pptx
Use of Probiotics in Aquaculture.pptx
AKSHAY MANDAL69 vistas
ANATOMY AND PHYSIOLOGY UNIT 1 { PART-1}ANATOMY AND PHYSIOLOGY UNIT 1 { PART-1}
ANATOMY AND PHYSIOLOGY UNIT 1 { PART-1}
DR .PALLAVI PATHANIA156 vistas
CWP_23995_2013_17_11_2023_FINAL_ORDER.pdfCWP_23995_2013_17_11_2023_FINAL_ORDER.pdf
CWP_23995_2013_17_11_2023_FINAL_ORDER.pdf
SukhwinderSingh895865467 vistas
Narration  ppt.pptxNarration  ppt.pptx
Narration ppt.pptx
Tariq KHAN62 vistas
Industry4wrd.pptxIndustry4wrd.pptx
Industry4wrd.pptx
BC Chew153 vistas
Sociology KS5Sociology KS5
Sociology KS5
WestHatch50 vistas
NS3 Unit 2 Life processes of animals.pptxNS3 Unit 2 Life processes of animals.pptx
NS3 Unit 2 Life processes of animals.pptx
manuelaromero201389 vistas
Narration lesson plan.docxNarration lesson plan.docx
Narration lesson plan.docx
Tariq KHAN90 vistas

DataMining Techniq

  • 2. Data mining Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. Data mining Techniques In additiontousinga particulardata miningtool, internal auditors can choose from a variety of data miningtechniques.The mostcommonlyusedtechniquesincludeartificial neuralnetworks,decision trees, and the nearest-neighbor method. Each of these techniques analyzes data in different ways:  Artificial neural networks are non-linear, predictive models that learn through training. Although they are powerful predictive modeling techniques, some of the power comes at the expense of ease of use and deployment. One area where auditors can easily use them is when reviewing records to identify fraud and fraud-like actions. Because of their complexity, they are better employed in situations where they can be used and reused, such as reviewing credit card transactions every month to check for anomalies.  Decision trees are tree-shaped structures that represent decision sets. These decisions generate rules, which then are used to classify data. Decision trees are the favored technique for building understandable models. Auditors can use them to assess, for example, whether the organization is using an appropriate cost-effective marketing strategy that is based on the assigned value of the customer, such as profit.  The nearest-neighbor method classifies dataset records based on similar data in a historical dataset. Auditors can use this approach to define a document that is interesting to them and ask the system to search for similar items.
  • 3. The most commonly used techniques include artificial neural networks, decision trees, and the nearest-neighbor method. Each of these techniques analyzes data in different ways:  Association Association (or relation) is probably the better known and most familiar and straightforward data mining technique. Here, it make a simple correlation between two or more items, often of the same type to identify patterns. For example, when tracking people's buying habits, you might identify that a customer always buys cream when they buy strawberries, and therefore suggest that the next time that they buy strawberries they might also want to buy cream.  Classification It use to build up an idea of the type of customer, item, or object by describing multiple attributes to identify a particular class. For example, you can easily classify cars into different types (sedan, 4x4, convertible) by identifying different attributes (number of seats, car shape, driven wheels). Given a new car, you might apply it into a particular class by comparing the attributes with our known definition. You can apply the same principles to customers, for example by classifying them by age and social group.  Clustering Clustering is the task of segmenting a diverse group into a number of similar sub groups or clusters.The distingiush clustering from classification is that clustering is not rely on predefined classes. The records are grouped together on the basis of self similarity.clustering is often done as a prelude to some other form of datamining.  Prediction
  • 4. Prediction is a wide topic and runs from predicting the failure of components or machinery, to identifying fraud and even the prediction of company profits. Used in combination with the other data mining techniques, prediction involves analyzing trends, classification, pattern matching, and relation. By analyzing past events or instances, you can make a prediction about an event. Using the credit card authorization, for example, you might combine decision tree analysis of individual past transactions with classification and historical pattern matches to identify whether a transaction is fraudulent. Making a match between the purchase of flights to the US and transactions in the US, it is likely that the transaction is valid.  Sequential patterns Oftern used over longer-term data, sequential patterns are a useful method for identifying trends, or regular occurrences of similar events. For example, with customer data you can identify that customers buy a particular collection of products together at different times of the year. In a shopping basket application, you can use this information to automatically suggest that certain items be added to a basket based on their frequency and past purchasing history.  Decision trees Related to most of the other techniques (primarily classification and prediction), the decision tree can be used either as a part of the selection criteria, or to support the use and selection of specific data within the overall structure. Within the decision tree, you start with a simple question that has two (or sometimes more) answers. Each answer leads to a further question to help classify or identify the data so that it can be categorized, or so that a prediction can be made based on each answer. Each of these approaches brings different advantages and disadvantages that need to be considered prior to their use. Neural networks, which are difficult to implement, require all input and resultant output to be expressed numerically, thus needing some sort of interpretation depending on the nature of the data-mining exercise
  • 5. The decisiontree techniqueisthe mostcommonlyused methodology, because it is simple and straightforwardtoimplement.Finally,the nearest-neighbormethodreliesmore onlinkingsimilar items and, therefore, works better for extrapolation rather than predictive enquiries. A goodway to applyadvanceddataminingtechniquesis to have a flexible and interactive data mining tool that is fully integrated with a database or data warehouse. Using a tool that operates outside of the database or data warehouse is not as efficient Regardlessof the technique used,the real value behind data mining is modeling — the process of buildingamodel basedonuser-specifiedcriteriafromalreadycaptureddata.Once a model is built, it can be used in similar situations where an answer is not known. For example,an organization looking to acquire new customers can create a model of its ideal customer that is based on existing data captured from people who previously purchased the product. The model then is used to query data on prospective customers to see if they match the profile. Modeling also can be used in audit departments to predict the number of auditors required to undertake an audit plan based on previous attempts and similar work.