Chapter 2
Related Concepts
2.1 Database/OLTP Systems
• Unlike a simple set, data in a database are usually viewed as having a particular
structure, or schema, with which they are associated.
• Unlike a file, a database is independent of the physical method used to store it.
• A data model is used to describe the data, their attributes, and the relationships among
them. A common data model is the ER (entity-relationship) data model. It can be
viewed as a documentation and communication tool that conveys the type and structure
of the actual data. A data model is independent of the particular DBMS used.
• Basic database queries are well defined and return precise results. Data mining
applications, conversely, are often vaguely defined and return imprecise results. A data
mining query outputs a KDD object.
• A KDD object is a rule, a classification, or a cluster. It does not exist before the
query is executed and is not part of the database being queried.
2.2 Fuzzy Sets and Fuzzy Logic
• A fuzzy set is a set, 𝐹, in which the set membership function, 𝑓, is a real-valued (as opposed to
Boolean) function with output in the range [0,1]. An element 𝑥 is said to belong to 𝐹 with
degree of membership 𝑓(𝑥) and simultaneously to be in ¬𝐹 with degree of membership 1 − 𝑓(𝑥).
• Because the membership function is not Boolean, the results of a query over a fuzzy set are fuzzy.
A classification problem can be solved by assigning a set membership function to each record for
each class. The record is then assigned to the class that has the highest membership function value.
• Association rules are generated with a confidence value that indicates the degree to which each
rule holds in the entire database. This can be thought of as a membership function.
• Fuzzy logic uses rules and membership functions to estimate a continuous function. Fuzzy logic is
a valuable tool to develop control systems for such things as elevators, trains, and heating
systems.
𝑚𝑒𝑚(¬𝑥) = 1 − 𝑚𝑒𝑚(𝑥)
𝑚𝑒𝑚(𝑥 ∧ 𝑦) = min(𝑚𝑒𝑚(𝑥), 𝑚𝑒𝑚(𝑦))
𝑚𝑒𝑚(𝑥 ∨ 𝑦) = max(𝑚𝑒𝑚(𝑥), 𝑚𝑒𝑚(𝑦))
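The three membership rules, and the "assign to the class with the highest membership value" step, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the membership values are assumptions made up for the example, not part of the slides.

```python
def fuzzy_not(m):
    """Membership in the complement: mem(not x) = 1 - mem(x)."""
    return 1.0 - m


def fuzzy_and(mx, my):
    """Membership in the conjunction: the minimum of the two values."""
    return min(mx, my)


def fuzzy_or(mx, my):
    """Membership in the disjunction: the maximum of the two values."""
    return max(mx, my)


def classify(memberships):
    """Return the class label whose membership value is highest."""
    return max(memberships, key=memberships.get)


# Hypothetical membership values for one record (assumed for illustration).
tall, heavy = 0.75, 0.5

print(fuzzy_not(tall))         # 0.25
print(fuzzy_and(tall, heavy))  # 0.5
print(fuzzy_or(tall, heavy))   # 0.75

# Hypothetical per-class membership values for a single record.
print(classify({"short": 0.1, "medium": 0.6, "tall": 0.3}))  # medium
```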
2.3 Information Retrieval
The effectiveness of the IR system in processing the
query is often measured by precision and recall:
• 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 = |𝑅𝑒𝑙𝑒𝑣𝑎𝑛𝑡 𝑎𝑛𝑑 𝑅𝑒𝑡𝑟𝑖𝑒𝑣𝑒𝑑| / |𝑅𝑒𝑡𝑟𝑖𝑒𝑣𝑒𝑑|
• 𝑅𝑒𝑐𝑎𝑙𝑙 = |𝑅𝑒𝑙𝑒𝑣𝑎𝑛𝑡 𝑎𝑛𝑑 𝑅𝑒𝑡𝑟𝑖𝑒𝑣𝑒𝑑| / |𝑅𝑒𝑙𝑒𝑣𝑎𝑛𝑡|
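As a small worked example, precision and recall can be computed from two sets of document identifiers. The function name `precision_recall`, the document IDs, and the choice to return 0.0 for an empty set are assumptions made for this sketch.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # documents both relevant and retrieved
    # Returning 0.0 for an empty denominator is a convention assumed here.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 documents retrieved, 3 actually relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}

p, r = precision_recall(retrieved, relevant)
print(p)  # 2 hits out of 4 retrieved -> 0.5
print(r)  # 2 hits out of 3 relevant  -> about 0.667
```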
The inverse document frequency (IDF) is often used
by similarity measures. Given a keyword, 𝑘, and
𝑛 documents, IDF can be defined as:
• 𝐼𝐷𝐹𝑘 = log(𝑛 / |𝑑𝑜𝑐𝑢𝑚𝑒𝑛𝑡𝑠 𝑐𝑜𝑛𝑡𝑎𝑖𝑛𝑖𝑛𝑔 𝑘|) + 1
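The IDF definition can be tried on a toy collection. In this sketch each document is modeled as a set of words, and the logarithm is taken base 2; both choices, like the function name `idf` and the sample documents, are assumptions, since the slide does not fix the base of the logarithm.

```python
import math


def idf(keyword, documents):
    """IDF_k = log(n / number of documents containing k) + 1 (base 2 assumed)."""
    n = len(documents)
    containing = sum(1 for doc in documents if keyword in doc)
    if containing == 0:
        raise ValueError(f"keyword {keyword!r} occurs in no document")
    return math.log2(n / containing) + 1


# Hypothetical collection of four documents, each a set of words.
documents = [
    {"data", "mining", "rules"},
    {"data", "warehouse"},
    {"data", "mining", "clusters"},
    {"data", "retrieval"},
]

print(idf("mining", documents))  # log2(4 / 2) + 1 = 2.0
print(idf("data", documents))    # log2(4 / 4) + 1 = 1.0
```

A keyword that appears in every document gets the lowest possible value, which is why IDF is useful for down-weighting uninformative terms in similarity measures.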
Concept hierarchies, structured as a tree or a directed acyclic
graph (DAG), can be used in spatial data mining.
2.5 Data Warehousing
• A data warehouse is a set of data that supports decision support systems (DSS) and is
subject-oriented, integrated, time-variant, and nonvolatile.
• A DW contains informational data, which are used to support
other functions such as planning and forecasting.
• OLAP retrieval tools facilitate quick query response at all granularities.
2.6 Dimensional Modeling
[Figures: multidimensional star schema; multidimensional constellation schema]
• A dimension is a collection of logically related attributes and is viewed as an axis
for modeling the data.
• The specific data stored are called facts and usually are numeric data. In a
relational system each dimension is a table and facts are stored in a fact table.
[Figure: multidimensional data cube]
2.7 Online Analytic Processing (OLAP)
OLAP supports ad hoc querying of the data warehouse. OLAP requires
a multidimensional view of the data and involves some analysis.
Operations supported: slice, dice, roll up, drill down, visualization.
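A rough sketch of slice, dice, and roll up over a tiny fact table is given below. The fact table, its dimension names (`product`, `city`, `quarter`), and the helper names are all hypothetical; a real OLAP tool would run these operations against the warehouse rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical fact table: each row has three dimensions and one numeric fact.
facts = [
    {"product": "pen", "city": "Dallas", "quarter": "Q1", "sales": 10},
    {"product": "pen", "city": "Austin", "quarter": "Q1", "sales": 5},
    {"product": "ink", "city": "Dallas", "quarter": "Q1", "sales": 7},
    {"product": "pen", "city": "Dallas", "quarter": "Q2", "sales": 3},
    {"product": "ink", "city": "Austin", "quarter": "Q2", "sales": 4},
]


def slice_cube(rows, dimension, value):
    """Slice: fix a single dimension to one value."""
    return [row for row in rows if row[dimension] == value]


def dice_cube(rows, conditions):
    """Dice: restrict several dimensions at once (dimension -> allowed values)."""
    return [
        row for row in rows
        if all(row[dim] in allowed for dim, allowed in conditions.items())
    ]


def roll_up(rows, dimensions, measure="sales"):
    """Roll up: total the measure over every dimension not listed."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[dim] for dim in dimensions)] += row[measure]
    return dict(totals)


print(len(slice_cube(facts, "quarter", "Q1")))                        # 3
print(len(dice_cube(facts, {"product": {"pen"}, "city": {"Dallas"}})))  # 2
print(roll_up(facts, ["product"]))  # {('pen',): 18, ('ink',): 11}
```

Drill down is the reverse of roll up: calling `roll_up(facts, ["product", "quarter"])` shows the same totals at a finer granularity.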
2.8 Statistics
• Even simple concepts, such as determining a data distribution or calculating a mean and a
variance, can be viewed as data mining techniques in their own right: each yields a descriptive
model of the data under consideration.
• When a model is generated, the goal is to fit it to the entire data set, not just to the sample
examined. Assumptions often made about the independence of data may be incorrect, thus
leading to errors in the resulting model. Any model should be statistically significant,
meaningful, and valid.
• Sampling is an often-used tool in data mining and machine learning. Here a subset
of the total population is examined, and a generalization (model) about the entire
population is made from this subset.
• The term exploratory data analysis describes the fact that the data can actually drive the
creation of the model and any statistical characteristics.
• Some data mining applications determine correlations among data. These relationships,
however, are not necessarily causal in nature. Care must be taken when assigning significance to
such relationships.
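The points on descriptive statistics, sampling, and correlation can be illustrated with the Python standard library. The synthetic population, the sample size, the random seed, and the helper name `pearson` are assumptions chosen for this sketch only.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


# Hypothetical population: 10,000 values drawn around a mean of 50.
rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Sampling: examine a subset and generalize to the whole population.
sample = rng.sample(population, 200)
population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
sample_variance = statistics.variance(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"sample variance: {sample_variance:.2f}")

# A perfect linear relationship has correlation close to 1, but the number
# alone says nothing about whether one variable causes the other.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```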
2.9 Machine Learning
• Data mining involves not only modeling but also the development of effective and
efficient algorithms and data structures to perform the modeling on large data sets.
• Machine learning is the area of AI that examines how to write programs that can learn. In
data mining, machine learning is often used for prediction or classification.
• Predictive modeling is done in two phases. During the training phase, historical or
sampled data are used to create a model that represents those data. It is assumed to
hold for the whole database and its future states. The testing phase then applies this
model to the remaining and future data.
• With supervised learning, a sample of the database is used to train the system to properly
perform the desired task. The quality of the training data determines how well the
program learns. With unsupervised learning, the correct answers that the model should
produce for the data are not known in advance.
• The objective for data mining is to uncover useful information and provide it to humans,
while machine learning research is focused more on the learning portion.
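The two phases of predictive modeling can be sketched with a very simple supervised classifier. A nearest-centroid classifier is used here only because it is short; it is an assumed stand-in, not a method named in the slides, and the records and labels are invented for the example.

```python
def train(records):
    """Training phase: compute one centroid (mean vector) per class label."""
    sums, counts = {}, {}
    for features, label in records:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(model, features):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def accuracy(model, records):
    """Testing phase: fraction of labeled records the model predicts correctly."""
    correct = sum(1 for features, label in records if predict(model, features) == label)
    return correct / len(records)


# Hypothetical labeled data: historical records for training, the rest for testing.
training = [([1.0, 1.0], "low"), ([1.5, 2.0], "low"),
            ([8.0, 9.0], "high"), ([9.0, 8.0], "high")]
testing = [([2.0, 1.0], "low"), ([8.5, 8.5], "high")]

model = train(training)
print(model)                     # {'low': [1.25, 1.5], 'high': [8.5, 8.5]}
print(accuracy(model, testing))  # 1.0
```

The labels in `training` are what make this supervised: without them, only the feature vectors would be available and the grouping would have to be discovered, as in clustering.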
References:
Dunham, Margaret H. “Data Mining: Introductory and Advanced
Topics”. Pearson Education, Inc., 2003.
