Knowledge Discovery in Remote Access Databases
A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science
at the Institute of Mathematics and Computer Science Informatics, University of Debrecen
By Zakaria Suliman Zubi
Supervised by Prof. Arató Mátyás and Prof. Fazekas Gábor
Overview of the Thesis
• Part 1
• Introduction to Knowledge Discovery in Databases (KDD) and Data Mining (DM).
• Goal of the Thesis Work.
• Part 2
• Remote Access KDD models.
• Logical Foundation in Data Mining.
• Mining the Discovered Association Rules.
• Data Mining Query Languages.
• Part 3
• Knowledge Discovery Query Language (KDQL).
• I-extended Databases (I-ED).
• Implementation of KDQL.
• Conclusion.
• Appendix A, B.
Introduction to KDD and DM
• KDD is the process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.
• DM is a single step in the KDD process. It extracts trends or patterns from raw databases and carefully and accurately transforms them into useful and understandable information.
• The introduction (chapter 1) of the thesis covers the history, importance, appearances and tools of KDD and DM.
KDD process
Data cleaning: a phase in which noisy and irrelevant data are removed from the collection.
Data integration: multiple data sources, often heterogeneous, may be combined in a common source.
Data selection: the data relevant to the analysis is decided on and retrieved from the data collection.
Data transformation: a phase in which the selected data is transformed into forms appropriate for the mining procedure.
Data mining: the crucial step, in which clever techniques are applied to extract potentially useful patterns.
Pattern evaluation: strictly interesting patterns representing knowledge are identified based on given measures.
Knowledge representation: the final phase, in which the discovered knowledge is visually represented to the user.
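The stages of the KDD process can be sketched as a chain of small functions. This is only an illustration: the function names, the toy purchase records, and the choice of "mining" as simple item counting are assumptions made for the example, not part of the thesis.

```python
from collections import Counter

def clean(records):
    """Data cleaning: drop records with a missing customer or no items."""
    return [r for r in records if r.get("customer") and r.get("items")]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]

def select(records, wanted_items):
    """Data selection: keep only the items relevant to the analysis."""
    return [{**r, "items": [i for i in r["items"] if i in wanted_items]}
            for r in records]

def transform(records):
    """Data transformation: turn each non-empty record into an item set."""
    return [frozenset(r["items"]) for r in records if r["items"]]

def mine(transactions):
    """Data mining (toy version): count how often each item occurs."""
    return Counter(item for t in transactions for item in t)

def evaluate(counts, n_transactions, min_support):
    """Pattern evaluation: keep items whose support reaches the threshold."""
    return {item: c / n_transactions for item, c in counts.items()
            if c / n_transactions >= min_support}

def present(patterns):
    """Knowledge representation: one plain-text bar per pattern."""
    return [f"{item:<6} {'#' * round(s * 10)} {s:.2f}"
            for item, s in sorted(patterns.items())]

# Hypothetical data from two sources; one record is noisy (no customer).
store_a = [{"customer": "c1", "items": ["bread", "milk"]},
           {"customer": None, "items": ["bread"]}]
store_b = [{"customer": "c2", "items": ["bread", "coke"]},
           {"customer": "c3", "items": ["milk", "pen"]}]

records = integrate(clean(store_a), clean(store_b))
transactions = transform(select(records, {"bread", "milk", "coke"}))
patterns = evaluate(mine(transactions), len(transactions), min_support=0.5)
for line in present(patterns):
    print(line)
```

Each stage takes the output of the previous one, so a stage can be swapped (for example, a real mining algorithm in place of `mine`) without touching the others.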
Figure: KDD and DM share topics with several other fields.
• Access to databases is established via Open Database Connectivity (ODBC).
• The databases are queried with Structured Query Language (SQL). SQL allows users to define the data in databases and to manipulate it (adding, deleting and retrieving data).
• Data visualization is used to represent data mining results.
Goal of the Thesis Work
• In this thesis work, we investigate the problem of matching a DM problem with the set of DM algorithms that are suitable for solving it.
• We use visualization, integrated with algorithmic approaches, to tune the parameters of DM algorithms. This supports the parameter selection process, so far explored only by algorithmic approaches, in a more systematic way than using default values or setting parameter values without clues.
• We introduce visualization to provide expressive information about induced models and statistical entities, and to support the interactive and dynamic exploration of induced models for DM.
Remote Access KDD models
Figure: connection between KDD and ODBC.
Figure: architecture of the ODBC_KDD (1) model.
Figure: architecture of the ODBC_KDD (2) model.
Logical Foundation in Data Mining (LFDM)
Advantages of Logical Foundation in Data Mining
• Expressiveness: First-order logic can represent more complex concepts than traditional attribute-value languages.
• Readability: Formulae are easier to read than decision trees or a set of linear equations.
• Background knowledge: Background knowledge can be extended during discovery, for example in time series.
• Multiple tables: Multiple database tables can be handled without explicit and expensive joins.
• Deductive databases: Logical discovery engines can be transparently linked to relational databases via deductive databases.
Disadvantages of Logical Foundation in Data Mining
• Language complexity: First-order hypotheses are usually constructed through heavy search, which affects the feasibility of discovery.
• Database access times: Checking a single candidate hypothesis might involve heavy querying.
• Number handling: Logical approaches to discovery usually suffer from poor number handling capabilities.
Translating first-order queries into SQL
• A natural-language question such as “find all employees who earn more than 1000000 HUF a year and more than their manager” can be written as a first-order rule:
expensive_employee(Name) ← employee(Name, Salary1, Manager), Salary1 > 1000000, employee(Manager, Salary2, _), Salary1 > Salary2
• The rule translates into the following SQL query:
SELECT employee_0.NAME
FROM employee employee_0, employee employee_1
WHERE employee_0.SALARY > 1000000 AND
employee_1.NAME = employee_0.MANAGER AND
employee_0.SALARY > employee_1.SALARY
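To see what the SQL query above returns, it can be run against a small in-memory table. The sketch below uses Python's built-in `sqlite3` module as a stand-in for an ODBC connection; the employee names and salaries are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (NAME TEXT PRIMARY KEY, SALARY INTEGER, MANAGER TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Anna", 900000, None),      # top-level manager, no manager of her own
        ("Bela", 1200000, "Anna"),   # above the threshold and above Anna
        ("Csaba", 1100000, "Bela"),  # above the threshold but below Bela
        ("Dora", 800000, "Anna"),    # below the threshold
    ],
)

# The query from the slide: a self-join pairs each employee (employee_0)
# with his or her manager (employee_1).
QUERY = """
SELECT employee_0.NAME
FROM employee employee_0, employee employee_1
WHERE employee_0.SALARY > 1000000 AND
      employee_1.NAME = employee_0.MANAGER AND
      employee_0.SALARY > employee_1.SALARY
"""

expensive = [name for (name,) in conn.execute(QUERY)]
print(expensive)  # ['Bela']
```

Each body literal of the rule becomes one table alias in the `FROM` clause, the shared variable `Manager` becomes the join condition, and the comparisons carry over unchanged into the `WHERE` clause.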
Mining the Discovered Association Rules
Association Rules
• What is an association rule? Let $I = \{i_1, i_2, \ldots, i_n\}$ be the set of all possible items, and let D, the task-relevant data, be a set of transactions, where each transaction $T = \{i_a, i_b, \ldots, i_t\}$ is a set of items with $T \subset I$.
An association rule is of the form $P \rightarrow Q$, where $P \subset I$, $Q \subset I$, and $P \cap Q = \emptyset$.
$P \rightarrow Q$ holds in D with support s and has a confidence c in the transaction set D, where
$\mathrm{Support}(P \rightarrow Q) = \mathrm{Probability}(P \cup Q)$
$\mathrm{Confidence}(P \rightarrow Q) = \mathrm{Probability}(Q \mid P)$
• Example: “In 80% of the cases when people buy bread, they also buy milk”
Bread ==> milk /80%
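The two measures can be computed directly from a list of transactions. In the sketch below, Probability(P ∪ Q) is read in the usual way for association rules, as the fraction of transactions that contain every item of both P and Q. The six transactions are made up so that the bread and milk rule reaches the 80% confidence of the example.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)

def support(transactions, body, head):
    """Support(P -> Q): share of transactions containing all of P and Q."""
    return count(transactions, set(body) | set(head)) / len(transactions)

def confidence(transactions, body, head):
    """Confidence(P -> Q): share of transactions with P that also hold Q."""
    return count(transactions, set(body) | set(head)) / count(transactions, body)

# Hypothetical transaction set D.
D = [
    {"bread", "milk"},
    {"bread", "milk", "coke"},
    {"bread", "milk"},
    {"bread", "milk", "cheese"},
    {"bread"},
    {"milk", "coke"},
]

s = support(D, {"bread"}, {"milk"})      # 4 of 6 transactions
c = confidence(D, {"bread"}, {"milk"})   # 4 of the 5 bread transactions
print(f"bread -> milk: support {s:.0%}, confidence {c:.0%}")
# bread -> milk: support 67%, confidence 80%
```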
Mining the Association Rules
• What is mining association rules? Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories, and then selecting the most "interesting" rules based on their confidence factors. Each rule is reported with the support s and the confidence c it has in the transaction set D.
• Applications: basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
• Examples:
• “Body → Head [support, confidence]”
• buys(x, “bread”) → buys(x, “milk”) [6%, 65%]
• major(x, “CS”) ∧ takes(x, “Database”) → grade(x, “5”) [1%, 75%]
Mining the Association Rules (cont.)
• How do we mine association rules?
• Input: a database of transactions, where each transaction is a list of items (e.g. the items purchased by a customer in one visit).
• Task: find all rules that associate the presence of one set of items with that of another set of items.
• Example: “98% of people who purchase tires and auto accessories also get automotive services done”
• There are no restrictions on the number of items in the body of the rule.
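A common first step when mining rules from a transaction database is to find the item sets that occur often enough to be worth turning into rules. The sketch below uses a level-wise, Apriori-style search. The slides do not name a specific algorithm, so this choice, along with the function name and the sample data, is an assumption made for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset reaching min_support.

    Level-wise search: frequent k-itemsets are joined into (k+1)-item
    candidates, which are then counted against the transactions.
    """
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    while candidates:
        level = {}
        for c in candidates:
            sup = sum(1 for t in transactions if c <= t) / n
            if sup >= min_support:
                level[c] = sup
        frequent.update(level)
        # Join step: merge two frequent sets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
    return frequent

# Hypothetical transactions in the spirit of the tires example.
D = [
    {"tires", "accessories", "service"},
    {"tires", "accessories", "service"},
    {"tires", "service"},
    {"accessories"},
    {"tires", "accessories", "service"},
    {"tires", "wipers"},
]

frequent = frequent_itemsets(D, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{sup:.2f}")
```

Because every subset of a frequent item set is itself frequent, candidates only need to be built from the sets that survived the previous level, which keeps the search far smaller than enumerating all subsets of items.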
Data Mining Query Language (DMQL)
What is a Data Mining Query Language?
• Data Mining Query Language (DMQL): supports the iterative nature of the KDD process. Once the discovered knowledge has been presented to the user, the evaluation measures can be enhanced, the mining can be further refined, new data can be selected or further transformed, or new data sources can be integrated, in order to get different, more appropriate results.
Types of patterns discovered by DMQL
• Characterization: Data characterization is a summarization of the general features of objects in a target class, and produces what are called characteristic rules.
• Discrimination: Data discrimination produces what are called discriminant rules and is basically the comparison of the general features of objects between two classes, referred to as the target class and the contrasting class.
• Association analysis: Association analysis is the discovery of what are commonly called association rules.
• Classification: Classification analysis is the organization of data in given classes.
• Prediction: Prediction has attracted considerable attention given the potential implications of successful forecasting in a business context.
• Clustering: Clustering is the organization of data in classes that, unlike in classification, are not known in advance.
• Outlier analysis: Outliers are data elements that cannot be grouped in a given class or cluster.
• Evolution and deviation analysis: Evolution and deviation analysis pertain to the study of data that changes over time.
Knowledge Discovery Query Language (KDQL)
What is KDQL in principle?
• Knowledge Discovery Query Language (KDQL) is a KDD query language proposed for the ODBC_KDD (2) model. It mines association rules in databases (e.g. a DBMS or a relational database) and then visualizes the discovered results in different chart forms (e.g. 2D and 3D). KDQL had not yet been implemented under this name. KDQL combines KDD technology and data visualization in response to the need for a query language for DM tasks. This leads us to develop a language tool that can handle both approaches in one session.
Figure labels: Request, Data, Data to Visualize, Visualization Tool, Database Management System (DBMS).
Visualization techniques for DMQL
Figure labels: Data Mining Query Language (DMQL), Visualization Tools, Database Management System (DBMS), Knowledge Discovery Query Language (KDQL).
I-Extended Databases (I-ED)
Motivation
• I-extended database: a database that, in addition to data, also contains intensionally defined generalizations about the data. An I-extended database has properties similar to those of an inductive database. We formalize this concept and show how it can be used throughout the whole DM process thanks to the closure property of the framework.
• The basic message of the I-extended database is as follows:
• An I-extended database consists of a normal database associated with a subset of patterns from a class of patterns, and an evaluation function that tells how the patterns occur in the data.
• An I-extended database can be queried (in principle) using just normal relational algebra or SQL, with the added ability to refer to the values of the evaluation function on the patterns.
• Modeling KDD processes as a sequence of queries on an I-extended database opens up opportunities for reasoning about and optimizing these processes.
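The idea of a normal database stored alongside patterns and their evaluation values can be illustrated with three tables and plain SQL. This is a minimal sketch, not the thesis implementation: the table and column names are invented, `sqlite3` stands in for an ODBC data source, and the patterns are restricted to single-item rules to keep the evaluation query short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase (tid INTEGER, item TEXT);            -- the normal data
CREATE TABLE pattern (id INTEGER PRIMARY KEY,              -- candidate rules
                      body TEXT, head TEXT);
CREATE TABLE evaluation (pattern_id INTEGER REFERENCES pattern(id),
                         support REAL, confidence REAL);   -- evaluation values
""")
conn.executemany("INSERT INTO purchase VALUES (?, ?)", [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "milk"),
    (3, "bread"), (3, "coke"),
    (4, "milk"),
    (5, "bread"), (5, "milk"),
])
conn.executemany("INSERT INTO pattern VALUES (?, ?, ?)", [
    (1, "bread", "milk"), (2, "milk", "bread"), (3, "bread", "coke"),
])

# Evaluation function: compute support and confidence of each pattern
# from the data and store the values next to the pattern.
n = conn.execute("SELECT COUNT(DISTINCT tid) FROM purchase").fetchone()[0]
for pid, body, head in conn.execute("SELECT id, body, head FROM pattern").fetchall():
    body_count = conn.execute(
        "SELECT COUNT(DISTINCT tid) FROM purchase WHERE item = ?", (body,)
    ).fetchone()[0]
    both_count = conn.execute(
        "SELECT COUNT(*) FROM (SELECT tid FROM purchase WHERE item = ? "
        "INTERSECT SELECT tid FROM purchase WHERE item = ?)", (body, head)
    ).fetchone()[0]
    conn.execute("INSERT INTO evaluation VALUES (?, ?, ?)",
                 (pid, both_count / n, both_count / body_count))

# An ordinary SQL query that refers to the evaluation values.
strong = conn.execute("""
    SELECT p.body, p.head, e.support, e.confidence
    FROM pattern p JOIN evaluation e ON e.pattern_id = p.id
    WHERE e.support >= 0.5 AND e.confidence >= 0.7
    ORDER BY p.id
""").fetchall()
print(strong)
```

Since the result of the last query is again a table of patterns with evaluation values, it could itself be stored and queried further, which is the closure property the slide refers to.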
Implementation of KDQL
Motivation of KDQL
• The background of KDQL is the Structured Query Language (SQL): several extensions to SQL have been proposed to serve as a Data Mining Query Language (DMQL).
SQL + DM (rules) is the appropriate form for this task at the user interface.
DM (rules) relies on association rules to interact with the I-extended database. The association rules are obtained through KDQL rules, and the results are represented graphically in 2D and 3D charts.
Figure: architecture of KDQL.
Example of KDQL
• For example, the rule {cheese, coke} ==> bread states that if cheese and coke are bought together in a transaction, bread is also bought in the same transaction. In this association rule, the body is a set of items and the head is a single item. The rule {cheese, coke} ==> cheese is not interesting because it is a tautology: if the head is implied by the body, the rule does not provide new information. The problem of finding such rules has the following formulation:
KDQL RULE Associations AS
SELECT DISTINCT 1..n item AS BODY,
1..1 item AS HEAD,
SUPPORT, CONFIDENCE
FROM Purchase
GROUP BY transaction
EXTRACTING RULES WITH SUPPORT: 0.1,
CONFIDENCE: 0.2
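What the KDQL statement above asks for can be emulated in a few lines: group the rows of `Purchase` by transaction, then enumerate bodies of one or more items with single-item heads and keep the rules that reach both thresholds. The sketch is a brute-force emulation under assumed data, with `sqlite3` in place of an ODBC source. It is not the KDQL engine described in the thesis.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Purchase ("transaction" INTEGER, item TEXT)')
conn.executemany("INSERT INTO Purchase VALUES (?, ?)", [
    (1, "cheese"), (1, "coke"), (1, "bread"),
    (2, "cheese"), (2, "coke"), (2, "bread"),
    (3, "cheese"), (3, "coke"),
    (4, "bread"),
    (5, "coke"),
])

# GROUP BY transaction: collect the items of each transaction into a set.
baskets = {}
for tid, item in conn.execute('SELECT "transaction", item FROM Purchase'):
    baskets.setdefault(tid, set()).add(item)
transactions = list(baskets.values())

def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def extract_rules(min_support, min_confidence):
    n = len(transactions)
    items = sorted(set().union(*transactions))
    found = []
    for size in range(1, len(items)):              # body cardinality 1..n
        for body in map(frozenset, combinations(items, size)):
            for head in items:                     # head cardinality 1..1
                if head in body:
                    continue                       # skip tautologies
                both = count(body | {head})
                if both == 0:
                    continue
                support = both / n
                confidence = both / count(body)
                if support >= min_support and confidence >= min_confidence:
                    found.append((tuple(sorted(body)), head, support, confidence))
    return found

rules = extract_rules(min_support=0.1, min_confidence=0.2)
for body, head, support, confidence in rules:
    print(f"{set(body)} ==> {head}  [{support:.2f}, {confidence:.2f}]")
```

With this data, {cheese, coke} ==> bread is found with support 0.40 (two of five transactions) and confidence 0.67 (two of the three transactions that contain both cheese and coke).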
KDQL rules operator
<KDQL_RULES_OP> := KDD RULES <TableName> AS
SELECT DISTINCT <Body_Description_KDQL>, <Head_Description_KDQL>
[, SUPPORT] [, CONFIDENCE]
[WHERE <WhereClause>]
FROM <FromList> [WHERE <WhereClause>]
GROUP BY <Attribute> <AttributeList>
[HAVING <HavingClause>]
[CLUSTER BY <Attribute> <AttributeList>
[HAVING <HavingClause>]]
EXTRACTING RULES WITH SUPPORT: <real>,
CONFIDENCE: <real>
<Body_Description_KDQL> := [<Cardinality_Shape>] <AttrName> <AttrList> AS BODY
/* default cardinality shape for the Body: 1..n */
<Head_Description_KDQL> := [<Cardinality_Shape>] <AttrName> <AttrList> AS HEAD
/* default cardinality shape for the Head: 1..1 */
<Cardinality_Shape> := <Number> .. (<Number> | n)
<AttributeList> := {<AttributeName>, <AttributeName>, … <AttributeName>}
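As a small illustration of the grammar, the cardinality production `<Number> .. (<Number> | n)` can be parsed with a regular expression. The class and function names below are hypothetical and only show how a shape such as `1..n` or `1..1` might be read and checked against a body or head size.

```python
import re
from typing import NamedTuple, Optional

class Cardinality(NamedTuple):
    low: int
    high: Optional[int]  # None stands for the unbounded upper limit "n"

# <Number> .. (<Number> | n), with optional spaces around the dots.
_SHAPE = re.compile(r"^\s*(\d+)\s*\.\.\s*(\d+|n)\s*$")

def parse_cardinality(text: str) -> Cardinality:
    """Parse a cardinality shape such as '1..n' or '1..3'."""
    match = _SHAPE.match(text)
    if match is None:
        raise ValueError(f"not a cardinality shape: {text!r}")
    low = int(match.group(1))
    high = None if match.group(2) == "n" else int(match.group(2))
    if high is not None and high < low:
        raise ValueError(f"upper bound below lower bound: {text!r}")
    return Cardinality(low, high)

def allows(card: Cardinality, size: int) -> bool:
    """True if an item set of the given size fits the cardinality."""
    return size >= card.low and (card.high is None or size <= card.high)

body = parse_cardinality("1..n")  # default shape for the body
head = parse_cardinality("1..1")  # default shape for the head
print(body, head)
print(allows(body, 3), allows(head, 2))  # True False
```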
Conclusion
• KDQL is a part of the ODBC_KDD (2) model.
• KDQL calls the I-extended database via an ODBC connection.
• The I-extended database retrieves all the requested information from traditional databases via ODBC.
• KDQL was implemented to handle DM tasks with visualization.
• Visualization techniques can be used to visualize interesting association rules discovered from the databases.
Results
The major results of the thesis work are summarized as follows.
• Proposing a new remote access KDD model, called ODBC_KDD (2), that can deliver results with a more detailed description, such as visualization, scripts, statistical inferences and more.
• Proposing and implementing a database concept, called I-extended database (I-ED), to be maintained and accelerated through the use of the Knowledge Discovery Query Language (KDQL).
• Proposing, within the ODBC_KDD (2) model, a query language called KDQL. KDQL interacts with the conceptual database called the I-extended database. It is a new KDD query language that can discover association rules.
• Using visualization tools in KDQL to represent the retrieved results in different 2D and 3D visual forms, such as pie, point, line and bar charts.
• Using the support and confidence of data items to locate the important association rules in the databases, through the I-extended database accessed by KDQL.
Appendix A, B
Appendix A
• We introduced the proposed syntax of the KDQL statement rules.
Appendix B (images from the program)
Dedications and Acknowledgments
• First, I want to thank my wife Emaan Zubi for her understanding and for making the last steps of writing this dissertation enjoyable, and also my kids Yhaia, Mohamed and Suliman for being nice kids while I was doing this work.
• My parents: my father, Suliman Zubi, and my mother, Memona Yousef.
• I would like to thank Dr. Fazekas Gábor for accepting me as a Ph.D. student under his supervision. I would also like to thank him for his continuous encouragement, confidence and support, for reviewing the text of this thesis, and for sharing with me his knowledge and love of this field.
• My senior supervisor Prof. Dr. Arató Mátyás for his encouragement.
• Dr. Kormos Janos, my teacher and friend, for his insightful comments, advice and help.
• Dr. Bajalinov Erik for the frequent constructive discussions regarding programming in Delphi.
• My deepest thanks to Dr. Varga Katalin and Dr. Várterész Magdolna for refereeing my Ph.D. dissertation work.
• Mr. Basheer Nassain, the Libyan student advisor, and Mr. Khalid Zintaney of the financial office at the Libyan Embassy, Budapest, for their support.
• All people in this committee.
• Finally, I want to thank all my friends and the people at the Institute of Mathematics and Informatics, Debrecen University.
Thank you!!!
39
40

Más contenido relacionado

La actualidad más candente

IRJET- Swift Retrieval of DNA Databases by Aggregating Queries
IRJET- Swift Retrieval of DNA Databases by Aggregating QueriesIRJET- Swift Retrieval of DNA Databases by Aggregating Queries
IRJET- Swift Retrieval of DNA Databases by Aggregating QueriesIRJET Journal
 
Enabling Use of Dynamic Anonymization for Enhanced Security in Cloud
Enabling Use of Dynamic Anonymization for Enhanced Security in CloudEnabling Use of Dynamic Anonymization for Enhanced Security in Cloud
Enabling Use of Dynamic Anonymization for Enhanced Security in CloudIOSR Journals
 
Ijsws14 423 (1)-paper-17-normalization of data in (1)
Ijsws14 423 (1)-paper-17-normalization of data in (1)Ijsws14 423 (1)-paper-17-normalization of data in (1)
Ijsws14 423 (1)-paper-17-normalization of data in (1)Raghavendra Pokuri
 
Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma...
Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma...Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma...
Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma...IJECEIAES
 
Indexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record DeduplicationIndexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record Deduplicationidescitation
 
Term Frequency and its Variants in Retrieval Models
Term Frequency and its Variants in Retrieval ModelsTerm Frequency and its Variants in Retrieval Models
Term Frequency and its Variants in Retrieval ModelsVenkatesh Vinayakarao
 
Linked open data it univ 22 nov 2012
Linked open data it univ 22 nov 2012Linked open data it univ 22 nov 2012
Linked open data it univ 22 nov 2012Kerstin Forsberg
 
Semi-automatic Discovery of Mappings Between Heterogeneous Data Warehouse Dim...
Semi-automatic Discovery of Mappings Between Heterogeneous Data Warehouse Dim...Semi-automatic Discovery of Mappings Between Heterogeneous Data Warehouse Dim...
Semi-automatic Discovery of Mappings Between Heterogeneous Data Warehouse Dim...IDES Editor
 
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
Performance Analysis of Hybrid Approach for Privacy Preserving in Data MiningPerformance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Miningidescitation
 
51 privacy-preserving-publication-of-set-valued-data
51 privacy-preserving-publication-of-set-valued-data51 privacy-preserving-publication-of-set-valued-data
51 privacy-preserving-publication-of-set-valued-datakarunyaieeeproj
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Cluster Based Access Privilege Management Scheme for Databases
Cluster Based Access Privilege Management Scheme for DatabasesCluster Based Access Privilege Management Scheme for Databases
Cluster Based Access Privilege Management Scheme for DatabasesEditor IJMTER
 
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data MiningUsing Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data Mining14894
 

La actualidad más candente (20)

Edi text
Edi textEdi text
Edi text
 
IRJET- Swift Retrieval of DNA Databases by Aggregating Queries
IRJET- Swift Retrieval of DNA Databases by Aggregating QueriesIRJET- Swift Retrieval of DNA Databases by Aggregating Queries
IRJET- Swift Retrieval of DNA Databases by Aggregating Queries
 
15 19
15 1915 19
15 19
 
WP4-QoS Management in the Cloud
WP4-QoS Management in the CloudWP4-QoS Management in the Cloud
WP4-QoS Management in the Cloud
 
Enabling Use of Dynamic Anonymization for Enhanced Security in Cloud
Enabling Use of Dynamic Anonymization for Enhanced Security in CloudEnabling Use of Dynamic Anonymization for Enhanced Security in Cloud
Enabling Use of Dynamic Anonymization for Enhanced Security in Cloud
 
Ijsws14 423 (1)-paper-17-normalization of data in (1)
Ijsws14 423 (1)-paper-17-normalization of data in (1)Ijsws14 423 (1)-paper-17-normalization of data in (1)
Ijsws14 423 (1)-paper-17-normalization of data in (1)
 
Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma...
Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma...Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma...
Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma...
 
10420140501003
1042014050100310420140501003
10420140501003
 
Indexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record DeduplicationIndexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record Deduplication
 
Term Frequency and its Variants in Retrieval Models
Term Frequency and its Variants in Retrieval ModelsTerm Frequency and its Variants in Retrieval Models
Term Frequency and its Variants in Retrieval Models
 
Linked open data it univ 22 nov 2012
Linked open data it univ 22 nov 2012Linked open data it univ 22 nov 2012
Linked open data it univ 22 nov 2012
 
Semi-automatic Discovery of Mappings Between Heterogeneous Data Warehouse Dim...
Semi-automatic Discovery of Mappings Between Heterogeneous Data Warehouse Dim...Semi-automatic Discovery of Mappings Between Heterogeneous Data Warehouse Dim...
Semi-automatic Discovery of Mappings Between Heterogeneous Data Warehouse Dim...
 
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
Performance Analysis of Hybrid Approach for Privacy Preserving in Data MiningPerformance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining
 
51 privacy-preserving-publication-of-set-valued-data
51 privacy-preserving-publication-of-set-valued-data51 privacy-preserving-publication-of-set-valued-data
51 privacy-preserving-publication-of-set-valued-data
 
1699 1704
1699 17041699 1704
1699 1704
 
Hu3414421448
Hu3414421448Hu3414421448
Hu3414421448
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
169 s170
169 s170169 s170
169 s170
 
Cluster Based Access Privilege Management Scheme for Databases
Cluster Based Access Privilege Management Scheme for DatabasesCluster Based Access Privilege Management Scheme for Databases
Cluster Based Access Privilege Management Scheme for Databases
 
Using Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data MiningUsing Randomized Response Techniques for Privacy-Preserving Data Mining
Using Randomized Response Techniques for Privacy-Preserving Data Mining
 

Destacado

Knowledge Discovery Query Language (KDQL)
Knowledge Discovery Query Language (KDQL)Knowledge Discovery Query Language (KDQL)
Knowledge Discovery Query Language (KDQL)Zakaria Zubi
 
I- Extended Databases
I- Extended DatabasesI- Extended Databases
I- Extended DatabasesZakaria Zubi
 
Arabic Text mining Classification
Arabic Text mining Classification Arabic Text mining Classification
Arabic Text mining Classification Zakaria Zubi
 
COMPARISON OF ROUTING PROTOCOLS FOR AD HOC WIRELESS NETWORK WITH MEDICAL DATA
COMPARISON OF ROUTING PROTOCOLS FOR AD HOC WIRELESS NETWORK WITH MEDICAL DATA COMPARISON OF ROUTING PROTOCOLS FOR AD HOC WIRELESS NETWORK WITH MEDICAL DATA
COMPARISON OF ROUTING PROTOCOLS FOR AD HOC WIRELESS NETWORK WITH MEDICAL DATA Zakaria Zubi
 
Using Data Mining Techniques to Analyze Crime Pattern
Using Data Mining Techniques to Analyze Crime PatternUsing Data Mining Techniques to Analyze Crime Pattern
Using Data Mining Techniques to Analyze Crime PatternZakaria Zubi
 

Destacado (7)

Ismail&&ziko 2003
Ismail&&ziko 2003Ismail&&ziko 2003
Ismail&&ziko 2003
 
Knowledge Discovery Query Language (KDQL)
Knowledge Discovery Query Language (KDQL)Knowledge Discovery Query Language (KDQL)
Knowledge Discovery Query Language (KDQL)
 
I- Extended Databases
I- Extended DatabasesI- Extended Databases
I- Extended Databases
 
Arabic Text mining Classification
Arabic Text mining Classification Arabic Text mining Classification
Arabic Text mining Classification
 
COMPARISON OF ROUTING PROTOCOLS FOR AD HOC WIRELESS NETWORK WITH MEDICAL DATA
COMPARISON OF ROUTING PROTOCOLS FOR AD HOC WIRELESS NETWORK WITH MEDICAL DATA COMPARISON OF ROUTING PROTOCOLS FOR AD HOC WIRELESS NETWORK WITH MEDICAL DATA
COMPARISON OF ROUTING PROTOCOLS FOR AD HOC WIRELESS NETWORK WITH MEDICAL DATA
 
Using Data Mining Techniques to Analyze Crime Pattern
Knowledge Discovery in Remote Access Databases

  • 1. Knowledge Discovery in Remote Access Databases
      A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science at the Institute of Mathematics and Computer Science Informatics, University of Debrecen.
      By Zakaria Suliman Zubi. Supervised by Prof. Arato Matyas and Prof. Fazekas Gábor.
  • 2. Overview of the Thesis
      Part 1
      • Introduction to Knowledge Discovery in Databases (KDD) and Data Mining (DM).
      • Goal of the Thesis Work.
      Part 2
      • Remote Access KDD models.
      • Logical Foundation in Data Mining.
      • Mining the Discovered Association Rules.
      • Data Mining Query Languages.
      Part 3
      • Knowledge Discovery Query Language (KDQL).
      • I-extended Databases (I-ED).
      • Implementation of KDQL.
      • Conclusion.
      • Appendix A, B.
  • 3. Introduction to KDD and DM
      • KDD is the process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.
      • DM is a single step in the KDD process: it extracts trends or patterns from raw databases and carefully and accurately transforms them into useful and understandable information.
      • In the introduction (chapter 1), every section covers the history, importance, appearances and tools of KDD and DM.
      KDD process (figure callouts):
      • Data cleaning: the phase in which noisy and irrelevant data are removed from the collection.
      • Data integration: multiple data sources, often heterogeneous, may be combined in a common source.
      • Data selection: the data relevant to the analysis is decided on and retrieved from the data collection.
      • Data transformation: the phase in which the selected data is transformed into forms appropriate for the mining procedure.
      • Data mining: the crucial step in which clever techniques are applied to extract potentially useful patterns.
      • Pattern evaluation: strictly interesting patterns representing knowledge are identified based on given measures.
      • Knowledge representation: the final phase, in which the discovered knowledge is visually represented to the user.
  • 4. Introduction to KDD and DM
      Figure: KDD and DM share topics with several other fields.
  • 5. Introduction to KDD and DM
      • Access to the databases was established via Open Database Connectivity (ODBC).
      • The databases are queried with Structured Query Language (SQL). SQL lets users define the data in a database and manipulate it (adding, deleting and retrieving records from raw databases).
      • Data visualization is used to represent the data mining results.
  • 7. Goal of the Thesis Work
      • In this thesis work, we investigated the problem of matching a DM problem with the set of DM algorithms that are suitable for solving it.
      • We studied the use of visualization, integrated with algorithmic approaches, to tune the parameters of DM algorithms. The aim is to support the parameter selection process, so far explored only by algorithmic approaches, in a more systematic way than using default values or setting parameter values without clues.
      • We introduced visualization to provide expressive information about induced models and statistical entities, and to support the interactive and dynamic exploration of induced models for DM.
  • 9. Remote Access KDD models
      Figure: connection between KDD and ODBC.
  • 13. Logical Foundation in Data Mining (LFDM)
      Advantages of Logical Foundation in Data Mining
      • Expressiveness: first-order logic can represent more complex concepts than traditional attribute-value languages.
      • Readability: formulae are easier to read than decision trees or a set of linear equations.
      • Background knowledge: background knowledge can be extended during discovery, for example in time series.
      • Multiple tables: multiple database tables can be handled without explicit and expensive joins.
      • Deductive databases: logical discovery engines can be transparently linked to relational databases via deductive databases.
      Disadvantages of Logical Foundation in Data Mining
      • Language complexity: first-order hypotheses are usually constructed through heavy search (discovery feasible).
      • Database access times: checking a single candidate might involve heavy querying.
      • Number handling: logical approaches to discovery usually suffer from poor number handling capabilities.
  • 14. Logical Foundation in Data Mining (LFDM): Translating first-order queries into SQL
      • A natural-language question such as “find all employees who earn more than 1,000,000 HUF a year and more than their manager” can be written as a first-order rule:
          expensive_employee(Name) ← employee(Name, Salary1, Manager), Salary1 > 1000000, employee(Manager, Salary2, _), Salary1 > Salary2
      • The rule translates into the following SQL query:
          SELECT employee_0.NAME
          FROM employee employee_0, employee employee_1
          WHERE employee_0.SALARY > 1000000
            AND employee_1.NAME = employee_0.MANAGER
            AND employee_0.SALARY > employee_1.SALARY
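The translated query can be tried against a small in-memory table. The following is a minimal sketch using Python's built-in `sqlite3` module rather than the ODBC connection described in the thesis; the table contents and the helper name `expensive_employees` are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (NAME, SALARY in HUF, MANAGER).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (NAME TEXT PRIMARY KEY, SALARY INTEGER, MANAGER TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Anna", 1500000, "Bela"),   # above the threshold and above her manager
        ("Bela", 1200000, "Csaba"),  # above the threshold but below his manager
        ("Csaba", 2000000, None),    # no manager, so the self-join finds no partner
        ("Dora", 900000, "Eva"),     # above her manager but below the threshold
        ("Eva", 800000, "Csaba"),
    ],
)

# The SQL query from the slide, unchanged.
QUERY = """
SELECT employee_0.NAME
FROM employee employee_0, employee employee_1
WHERE employee_0.SALARY > 1000000
  AND employee_1.NAME = employee_0.MANAGER
  AND employee_0.SALARY > employee_1.SALARY
"""


def expensive_employees(connection):
    """Return the names selected by the translated first-order rule."""
    return sorted(row[0] for row in connection.execute(QUERY))


print(expensive_employees(conn))  # ['Anna']
```

Each literal of the rule body maps to one part of the query: the two `employee` literals become the two aliases of the self-join, the shared variable `Manager` becomes the join condition, and the comparisons become `WHERE` conditions.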
  • 16. Mining the Discovered Association Rules: Association Rules
      • What is an association rule? Let $I = \{i_1, i_2, \ldots, i_n\}$ be the set of all possible items and let $D$, the task-relevant data, be a set of transactions, where each transaction $T = \{i_a, i_b, \ldots, i_t\}$ is a set of items with $T \subset I$. An association rule has the form $P \rightarrow Q$, where $P \subset I$, $Q \subset I$ and $P \cap Q = \emptyset$. The rule $P \rightarrow Q$ holds in $D$ with support $s$ and has confidence $c$ in the transaction set $D$, where
          $\mathrm{Support}(P \rightarrow Q) = \mathrm{Probability}(P \cup Q)$
          $\mathrm{Confidence}(P \rightarrow Q) = \mathrm{Probability}(Q \mid P)$
      • Example: “In 80% of the cases when people buy bread, they also buy milk”, written as bread ==> milk / 80%.
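The two measures can be computed directly from a list of transactions. The sketch below is illustrative only: the function names and the eight sample baskets are assumptions, chosen so that the bread and milk example reaches a confidence of 80%.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, body, head):
    """Estimate of Probability(head | body): support(body ∪ head) / support(body)."""
    body_support = support(transactions, body)
    if body_support == 0:
        return 0.0  # the body never occurs, so the rule has no evidence
    return support(transactions, set(body) | set(head)) / body_support


# Hypothetical baskets: 5 contain bread, 4 of those also contain milk.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk"},
    {"eggs"},
    {"butter", "eggs"},
]

print(support(transactions, {"bread", "milk"}))       # 0.5
print(confidence(transactions, {"bread"}, {"milk"}))  # 0.8
```

Support is measured against all transactions, while confidence is measured only against the transactions that contain the body, which is why the same rule has 50% support and 80% confidence here.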
  • 17. Mining the Discovered Association Rules: Mining the Association Rules
      • What is mining association rules? Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories, and selecting the most "interesting" rules based on their confidence factors. A rule P → Q holds in D with support s and has confidence c in the transaction set D.
      • Applications: basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
      • Examples:
          “Body → Head [support, confidence]”
          buys(x, “bread”) → buys(x, “milk”) [6%, 65%]
          major(x, “CS”) ∧ takes(x, “Database”) → grade(x, “5”) [1%, 75%]
  • 18. Mining the Discovered Association Rules: Mining the Association Rules (cont.)
      • How do we mine association rules?
      • Input: a database of transactions, where each transaction is a list of items (e.g. the items purchased by a customer in one visit).
      • Find all rules that associate the presence of one set of items with that of another set of items.
      • Example: “98% of people who purchase tires and auto accessories also get automotive services done”.
      • There are no restrictions on the number of items in the body of the rule.
  • 20. Data Mining Query Language (DMQL): What is a Data Mining Query Language?
      • A Data Mining Query Language (DMQL) supports the iterative nature of the KDD process. Once the discovered knowledge has been presented to the user, the evaluation measures can be enhanced, the mining can be further refined, new data can be selected or further transformed, or new data sources can be integrated, in order to get different, more appropriate results.
  • 21. Data Mining Query Language (DMQL): Types of patterns discovered by DMQL
      • Characterization: data characterization is a summarization of the general features of objects in a target class; it produces what are called characteristic rules.
      • Discrimination: data discrimination compares the general features of objects between two classes, the target class and the contrasting class; it produces what are called discriminant rules.
      • Association analysis: association analysis is the discovery of what are commonly called association rules.
      • Classification: classification analysis is the organization of data in given classes.
      • Prediction: prediction has attracted considerable attention given the potential implications of successful forecasting in a business context.
      • Clustering: clustering is the organization of data in classes that are not given in advance.
      • Outlier analysis: outliers are data elements that cannot be grouped in a given class or cluster.
      • Evolution and deviation analysis: evolution and deviation analysis pertain to the study of data that changes over time.
  • 23. Knowledge Discovery Query Language (KDQL): What is KDQL in principle?
      • Knowledge Discovery Query Language (KDQL) is a KDD query language proposed for the ODBC_KDD(2) model. It mines association rules in databases (i.e. a DBMS or relational database) and then visualizes the discovered results in different chart forms (i.e. 2D and 3D). KDQL has not been implemented yet. KDQL combines KDD technology and data visualization with the need for a query language for DM tasks, which leads us to develop a language tool that can handle both approaches in one session.
      Figure labels: Request, Data, Data to Visualize, Visualization Tool, Database Management System (DBMS).
  • 24. Knowledge Discovery Query Language (KDQL): Visualization techniques for DMQL
      Figure labels: Data Mining Query Language (DMQL), Visualization Tools, Database Management System (DBMS).
  • 26. I-Extended Databases (I-ED): Motivation
      • An I-extended database is a database that, in addition to data, also contains intensionally defined generalizations about the data. It has properties similar to those of an inductive database. We formalize this concept and show how it can be used throughout the whole DM process thanks to the closure property of the framework.
      • The basic message of the I-extended database is as follows:
          • An I-extended database consists of a normal database associated with a subset of patterns from a class of patterns, and an evaluation function that tells how the patterns occur in the data.
          • An I-extended database can be queried (in principle) just by using normal relational algebra or SQL, with the added property of being able to refer to the values of the evaluation function on the patterns.
          • Modeling KDD processes as a sequence of queries on an I-extended database creates opportunities for reasoning about and optimizing these processes.
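The idea of querying patterns with ordinary SQL while referring to an evaluation function can be sketched with Python's built-in `sqlite3` module. Everything below is an assumption for illustration, not the I-ED design from the thesis: the table names `purchase` and `pattern`, the comma-separated encoding of itemsets, and the choice of relative frequency as the evaluation function `freq`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (tid INTEGER, item TEXT)")  # the normal database
conn.execute("CREATE TABLE pattern (items TEXT)")               # a subset of itemset patterns
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?)",
    [(1, "bread"), (1, "milk"),
     (2, "bread"), (2, "milk"), (2, "cheese"),
     (3, "bread"),
     (4, "milk"), (4, "cheese")],
)
conn.executemany(
    "INSERT INTO pattern VALUES (?)",
    [("bread",), ("bread,milk",), ("cheese,milk",), ("bread,cheese",)],
)

# Snapshot of the transactions, taken once; later inserts are not reflected.
baskets = {}
for tid, item in conn.execute("SELECT tid, item FROM purchase"):
    baskets.setdefault(tid, set()).add(item)


def freq(items):
    """Evaluation function: fraction of transactions containing the pattern."""
    wanted = set(items.split(","))
    return sum(1 for b in baskets.values() if wanted <= b) / len(baskets)


# Expose the evaluation function to SQL so queries can refer to its values.
conn.create_function("freq", 1, freq)


def frequent_patterns(min_freq):
    """Plain SQL over the pattern table, filtered by the evaluation function."""
    cursor = conn.execute(
        "SELECT items, freq(items) FROM pattern WHERE freq(items) >= ? ORDER BY items",
        (min_freq,),
    )
    return cursor.fetchall()


print(frequent_patterns(0.5))
# [('bread', 0.75), ('bread,milk', 0.5), ('cheese,milk', 0.5)]
```

Because the result of such a query is again a relation of patterns with their evaluation values, it can feed a further query, which is one way to read the closure property mentioned on the slide.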
  • 28. 28 Implementation of KDQL Motivation of KDQL The background of KDQL is the Structured Query Language (SQL), since several extensions to SQL have been proposed to serve as a Data Mining Query Language (DMQL). SQL + DM (rules) is the appropriate form for this task at the user interface. DM (rules) uses association rules to interact with the I-extended database. The association rules are obtained with KDQL rules, and the results are represented graphically in 2D and 3D charts.
  • 29. 29 Implementation of KDQL Architecture of KDQL [Figure: architecture of KDQL.]
  • 30. 30 Implementation of KDQL Example of KDQL
      • For example, the rule {cheese, coke} ==> bread states that if cheese and coke are bought together in a transaction, bread is also bought in the same transaction. In this association rule, the body is a set of items and the head is a single item. The rule {cheese, coke} ==> cheese is not interesting because it is a tautology: if the head is implied by the body, the rule provides no new information. This problem has the following formulation:
      • KDQL RULE Associations AS
        SELECT DISTINCT 1..n item AS BODY, 1..1 item AS HEAD, SUPPORT, CONFIDENCE
        FROM Purchase
        GROUP BY transaction
        EXTRACTING RULES WITH SUPPORT: 0.1, CONFIDENCE: 0.2
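The support and confidence thresholds in the statement above can be made concrete with a small sketch. It is not the KDQL implementation; it is a brute-force Python illustration that assumes the standard definitions: the support of a rule is the fraction of transactions that contain the body and the head together, and the confidence is that support divided by the support of the body. The function name `mine_rules`, the `max_body` limit, and the sample transactions are hypothetical.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence, max_body=2):
    """Enumerate rules body ==> head with a 1..n item body and a single-item head."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for size in range(1, max_body + 1):
        for body in combinations(items, size):
            body_set = frozenset(body)
            body_support = support(body_set)
            if body_support == 0:
                continue  # the body never occurs, so confidence is undefined
            for head in items:
                if head in body_set:
                    continue  # skip tautologies such as {cheese, coke} ==> cheese
                rule_support = support(body_set | {head})
                confidence = rule_support / body_support
                if rule_support >= min_support and confidence >= min_confidence:
                    rules.append((body, head, rule_support, confidence))
    return rules


# Made-up purchase data: one set of items per transaction.
purchases = [
    {"cheese", "coke", "bread"},
    {"cheese", "coke", "bread"},
    {"cheese", "coke"},
    {"bread"},
    {"coke"},
]

if __name__ == "__main__":
    for body, head, sup, conf in mine_rules(purchases, 0.1, 0.2):
        print(set(body), "==>", head, round(sup, 2), round(conf, 2))
```

With this data, {cheese, coke} ==> bread holds in two of the five transactions, and the body occurs in three, so the rule passes both thresholds from the example statement. A practical miner would prune candidates (for instance with Apriori) rather than enumerate every body.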
  • 31. 31 Implementation of KDQL KDQL rules operator
      • < KDQL_RULES_OP > := KDD RULES < TableName > AS
        SELECT DISTINCT < BodyDescr >, < HeadDescr > [,SUPPORT] [,CONFIDENCE]
        [WHERE < WhereClause >]
        FROM < FromList > [WHERE < WhereClause >]
        GROUP BY < Attribute > < AttributeList > [HAVING < HavingClause >]
        [CLUSTER BY < Attribute > < AttributeList > [HAVING < HavingClause >]]
        EXTRACTING RULES WITH SUPPORT: < real >, CONFIDENCE: < real >
      • < Body_Description_KDQL > := [< Cardinaly_Sheap >] < AttrName > < AttrList > AS BODY /* default cardinality shape for the Body: 1..n */
        < Head_Description_KDQL > := [< Cardinaly_Sheap >] < AttrName > < AttrList > AS HEAD /* default cardinality shape for the Head: 1..1 */
        < Cardinaly_Sheap > := < Number > .. (< Number > | n)
        < AttributeList > := {< AttributeName >, < AttributeName >, … < AttributeName >}
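Two of the productions above are simple enough to check mechanically: the cardinality production `<Number> .. (<Number> | n)` and the closing `EXTRACTING RULES WITH SUPPORT: <real>, CONFIDENCE: <real>` clause. The sketch below is a hedged illustration in Python, not the KDQL parser. The function names `parse_cardinality` and `parse_thresholds`, the use of regular expressions, and the convention of returning `None` for an unbounded upper limit `n` are assumptions.

```python
import re

# <Number> .. (<Number> | n), for example "1..n" or "1..1".
CARDINALITY = re.compile(r"^\s*(\d+)\s*\.\.\s*(\d+|n)\s*$")

# EXTRACTING RULES WITH SUPPORT: <real>, CONFIDENCE: <real>
THRESHOLDS = re.compile(
    r"EXTRACTING\s+RULES\s+WITH\s+SUPPORT\s*:\s*([0-9]*\.?[0-9]+)\s*,"
    r"\s*CONFIDENCE\s*:\s*([0-9]*\.?[0-9]+)",
    re.IGNORECASE,
)


def parse_cardinality(text):
    """Return (low, high); high is None when the upper bound is the symbol n."""
    match = CARDINALITY.match(text)
    if match is None:
        raise ValueError(f"not a cardinality specification: {text!r}")
    low = int(match.group(1))
    high = None if match.group(2) == "n" else int(match.group(2))
    if high is not None and high < low:
        raise ValueError(f"upper bound below lower bound: {text!r}")
    return low, high


def parse_thresholds(statement):
    """Return (support, confidence) from the EXTRACTING RULES clause."""
    match = THRESHOLDS.search(statement)
    if match is None:
        raise ValueError("missing EXTRACTING RULES WITH SUPPORT/CONFIDENCE clause")
    return float(match.group(1)), float(match.group(2))


if __name__ == "__main__":
    print(parse_cardinality("1..n"))
    print(parse_thresholds(
        "FROM Purchase GROUP BY transaction "
        "EXTRACTING RULES WITH SUPPORT: 0.1, CONFIDENCE: 0.2"
    ))
```

A complete front end would need a real grammar-driven parser for the SELECT, FROM, GROUP BY, and CLUSTER BY parts; regular expressions are used here only because these two fragments are flat.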
  • 33. 33 Conclusion
      • KDQL is part of the ODBC_KDD(2) model.
      • KDQL calls the I-extended database via an ODBC connection.
      • The I-extended database retrieves all the requested information from traditional databases via ODBC.
      • KDQL was implemented to handle DM tasks with visualization.
      • Visualization techniques can be used to visualize interesting association rules discovered from the databases.
  • 34. 34 Results The major results of the thesis work are summarized as follows.
      • Proposing a new remote access KDD model, called ODBC_KDD(2), that can return results with a more detailed description, such as visualization, scripts, statistical inferences and more.
      • Proposing and implementing a database concept, called the I-extended database (I-ED), to be maintained and accelerated by the use of the Knowledge Discovery Query Language (KDQL).
      • Proposing, within the ODBC_KDD(2) model, a query language called KDQL. KDQL was proposed to interact with the conceptual database called the I-extended database. KDQL is a new KDD query language that can discover association rules.
      • Using visualization tools in KDQL to represent the retrieved results in different 2D and 3D visual forms such as pie, point, line and bar charts.
      • Using the support and confidence of data items to locate the important association rules in the databases, through an I-extended database established by KDQL.
  • 36. 36 Appendix A, B Appendix A: we introduce the proposed syntax of the KDQL statement rules. Appendix B: images from the program.
  • 37. 37 Dedications and Acknowledgments
      • First, I want to thank my wife Emaan Zubi for her understanding and for making the last steps of writing this dissertation enjoyable, and also my kids Yhaia, Mohamed and Suliman for being nice kids while I was doing this work.
      • My parents, my father Suliman Zubi and my mother Memona Yousef.
      • I would like to thank Dr. Fazekas Gábor for accepting me as a Ph.D. student under his supervision. I would also like to thank him for his continuous encouragement, confidence and support, for reviewing the text of this thesis, and for sharing with me his knowledge and love of this field.
      • My senior supervisor Prof. Dr. Arató Mátyás for his encouragement.
      • Dr. Kormos Janos, my teacher and friend, for his insightful comments, advice and help.
      • Dr. Bajalinov Erik for the frequent constructive discussions regarding programming in Delphi.
      • My deepest thanks to Dr. Varga Katalin and Dr. Várterész Magdolna for refereeing my Ph.D. dissertation work.
      • Mr. Basheer Nassain, the Libyan student advisor, and Mr. Khalid Zintaney of the financial office in the Libyan Embassy, Budapest, for their support.
      • All people in this committee.
      • Finally, I want to thank all my friends and the people in the Institute of Mathematics and Informatics, Debrecen University.