SlideShare a Scribd company logo
1 of 8
Surname 1

Name
Instructor’s Name
Course
10/13/13
Implementation of Ontology Based Clustering Algorithms
In data mining, clustering is predominantly used due to its wide variety of applications.
Many data clustering algorithms have been introduced in recent years to be used in variety of
areas which include economics, medicine, computational biology, image processing and mobile
communication(Hartigan). Despite its wide use, standardization of the clustering algorithms is
it’sthe major setback. An algorithm developed for a given type of data set may produce very
poor results in different data sets(Hartigan). Despite huge efforts in trying to standardize
clustering algorithms, there has been no major accomplishment. Efforts to standardize clustering
algorithms has resulting into the proposal of many algorithms with each having its advantages
and disadvantages.
Clusteringrefers to the process of partitioning given data sets to a homogenous data set group.
This is basedon features such that objects that tend to be similar are classified into one group and
objects that tend to be dissimilar are classified into a different group(Maulik, Bandyopadhyay
and Mukhopadhyay).
Ontology refers to formal specifications that are used in conceptualization(Punitha, Thavavel and
Punithavalli). This specifications of conceptualization assist the system administrators and
programs share information. Ontology comprises set of relationships and events that are utilized
Surname 2

in creation of sets of vocabulary for exchanging information between programs and/or system
administrators. Ontology is based on the concept that different system administrators have
different requirements regarding clustering of algorithms(Punitha, Thavavel and Punithavalli).
Empirical and mathematical clustering are utilized in high dimensional space to categorize
information into one cluster as required. Cluster analysis is aimed at dividing sets of objects into
one homogenous cluster(Gavrilova).
There are various general steps followed during implementation of the ontology – based
clustering algorithms(Canadian Society for Computational Studies of Intelligence). The first step
involves calculating distance matrix and/or similarity matrix existing between pair of objects by
use of specific methods that are ontology based. In this stage, each object consists distinct
clusters. The second step involves merging the two closest clusters by use of the distance matrix.
This process is also referred to as the clustering process. The third step involves rebuilding and
modifying the distance matrix. This is done by the treating each of the merged clusters as one
entity. To achieve this, ways of calculating similarities between clusters and objects and ways
that estimates the similarity existing between ontology objects and clusters are utilized. This is
also known as the evaluation process(Punitha, Thavavel and Punithavalli). Fourthly, in case the
required number of clusters have been identified, the process is altered, otherwise, the process is
repeated from the second stage. To calculate the similarities between objects, the equation below
is utilized;

In the above equation, TS represents taxonomy similarity, AS represents the attribute similarity
and, RS represents relationship similarity. The taxonomy similarity (TS) are the dissimilarity or
Surname 3

similarity existing between the classes of the objects (Punitha, Thavavel and Punithavalli). There
are various ontology – based clustering algorithms. Some of them are as discussed below.
Ontology based apriori clustering algorithm
Apriori algorithm is the most influential algorithm used in mining frequent Item sets for Boolean
association rules. The generation of association rule in apriori algorithm is achieved through two
processes. Initially, there is application of minimum support to facilitate finding of all frequent
items in the given database. The second process involves the use of the frequent item sets and
minimum confidence constraints to construct rules. Basically, the apriori algorithm is aimed at
generating item sets of a given size and then scanning the database for comparison to see if the
generated items sets are large. Only the item sets that are large are utilized in generating item sets
for the next scanning process. An example of implementation of apriri algorithm is as shown
below;
Input: The data set – Clustered Dataset (CD)
CD (r1,r2,r3,r4,……,rn-1,rn)T;
The feature set – Clustered Features (CF)
CF{f1,f2,f3,f4,……,fm-1,fm};
The ontology feature tree – Ontology-Tree(OT)
Ontology-Tree = {<Cpt.root,partOf,Cpt.1>,
<Cpt.root,partOf,Cpt.2>, <Cpt.1,propertyOf, Ab.1>, <Cpt.1,propertyOf,
Ab.2,……}.
Output: feature weight set
Weight={w1,w2,w3,w4,…,wm-1,wm};
The Clustering results – Clusters (C)
Clusters{c1,c2,c3,c4,……,ck-1,ck} .
Procedures
BEGIN
Input <– CF{f1,f2,f3,f4,……,fm-1,fm }
for a=0 to n:
for b=0 to m:
Correlation(Ab.a,Ab.b)
endfor
endfor
Surname 4
for a=0 to m:
Weight(Ab.a) ←(∑nb=0Correlation(Ab.a, Ab.b))/n.
endfor
do
begin
forint a to n do:
L2 = (∑mk-1Weight(Ab.k)( fieldka− fieldkb)2)1/ 2 .
endfor
end
while(Clusters center does not Change) :
Output –>Clusters{c1,c2,c3,c4,……,ck-1,ck} .
END

Ontology-based Clustering Algorithm with Feature Weights
This algorithm entails using the domain knowledge introduced by ontology in calculating the
clustering features weight. The building of the domain ontology tree is done using expert
knowledge basing on the given data collection (ZHANG and WANG). To get ontology concepts,
deeper abstractions are utilized.
Ontology based FP – Growth Based Clustering
This type of ontology based algorithm carries out two scans on the given database. At first, a list
of frequent items are computed and sorted by frequency of appearance in a descending order.
Secondly, there is compression of the database into an FP – tree. Recursive mining is then
performed on the FP – Tree.
Swoogle
Most of web semantic data is highly distributed thus this facilitates easy and faster access of
data on the web. The most challenging is how to facilitate for quality data to users and also and
Surname 5

relevant domain ontologies and choosing the trusted one. Swoogle therefore are developed to
provide the user or the agent with the semantic web search and navigation framework. It uses
three kinds of paths to achieve this, interesource path which enhances links between Semantic
Web Terms (SWTs), resource document paths which provides usage links between the Semantic
Web Documents (SWDs) and SWTs and finally Inter document paths which manifest explicit
link between Semantic Web Documents.
PowerAqua
It’s a type of algorithm that supports the users in querying and exploring semantic web content.
It answers queries by integrating and locating data or information which is normally distributed
within the large scale information in the semantic web documents. The information which is
heterogeneous in the semantic it’s fully exploited using this mechanism to provide knowledge
fusion, query disambiguation as well as ranking mechanisms so as to provide the user with the
most answers for the queries. This approach evolved from the AquaLog, which is a system for
intranets limited to use one ontology at a time.
Ontokhoj
It’s an algorithm used for ranking, searching, classifying and aggregating ontologies crawled
from the Semantic Web. This allows ontology engineers and agents to retrieve authoritative,
knowledge and expedite the process of ontology engineering through extensive reuse of
ontologies.
Just like Swoogle, Ontokhoj uses ranking algorithms similar to those used by Google to give
more importance to ontologies that are highly connected. This thus helps us to find more
appropriate ontology to use in the ranking techniques attached to semantic web.
Surname 6

Conclusion
The quantity of information being made available for use both offline and online is ever
increasing. This has been a major concern to ensure that access to this information is viable by
developing ways to filter out unwanted content and making available only viable content. To
achieve these, ontology based algorithms have arisen and it has proved to be a cutting edge
technology that makes it possible to access this information.
Surname 7
1.

Works Cited

Canadian Society for Computational Studies of Intelligence. Advances in Artificial Intelligence.
Victoria: Springer, 2005. Print.
Gavrilova, Marina L. Computational Science and Its Applications - ICCSA 2006. Springer, 2006.
Print.
Hartigan, John A. Clustering algorithms. Wiley, 1975. Print.
Maulik, Ujjwal, Sanghamitra Bandyopadhyay and Anirban Mukhopadhyay. Multiobjective
Genetic Algorithms for Clustering: Applications in Data Mining and Bioinformatics.
Springer, 2011. Print.
Punitha, S.C., V. Thavavel and M. Punithavalli. "Information Ttechnology Journal." A New
Hybrid Schemes Combining Ontology and Clustering for Text Documents (2013): 2443 2453. Print.
Wasilewska, Anita. APRIORI Algorithm. n.d. ELectronic.
ZHANG, Lei and Zhichao WANG. "Journal of Computational Information Systems." Ontologybased Clustering Algorithm with Feature Weights (2010): 2959-2966. Electronic.
Surname 8

More Related Content

What's hot

2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)
Michael Atkins
 
Indexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record DeduplicationIndexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record Deduplication
idescitation
 
Query expansion_group42_ire
Query expansion_group42_ireQuery expansion_group42_ire
Query expansion_group42_ire
KovidaN
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
IJCSIS Research Publications
 
Materials Informatics Overview
Materials Informatics OverviewMaterials Informatics Overview
Materials Informatics Overview
Tony Fast
 

What's hot (20)

Drug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge GraphsDrug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge Graphs
 
Survey on Efficient Techniques of Text Mining
Survey on Efficient Techniques of Text MiningSurvey on Efficient Techniques of Text Mining
Survey on Efficient Techniques of Text Mining
 
Apresent
ApresentApresent
Apresent
 
Automatic and unsupervised topic discovery in social networks
Automatic and unsupervised topic discovery in social networksAutomatic and unsupervised topic discovery in social networks
Automatic and unsupervised topic discovery in social networks
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)
 
Ijricit 01-002 enhanced replica detection in short time for large data sets
Ijricit 01-002 enhanced replica detection in  short time for large data setsIjricit 01-002 enhanced replica detection in  short time for large data sets
Ijricit 01-002 enhanced replica detection in short time for large data sets
 
Indexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record DeduplicationIndexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record Deduplication
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Supporting scientific discovery through linkages of literature and data
Supporting scientific discovery through linkages of literature and dataSupporting scientific discovery through linkages of literature and data
Supporting scientific discovery through linkages of literature and data
 
Query expansion_group42_ire
Query expansion_group42_ireQuery expansion_group42_ire
Query expansion_group42_ire
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
 
Fasta
FastaFasta
Fasta
 
A study and survey on various progressive duplicate detection mechanisms
A study and survey on various progressive duplicate detection mechanismsA study and survey on various progressive duplicate detection mechanisms
A study and survey on various progressive duplicate detection mechanisms
 
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data StreamsNovel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
 
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
 
Assessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAssessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data Analysis
 
Materials Informatics Overview
Materials Informatics OverviewMaterials Informatics Overview
Materials Informatics Overview
 
Friday talk 11.02.2011
Friday talk 11.02.2011Friday talk 11.02.2011
Friday talk 11.02.2011
 
Frequent Item Set Mining - A Review
Frequent Item Set Mining - A ReviewFrequent Item Set Mining - A Review
Frequent Item Set Mining - A Review
 
K355662
K355662K355662
K355662
 

Similar to Ontology based clustering algorithms

(2016)application of parallel glowworm swarm optimization algorithm for data ...
(2016)application of parallel glowworm swarm optimization algorithm for data ...(2016)application of parallel glowworm swarm optimization algorithm for data ...
(2016)application of parallel glowworm swarm optimization algorithm for data ...
Akram Pasha
 
Incremental learning from unbalanced data with concept class, concept drift a...
Incremental learning from unbalanced data with concept class, concept drift a...Incremental learning from unbalanced data with concept class, concept drift a...
Incremental learning from unbalanced data with concept class, concept drift a...
IJDKP
 
Recommendation system using unsupervised machine learning algorithm & assoc
Recommendation system using unsupervised machine learning algorithm & assocRecommendation system using unsupervised machine learning algorithm & assoc
Recommendation system using unsupervised machine learning algorithm & assoc
ijerd
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Mining
butest
 

Similar to Ontology based clustering algorithms (20)

IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
Final proj 2 (1)
Final proj 2 (1)Final proj 2 (1)
Final proj 2 (1)
 
B.3.5
B.3.5B.3.5
B.3.5
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
Discovering latent informaion by
Discovering latent informaion byDiscovering latent informaion by
Discovering latent informaion by
 
(2016)application of parallel glowworm swarm optimization algorithm for data ...
(2016)application of parallel glowworm swarm optimization algorithm for data ...(2016)application of parallel glowworm swarm optimization algorithm for data ...
(2016)application of parallel glowworm swarm optimization algorithm for data ...
 
A Soft Set-based Co-occurrence for Clustering Web User Transactions
A Soft Set-based Co-occurrence for Clustering Web User TransactionsA Soft Set-based Co-occurrence for Clustering Web User Transactions
A Soft Set-based Co-occurrence for Clustering Web User Transactions
 
Incremental learning from unbalanced data with concept class, concept drift a...
Incremental learning from unbalanced data with concept class, concept drift a...Incremental learning from unbalanced data with concept class, concept drift a...
Incremental learning from unbalanced data with concept class, concept drift a...
 
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHTEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
 
A model of hybrid genetic algorithm particle swarm optimization(hgapso) based...
A model of hybrid genetic algorithm particle swarm optimization(hgapso) based...A model of hybrid genetic algorithm particle swarm optimization(hgapso) based...
A model of hybrid genetic algorithm particle swarm optimization(hgapso) based...
 
A model of hybrid genetic algorithm particle swarm optimization(hgapso) based...
A model of hybrid genetic algorithm particle swarm optimization(hgapso) based...A model of hybrid genetic algorithm particle swarm optimization(hgapso) based...
A model of hybrid genetic algorithm particle swarm optimization(hgapso) based...
 
A model of hybrid genetic algorithm particle swarm optimization(hgapso) based...
A model of hybrid genetic algorithm particle swarm optimization(hgapso) based...A model of hybrid genetic algorithm particle swarm optimization(hgapso) based...
A model of hybrid genetic algorithm particle swarm optimization(hgapso) based...
 
Recommendation system using unsupervised machine learning algorithm & assoc
Recommendation system using unsupervised machine learning algorithm & assocRecommendation system using unsupervised machine learning algorithm & assoc
Recommendation system using unsupervised machine learning algorithm & assoc
 
Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...
 
Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...
 
G045033841
G045033841G045033841
G045033841
 
Introduction to feature subset selection method
Introduction to feature subset selection methodIntroduction to feature subset selection method
Introduction to feature subset selection method
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Mining
 
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 

Ontology based clustering algorithms

  • 1. Surname 1 Name Instructor’s Name Course 10/13/13 Implementation of Ontology Based Clustering Algorithms In data mining, clustering is predominantly used due to its wide variety of applications. Many data clustering algorithms have been introduced in recent years to be used in variety of areas which include economics, medicine, computational biology, image processing and mobile communication(Hartigan). Despite its wide use, standardization of the clustering algorithms is it’sthe major setback. An algorithm developed for a given type of data set may produce very poor results in different data sets(Hartigan). Despite huge efforts in trying to standardize clustering algorithms, there has been no major accomplishment. Efforts to standardize clustering algorithms has resulting into the proposal of many algorithms with each having its advantages and disadvantages. Clusteringrefers to the process of partitioning given data sets to a homogenous data set group. This is basedon features such that objects that tend to be similar are classified into one group and objects that tend to be dissimilar are classified into a different group(Maulik, Bandyopadhyay and Mukhopadhyay). Ontology refers to formal specifications that are used in conceptualization(Punitha, Thavavel and Punithavalli). This specifications of conceptualization assist the system administrators and programs share information. Ontology comprises set of relationships and events that are utilized
  • 2. Surname 2 in creation of sets of vocabulary for exchanging information between programs and/or system administrators. Ontology is based on the concept that different system administrators have different requirements regarding clustering of algorithms(Punitha, Thavavel and Punithavalli). Empirical and mathematical clustering are utilized in high dimensional space to categorize information into one cluster as required. Cluster analysis is aimed at dividing sets of objects into one homogenous cluster(Gavrilova). There are various general steps followed during implementation of the ontology – based clustering algorithms(Canadian Society for Computational Studies of Intelligence). The first step involves calculating distance matrix and/or similarity matrix existing between pair of objects by use of specific methods that are ontology based. In this stage, each object consists distinct clusters. The second step involves merging the two closest clusters by use of the distance matrix. This process is also referred to as the clustering process. The third step involves rebuilding and modifying the distance matrix. This is done by the treating each of the merged clusters as one entity. To achieve this, ways of calculating similarities between clusters and objects and ways that estimates the similarity existing between ontology objects and clusters are utilized. This is also known as the evaluation process(Punitha, Thavavel and Punithavalli). Fourthly, in case the required number of clusters have been identified, the process is altered, otherwise, the process is repeated from the second stage. To calculate the similarities between objects, the equation below is utilized; In the above equation, TS represents taxonomy similarity, AS represents the attribute similarity and, RS represents relationship similarity. The taxonomy similarity (TS) are the dissimilarity or
  • 3. Surname 3 similarity existing between the classes of the objects (Punitha, Thavavel and Punithavalli). There are various ontology – based clustering algorithms. Some of them are as discussed below. Ontology based apriori clustering algorithm Apriori algorithm is the most influential algorithm used in mining frequent Item sets for Boolean association rules. The generation of association rule in apriori algorithm is achieved through two processes. Initially, there is application of minimum support to facilitate finding of all frequent items in the given database. The second process involves the use of the frequent item sets and minimum confidence constraints to construct rules. Basically, the apriori algorithm is aimed at generating item sets of a given size and then scanning the database for comparison to see if the generated items sets are large. Only the item sets that are large are utilized in generating item sets for the next scanning process. An example of implementation of apriri algorithm is as shown below; Input: The data set – Clustered Dataset (CD) CD (r1,r2,r3,r4,……,rn-1,rn)T; The feature set – Clustered Features (CF) CF{f1,f2,f3,f4,……,fm-1,fm}; The ontology feature tree – Ontology-Tree(OT) Ontology-Tree = {<Cpt.root,partOf,Cpt.1>, <Cpt.root,partOf,Cpt.2>, <Cpt.1,propertyOf, Ab.1>, <Cpt.1,propertyOf, Ab.2,……}. Output: feature weight set Weight={w1,w2,w3,w4,…,wm-1,wm}; The Clustering results – Clusters (C) Clusters{c1,c2,c3,c4,……,ck-1,ck} . Procedures BEGIN Input <– CF{f1,f2,f3,f4,……,fm-1,fm } for a=0 to n: for b=0 to m: Correlation(Ab.a,Ab.b) endfor endfor
  • 4. Surname 4 for a=0 to m: Weight(Ab.a) ←(∑nb=0Correlation(Ab.a, Ab.b))/n. endfor do begin forint a to n do: L2 = (∑mk-1Weight(Ab.k)( fieldka− fieldkb)2)1/ 2 . endfor end while(Clusters center does not Change) : Output –>Clusters{c1,c2,c3,c4,……,ck-1,ck} . END Ontology-based Clustering Algorithm with Feature Weights This algorithm entails using the domain knowledge introduced by ontology in calculating the clustering features weight. The building of the domain ontology tree is done using expert knowledge basing on the given data collection (ZHANG and WANG). To get ontology concepts, deeper abstractions are utilized. Ontology based FP – Growth Based Clustering This type of ontology based algorithm carries out two scans on the given database. At first, a list of frequent items are computed and sorted by frequency of appearance in a descending order. Secondly, there is compression of the database into an FP – tree. Recursive mining is then performed on the FP – Tree. Swoogle Most of web semantic data is highly distributed thus this facilitates easy and faster access of data on the web. The most challenging is how to facilitate for quality data to users and also and
  • 5. Surname 5 relevant domain ontologies and choosing the trusted one. Swoogle therefore are developed to provide the user or the agent with the semantic web search and navigation framework. It uses three kinds of paths to achieve this, interesource path which enhances links between Semantic Web Terms (SWTs), resource document paths which provides usage links between the Semantic Web Documents (SWDs) and SWTs and finally Inter document paths which manifest explicit link between Semantic Web Documents. PowerAqua It’s a type of algorithm that supports the users in querying and exploring semantic web content. It answers queries by integrating and locating data or information which is normally distributed within the large scale information in the semantic web documents. The information which is heterogeneous in the semantic it’s fully exploited using this mechanism to provide knowledge fusion, query disambiguation as well as ranking mechanisms so as to provide the user with the most answers for the queries. This approach evolved from the AquaLog, which is a system for intranets limited to use one ontology at a time. Ontokhoj It’s an algorithm used for ranking, searching, classifying and aggregating ontologies crawled from the Semantic Web. This allows ontology engineers and agents to retrieve authoritative, knowledge and expedite the process of ontology engineering through extensive reuse of ontologies. Just like Swoogle, Ontokhoj uses ranking algorithms similar to those used by Google to give more importance to ontologies that are highly connected. This thus helps us to find more appropriate ontology to use in the ranking techniques attached to semantic web.
  • 6. Surname 6 Conclusion The quantity of information being made available for use both offline and online is ever increasing. This has been a major concern to ensure that access to this information is viable by developing ways to filter out unwanted content and making available only viable content. To achieve these, ontology based algorithms have arisen and it has proved to be a cutting edge technology that makes it possible to access this information.
  • 7. Surname 7 1. Works Cited Canadian Society for Computational Studies of Intelligence. Advances in Artificial Intelligence. Victoria: Springer, 2005. Print. Gavrilova, Marina L. Computational Science and Its Applications - ICCSA 2006. Springer, 2006. Print. Hartigan, John A. Clustering algorithms. Wiley, 1975. Print. Maulik, Ujjwal, Sanghamitra Bandyopadhyay and Anirban Mukhopadhyay. Multiobjective Genetic Algorithms for Clustering: Applications in Data Mining and Bioinformatics. Springer, 2011. Print. Punitha, S.C., V. Thavavel and M. Punithavalli. "Information Ttechnology Journal." A New Hybrid Schemes Combining Ontology and Clustering for Text Documents (2013): 2443 2453. Print. Wasilewska, Anita. APRIORI Algorithm. n.d. ELectronic. ZHANG, Lei and Zhichao WANG. "Journal of Computational Information Systems." Ontologybased Clustering Algorithm with Feature Weights (2010): 2959-2966. Electronic.