SlideShare una empresa de Scribd logo
1 de 1
Descargar para leer sin conexión
A FAST CLUSTERING-BASED FEATURE SUBSET SELECTION ALGORITHM FOR
HIGH-DIMENSIONAL DATA
ABSTRACT:
Feature selection involves identifying a subset of the most useful features that produces
compatible results as the original entire set of features. A feature selection algorithm may be
evaluated from both the efficiency and effectiveness points of view. While the efficiency
concerns the time required to find a subset of features, the effectiveness is related to the quality
of the subset of features. Based on these criteria, a fast clustering-based feature selection
algorithm (FAST) is proposed and experimentally evaluated in this paper.
The FAST algorithm works in two steps.
In the first step, features are divided into clusters by using graph-theoretic clustering methods.
In the second step, the most representative feature that is strongly related to target classes is
selected from each cluster to form a subset of features.
Features in different clusters are relatively independent; the clustering-based strategy of FAST
has a high probability of producing a subset of useful and independent features. To ensure the
efficiency of FAST, we adopt the efficient minimum-spanning tree (MST) clustering method.
The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical
study. Extensive experiments are carried out to compare FAST and several representative feature
selection algorithms results, on 35 publicly available real-world high-dimensional image,
microarray, and text data, demonstrate that the FAST not only produces smaller subsets of
features but also improves the performances of the four types of classifiers.
ECWAY TECHNOLOGIES
IEEE PROJECTS & SOFTWARE DEVELOPMENTS
OUR OFFICES @ CHENNAI / TRICHY / KARUR / ERODE / MADURAI / SALEM / COIMBATORE
CELL: +91 98949 17187, +91 875487 2111 / 3111 / 4111 / 5111 / 6111
VISIT: www.ecwayprojects.com MAIL TO: ecwaytechnologies@gmail.com

Más contenido relacionado

Destacado

Tastemaking Through Curated Communities
Tastemaking Through Curated CommunitiesTastemaking Through Curated Communities
Tastemaking Through Curated CommunitiesMediabistro
 
Capabilities Pres
Capabilities PresCapabilities Pres
Capabilities PresChappy5
 
Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...
Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...
Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...eventi-ITBbari
 
A2 Communications Application Server
A2 Communications Application ServerA2 Communications Application Server
A2 Communications Application ServerGENBANDcorporate
 
Simulink Model for Cost-effective Analysis of Hybrid System
Simulink Model for Cost-effective Analysis of Hybrid SystemSimulink Model for Cost-effective Analysis of Hybrid System
Simulink Model for Cost-effective Analysis of Hybrid SystemIJMER
 
optical transmission system
optical transmission systemoptical transmission system
optical transmission systemsangavi43
 
Pittcon 2012 Cs Fi Invited Sers Talks Drugs In Saliva
Pittcon 2012 Cs Fi Invited Sers Talks Drugs In SalivaPittcon 2012 Cs Fi Invited Sers Talks Drugs In Saliva
Pittcon 2012 Cs Fi Invited Sers Talks Drugs In Salivainscore
 
Expectation pressure and stress
Expectation pressure and stressExpectation pressure and stress
Expectation pressure and stressSHIVI JAIN
 
1)l'importanza dei numeri
1)l'importanza dei numeri1)l'importanza dei numeri
1)l'importanza dei numericiao2222
 
Promise08 Wrapup
Promise08 WrapupPromise08 Wrapup
Promise08 Wrapupgregoryg
 

Destacado (17)

Tastemaking Through Curated Communities
Tastemaking Through Curated CommunitiesTastemaking Through Curated Communities
Tastemaking Through Curated Communities
 
Capabilities Pres
Capabilities PresCapabilities Pres
Capabilities Pres
 
Plural1
Plural1Plural1
Plural1
 
Mobile Security
Mobile Security Mobile Security
Mobile Security
 
ChemSpider hosting linking and curating chemistry data for the community
ChemSpider  hosting linking and curating chemistry data for the communityChemSpider  hosting linking and curating chemistry data for the community
ChemSpider hosting linking and curating chemistry data for the community
 
NUEVA HERRAMIENTA PEDAGÓGICA PARA LA ENSEÑANZA DE LA DESTILACIÓN
NUEVA HERRAMIENTA PEDAGÓGICA PARA LA ENSEÑANZA DE LA DESTILACIÓNNUEVA HERRAMIENTA PEDAGÓGICA PARA LA ENSEÑANZA DE LA DESTILACIÓN
NUEVA HERRAMIENTA PEDAGÓGICA PARA LA ENSEÑANZA DE LA DESTILACIÓN
 
Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...
Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...
Michelangelo Ceci – Tecniche di data-mining per la caratterizzazione di entit...
 
A2 Communications Application Server
A2 Communications Application ServerA2 Communications Application Server
A2 Communications Application Server
 
Simulink Model for Cost-effective Analysis of Hybrid System
Simulink Model for Cost-effective Analysis of Hybrid SystemSimulink Model for Cost-effective Analysis of Hybrid System
Simulink Model for Cost-effective Analysis of Hybrid System
 
optical transmission system
optical transmission systemoptical transmission system
optical transmission system
 
Pittcon 2012 Cs Fi Invited Sers Talks Drugs In Saliva
Pittcon 2012 Cs Fi Invited Sers Talks Drugs In SalivaPittcon 2012 Cs Fi Invited Sers Talks Drugs In Saliva
Pittcon 2012 Cs Fi Invited Sers Talks Drugs In Saliva
 
Expectation pressure and stress
Expectation pressure and stressExpectation pressure and stress
Expectation pressure and stress
 
1)l'importanza dei numeri
1)l'importanza dei numeri1)l'importanza dei numeri
1)l'importanza dei numeri
 
20320140502005
2032014050200520320140502005
20320140502005
 
Promise08 Wrapup
Promise08 WrapupPromise08 Wrapup
Promise08 Wrapup
 
Radio
RadioRadio
Radio
 
585 589
585 589585 589
585 589
 

Java a fast clustering-based feature subset selection algorithm for high-dimensional data

  • 1. A FAST CLUSTERING-BASED FEATURE SUBSET SELECTION ALGORITHM FOR HIGH-DIMENSIONAL DATA ABSTRACT: Feature selection involves identifying a subset of the most useful features that produces compatible results as the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While the efficiency concerns the time required to find a subset of features, the effectiveness is related to the quality of the subset of features. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to target classes is selected from each cluster to form a subset of features. Features in different clusters are relatively independent; the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum-spanning tree (MST) clustering method. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study. Extensive experiments are carried out to compare FAST and several representative feature selection algorithms results, on 35 publicly available real-world high-dimensional image, microarray, and text data, demonstrate that the FAST not only produces smaller subsets of features but also improves the performances of the four types of classifiers. ECWAY TECHNOLOGIES IEEE PROJECTS & SOFTWARE DEVELOPMENTS OUR OFFICES @ CHENNAI / TRICHY / KARUR / ERODE / MADURAI / SALEM / COIMBATORE CELL: +91 98949 17187, +91 875487 2111 / 3111 / 4111 / 5111 / 6111 VISIT: www.ecwayprojects.com MAIL TO: ecwaytechnologies@gmail.com