An Empirical Comparison of Fast and Efficient Tools for Mining Textual Data
Volkan TUNALI (Marmara University)
A. Yılmaz ÇAMURCU (Marmara University)
T. Tugay BİLGİN (Maltepe University)
Introduction
Text Mining
Document Clustering
K-means
K-means Variants
Spherical K-means
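The bullet text of this slide did not survive extraction. As a reminder of the general technique only, here is a minimal sketch of spherical k-means as it is commonly described: documents and centroids are normalized to unit length and each document is assigned to the centroid with the highest cosine similarity. The function names, the choice of the first `k` documents as seeds, and the stopping rule are assumptions made for illustration; this is not taken from the slides or from the Gmeans or Cluto source.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(u, v):
    """Dot product; for unit vectors this equals cosine similarity."""
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(docs, k, max_iter=100):
    """Cluster term-weight vectors by cosine similarity.

    Assumption for this sketch: the first k documents serve as initial centroids.
    """
    if not 0 < k <= len(docs):
        raise ValueError("k must be between 1 and the number of documents")
    vectors = [normalize(d) for d in docs]
    centroids = [vectors[i][:] for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: pick the most similar centroid for each document.
        new_assignment = [
            max(range(k), key=lambda j: dot(x, centroids[j])) for x in vectors
        ]
        if new_assignment == assignment:
            break  # no document changed cluster
        assignment = new_assignment
        # Update step: sum the members of each cluster and re-normalize.
        for j in range(k):
            members = [x for x, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return assignment, centroids


docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
assignment, centroids = spherical_kmeans(docs, k=2)
print(assignment)
```

Because every vector has unit length, maximizing the dot product is the same as maximizing cosine similarity, which is why the centroid is re-normalized after each update.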
Bisecting K-means
Powerful Text Clustering Tools
Cluto
Gmeans
Datasets Used in Experiments
Performance Evaluation Metrics
Memory Consumption
CPU Time Consumption
Purity
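The chart for this slide is not recoverable. For orientation, here is a small sketch of the usual definition of purity: each cluster is credited with its most frequent true class, and the credited documents are divided by the total number of documents. The function name and the toy labels are hypothetical, and the slides may have computed the metric with the tools' built-in reports instead.

```python
from collections import Counter


def purity(labels_true, labels_pred):
    """Fraction of documents that belong to the majority class of their cluster."""
    if not labels_true or len(labels_true) != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    # Count the true classes that occur inside each predicted cluster.
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, Counter())[true_label] += 1
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(labels_true)


# Toy example: cluster 0 holds two "a"; cluster 1 holds one "a", two "b", one "c".
score = purity(["a", "a", "a", "b", "b", "c"], [0, 0, 1, 1, 1, 1])
print(round(score, 4))
```

Higher is better; a value of 1.0 means every cluster contains documents of a single class.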
Entropy
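The chart for this slide is not recoverable. The sketch below shows one common way to compute clustering entropy: the class distribution inside each cluster gives a per-cluster entropy, and the overall value is the average weighted by cluster size. Dividing by the logarithm of the number of classes, so that the result lies in [0, 1], is an assumption of this sketch; the function name and the `normalize` parameter are hypothetical.

```python
import math
from collections import Counter


def clustering_entropy(labels_true, labels_pred, normalize=True):
    """Size-weighted average of the per-cluster class entropy."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, Counter())[true_label] += 1
    total = 0.0
    for counts in clusters.values():
        size = sum(counts.values())
        # Entropy of the class distribution inside this cluster.
        h = -sum((c / size) * math.log(c / size) for c in counts.values())
        total += (size / n) * h
    num_classes = len(set(labels_true))
    if normalize and num_classes > 1:
        total /= math.log(num_classes)  # assumption: scale into [0, 1]
    return total


mixed = clustering_entropy(["a", "a", "b", "b"], [0, 0, 0, 0])
clean = clustering_entropy(["a", "a", "b", "b"], [0, 0, 1, 1])
print(mixed, clean)
```

Lower is better; a value of 0 means every cluster contains documents of a single class.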
F-Measure
Normalized Mutual Information
Comparison: Cluto vs. Gmeans
Conclusions
Thank you!
