SlideShare a Scribd company logo
1 of 1
Download to read offline
Unsupervised semantic clustering of Twitter hashtags 
Introduction 
Carlos Vicient, Antonio Moreno 
Micro-blogging services such as Twitter constitute one of the most 
successful kinds of applications in the current Social Web. 
Every day more than 500 million tweets are sent. 
Growing interest in the design and development of tools that allow 
users to analyse large unstructured repositories of user-tagged data. 
Problems of data visualisation. 
Semantic information retrieval & information extraction. 
Hashtag recommendation,etc. 
Our research is focused in the automated clustering of the hashtags 
present in a set of tweets, which may lead to a straightforward discovery 
of its main topics. 
Methodology 
Universitat Rovira i Virgili 
ITAKA research group - Intelligent Technologies for Advanced Knowledge Acquisition 
Dept. of Computer Science and Mathematics 
Avda. Paisos Catalans, 26. 43007 Tarragona, Spain. 
{carlos.vicient, antonio.moreno}@urv.cat 
Goal: Group hashtags in clusters and select the most relevant ones. 
A bottom-up analysis from maxK clusters to minK clusters is performed. 
Filtering thresholds: 
T1: establish the minimum inter-cluster homogeneity. 
T2: establish the minimum number of elements per cluster. 
Fig.2. Filtering process (t1=0.6, t2=3) 
-- 
536 relevant medical hashtags 
classified in 16 manually labelled 
classes. 
2530 
793 769 
371 
Table 1. Results of Oncology tweet set 
1) 
- 
293 
This unsupervised domain-independent methodology allows a 
Future work 
organization 
[ , Companies, 
Hospital, Award, Care, 
] 
Winner, 0.4 0.6 0.9 0.8 
Using this matrix, the similarity between concepts is 
calculated using the Wu-Palmer [2] distance. 
21st European Conference on Artificial Intelligence. ECAI 2014. Prague. August 18th-22nd. 
[1] Fellbaum, C. (Ed.). (1998). WordNet: An Electronic Lexical Database (Language, 
Speech, and Communication). MIT Press. 
Hashtags are unstructured and unlimited: problems like synonymy, 
polysemy, lexically similar hashtags, acronyms,a combination of 
several words, or just invented expressions. 
Current hashtag clustering methods are mostly based on a syntactic 
analysis of their co-ocurrence. 
Goal: The aim of this stage is to fill the gap between hashtags and 
concepts in order to be able to compare two different hashtags at 
the semantic level. 
semantic clustering of a set of hashtags and the identification of 
its most relevant topics, filtering the large proportion of noise 
inherent to these sets. 
-- 
2) 
-- 
Oncology dataset: 
5000 tweets (www.symplur.com) 
1086 hashtags (930 annotated) 
Automatic results (minK=5, MaxK=200, t1=0.70 and t2=10): 
13 of the 16 target manual classes were identified 
15 relevant clusters (with a total number of 266 hashtags) 
Fig.1. Semantic annotation process 
1) 
2) 
3) 
Analysis of the full content of the tweets. 
Improve the treatment of polysemic hashtags. 
Change the bottom-up analysis of the hierarchical tree by a top-down 
procedure, so that general clusters are considered before the specific ones. 
Semantic Annotation 
WordNet [1] 
Entity 
illness 
tumor 
cancer 
sign of 
zodiac 
cancer 
(crab) 
medical 
institution 
clinic 
Clinic 
Organization 
Semantic Similarity 
Goal: Establish a mechanism to compare two different hashtags. 
Build a similarity matrix that contains all the hashtags. 
Clustering & filtering 
Evaluation 
Id Centroid Size Prec. Rec. Manual class 
1 Woody_Plant 11 64% 41% Substances 
2 Day 10 50% 24% Temporal 
3 Therapy 20 75% 37% Medical tests 
4 Medicine 17 76% 23% Medications 
5 Cancer 46 80% 62% Cancer 
6 Court 14 43% 43% Hospitals 
7 Biotechnology 10 60% 15% Biological 
8 Health 23 43% 45% Health Care 
9 Medicine 43 60% 63% Medical Fields 
10 System 10 70% 21% Body Parts 
11 Area 11 73% 18% Geographical 
Locations 
12 Teaching 10 40% 14% Academic, 
Research 
13 Person 16 38% 12% Medical Jobs 
14 Center 10 40% 12% Body Parts 
15 Doctor 15 60% 18% Medical Jobs 
16 – 31 - - - - Noise 
129 
52 30 2 3 24 1 
1 2 3 4 5 6 7 8 9 10 11 12 
# 
t 
w 
e 
e 
t 
s 
#hashtags 
[2] Wu, Z. and Palmer, M.: Verb semantics and lexical selection. In Proceedings of the 
32nd annual Meeting of the Association for Computational Linguistics, New 
Mexico, USA. 1994. 133-138 
Hashtag1: “#Mayo” { clinic, organization } 
Hashtag2: “#AustinCancerCenter” { center, medical institution } 
*c = LeastCommonSubsumer(a,b)

More Related Content

What's hot

ONTOLOGY BASED TELE-HEALTH SMART HOME CARE SYSTEM: ONTOSMART TO MONITOR ELDERLY
ONTOLOGY BASED TELE-HEALTH SMART HOME CARE SYSTEM: ONTOSMART TO MONITOR ELDERLY ONTOLOGY BASED TELE-HEALTH SMART HOME CARE SYSTEM: ONTOSMART TO MONITOR ELDERLY
ONTOLOGY BASED TELE-HEALTH SMART HOME CARE SYSTEM: ONTOSMART TO MONITOR ELDERLY csandit
 
Microarray DB Integration
Microarray DB IntegrationMicroarray DB Integration
Microarray DB IntegrationAbhishek Dabral
 
Lec1-Into
Lec1-IntoLec1-Into
Lec1-Intobutest
 
what is a report?
what is a report?what is a report?
what is a report?diegofvl1
 
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : ...
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL  FOR HEALTHCARE INFORMATION SYSTEM :   ...ONTOLOGY-DRIVEN INFORMATION RETRIEVAL  FOR HEALTHCARE INFORMATION SYSTEM :   ...
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : ...IJNSA Journal
 
Application of blockchain technology in healthcare and biomedicine
Application of blockchain technology in healthcare and biomedicineApplication of blockchain technology in healthcare and biomedicine
Application of blockchain technology in healthcare and biomedicinePranavathiyani G
 
AUTHORIZATION FRAMEWORK FOR MEDICAL DATA
AUTHORIZATION FRAMEWORK FOR MEDICAL DATA AUTHORIZATION FRAMEWORK FOR MEDICAL DATA
AUTHORIZATION FRAMEWORK FOR MEDICAL DATA ijmnct
 
Trending Topics in Machine Learning
Trending Topics in Machine LearningTrending Topics in Machine Learning
Trending Topics in Machine LearningTechsparks
 
Focus on the Evidence: a knowledge graph approach to profiling drug targets
Focus on the Evidence: a knowledge graph approach to profiling drug targetsFocus on the Evidence: a knowledge graph approach to profiling drug targets
Focus on the Evidence: a knowledge graph approach to profiling drug targetsNolan Nichols
 
Ijarcet vol-2-issue-4-1393-1397
Ijarcet vol-2-issue-4-1393-1397Ijarcet vol-2-issue-4-1393-1397
Ijarcet vol-2-issue-4-1393-1397Editor IJARCET
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016Anita de Waard
 
Provider Aware Anonymization Algorithm for Preserving M - Privacy
Provider Aware Anonymization Algorithm for Preserving M - PrivacyProvider Aware Anonymization Algorithm for Preserving M - Privacy
Provider Aware Anonymization Algorithm for Preserving M - PrivacyIJERA Editor
 
Performance Evaluation of Different Data Mining Classification Algorithm and ...
Performance Evaluation of Different Data Mining Classification Algorithm and ...Performance Evaluation of Different Data Mining Classification Algorithm and ...
Performance Evaluation of Different Data Mining Classification Algorithm and ...IOSR Journals
 
Metabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie KeesMetabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie Keesthehyve
 
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...ijseajournal
 
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...ijseajournal
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
 

What's hot (17)

ONTOLOGY BASED TELE-HEALTH SMART HOME CARE SYSTEM: ONTOSMART TO MONITOR ELDERLY
ONTOLOGY BASED TELE-HEALTH SMART HOME CARE SYSTEM: ONTOSMART TO MONITOR ELDERLY ONTOLOGY BASED TELE-HEALTH SMART HOME CARE SYSTEM: ONTOSMART TO MONITOR ELDERLY
ONTOLOGY BASED TELE-HEALTH SMART HOME CARE SYSTEM: ONTOSMART TO MONITOR ELDERLY
 
Microarray DB Integration
Microarray DB IntegrationMicroarray DB Integration
Microarray DB Integration
 
Lec1-Into
Lec1-IntoLec1-Into
Lec1-Into
 
what is a report?
what is a report?what is a report?
what is a report?
 
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : ...
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL  FOR HEALTHCARE INFORMATION SYSTEM :   ...ONTOLOGY-DRIVEN INFORMATION RETRIEVAL  FOR HEALTHCARE INFORMATION SYSTEM :   ...
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : ...
 
Application of blockchain technology in healthcare and biomedicine
Application of blockchain technology in healthcare and biomedicineApplication of blockchain technology in healthcare and biomedicine
Application of blockchain technology in healthcare and biomedicine
 
AUTHORIZATION FRAMEWORK FOR MEDICAL DATA
AUTHORIZATION FRAMEWORK FOR MEDICAL DATA AUTHORIZATION FRAMEWORK FOR MEDICAL DATA
AUTHORIZATION FRAMEWORK FOR MEDICAL DATA
 
Trending Topics in Machine Learning
Trending Topics in Machine LearningTrending Topics in Machine Learning
Trending Topics in Machine Learning
 
Focus on the Evidence: a knowledge graph approach to profiling drug targets
Focus on the Evidence: a knowledge graph approach to profiling drug targetsFocus on the Evidence: a knowledge graph approach to profiling drug targets
Focus on the Evidence: a knowledge graph approach to profiling drug targets
 
Ijarcet vol-2-issue-4-1393-1397
Ijarcet vol-2-issue-4-1393-1397Ijarcet vol-2-issue-4-1393-1397
Ijarcet vol-2-issue-4-1393-1397
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016
 
Provider Aware Anonymization Algorithm for Preserving M - Privacy
Provider Aware Anonymization Algorithm for Preserving M - PrivacyProvider Aware Anonymization Algorithm for Preserving M - Privacy
Provider Aware Anonymization Algorithm for Preserving M - Privacy
 
Performance Evaluation of Different Data Mining Classification Algorithm and ...
Performance Evaluation of Different Data Mining Classification Algorithm and ...Performance Evaluation of Different Data Mining Classification Algorithm and ...
Performance Evaluation of Different Data Mining Classification Algorithm and ...
 
Metabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie KeesMetabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie Kees
 
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
 
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
PERFORMANCE EVALUATION OF STRUCTURED AND SEMI-STRUCTURED BIOINFORMATICS TOOLS...
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
 

Similar to ECAI 2014 poster - Unsupervised semantic clustering of Twitter hashtags

CIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docx
CIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docxCIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docx
CIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docxsleeperharwell
 
A_Comparison_of_Manual_and_Computational_Thematic_Analyses.pdf
A_Comparison_of_Manual_and_Computational_Thematic_Analyses.pdfA_Comparison_of_Manual_and_Computational_Thematic_Analyses.pdf
A_Comparison_of_Manual_and_Computational_Thematic_Analyses.pdfLandingJatta1
 
Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...
Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...
Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...AmrAlaaEldin12
 
Social Media and Text Analytics
Social Media and Text AnalyticsSocial Media and Text Analytics
Social Media and Text AnalyticsRushikeshChikane2
 
A preliminary survey on optimized multiobjective metaheuristic methods for da...
A preliminary survey on optimized multiobjective metaheuristic methods for da...A preliminary survey on optimized multiobjective metaheuristic methods for da...
A preliminary survey on optimized multiobjective metaheuristic methods for da...ijcsit
 
Identifying Structures in Social Conversations in NSCLC Patients through the ...
Identifying Structures in Social Conversations in NSCLC Patients through the ...Identifying Structures in Social Conversations in NSCLC Patients through the ...
Identifying Structures in Social Conversations in NSCLC Patients through the ...IJERA Editor
 
Identifying Structures in Social Conversations in NSCLC Patients through the ...
Identifying Structures in Social Conversations in NSCLC Patients through the ...Identifying Structures in Social Conversations in NSCLC Patients through the ...
Identifying Structures in Social Conversations in NSCLC Patients through the ...IJERA Editor
 
Knowledge Management Cultures: A Comparison of Engineering and Cultural Scien...
Knowledge Management Cultures: A Comparison of Engineering and Cultural Scien...Knowledge Management Cultures: A Comparison of Engineering and Cultural Scien...
Knowledge Management Cultures: A Comparison of Engineering and Cultural Scien...Ralf Klamma
 
The Q-Codes: Metadata, Research data, and Desiderata_2018 12 04_gl20_Author_R...
The Q-Codes: Metadata, Research data, and Desiderata_2018 12 04_gl20_Author_R...The Q-Codes: Metadata, Research data, and Desiderata_2018 12 04_gl20_Author_R...
The Q-Codes: Metadata, Research data, and Desiderata_2018 12 04_gl20_Author_R...Miguel Pizzanelli
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAGopen_phacts
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKpetermurrayrust
 
Supervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For CancerSupervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For Cancerpaperpublications3
 
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONSTHE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONSManishReddy706923
 
THIC MedIX Summer 2015 Poster
THIC MedIX Summer 2015 PosterTHIC MedIX Summer 2015 Poster
THIC MedIX Summer 2015 PosterDiana Zajac
 
Combating propaganda texts using transfer learning
Combating propaganda texts using transfer learningCombating propaganda texts using transfer learning
Combating propaganda texts using transfer learningIAESIJAI
 
A Review of Intelligent Agent Systems in Animal Health Care
A Review of Intelligent Agent Systems in Animal Health CareA Review of Intelligent Agent Systems in Animal Health Care
A Review of Intelligent Agent Systems in Animal Health CareIJCSIS Research Publications
 
Molecular data mining tool advances in hiv
Molecular data mining tool  advances in hivMolecular data mining tool  advances in hiv
Molecular data mining tool advances in hivSalford Systems
 

Similar to ECAI 2014 poster - Unsupervised semantic clustering of Twitter hashtags (20)

CIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docx
CIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docxCIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docx
CIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docx
 
Developing a Replicable Methodology for Automated Identification of Emerging ...
Developing a Replicable Methodology for Automated Identification of Emerging ...Developing a Replicable Methodology for Automated Identification of Emerging ...
Developing a Replicable Methodology for Automated Identification of Emerging ...
 
Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...
Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...
Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant ...
 
A_Comparison_of_Manual_and_Computational_Thematic_Analyses.pdf
A_Comparison_of_Manual_and_Computational_Thematic_Analyses.pdfA_Comparison_of_Manual_and_Computational_Thematic_Analyses.pdf
A_Comparison_of_Manual_and_Computational_Thematic_Analyses.pdf
 
Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...
Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...
Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...
 
Social Media and Text Analytics
Social Media and Text AnalyticsSocial Media and Text Analytics
Social Media and Text Analytics
 
A preliminary survey on optimized multiobjective metaheuristic methods for da...
A preliminary survey on optimized multiobjective metaheuristic methods for da...A preliminary survey on optimized multiobjective metaheuristic methods for da...
A preliminary survey on optimized multiobjective metaheuristic methods for da...
 
757
757757
757
 
Identifying Structures in Social Conversations in NSCLC Patients through the ...
Identifying Structures in Social Conversations in NSCLC Patients through the ...Identifying Structures in Social Conversations in NSCLC Patients through the ...
Identifying Structures in Social Conversations in NSCLC Patients through the ...
 
Identifying Structures in Social Conversations in NSCLC Patients through the ...
Identifying Structures in Social Conversations in NSCLC Patients through the ...Identifying Structures in Social Conversations in NSCLC Patients through the ...
Identifying Structures in Social Conversations in NSCLC Patients through the ...
 
Knowledge Management Cultures: A Comparison of Engineering and Cultural Scien...
Knowledge Management Cultures: A Comparison of Engineering and Cultural Scien...Knowledge Management Cultures: A Comparison of Engineering and Cultural Scien...
Knowledge Management Cultures: A Comparison of Engineering and Cultural Scien...
 
The Q-Codes: Metadata, Research data, and Desiderata_2018 12 04_gl20_Author_R...
The Q-Codes: Metadata, Research data, and Desiderata_2018 12 04_gl20_Author_R...The Q-Codes: Metadata, Research data, and Desiderata_2018 12 04_gl20_Author_R...
The Q-Codes: Metadata, Research data, and Desiderata_2018 12 04_gl20_Author_R...
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UK
 
Supervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For CancerSupervised Multi Attribute Gene Manipulation For Cancer
Supervised Multi Attribute Gene Manipulation For Cancer
 
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONSTHE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
 
THIC MedIX Summer 2015 Poster
THIC MedIX Summer 2015 PosterTHIC MedIX Summer 2015 Poster
THIC MedIX Summer 2015 Poster
 
Combating propaganda texts using transfer learning
Combating propaganda texts using transfer learningCombating propaganda texts using transfer learning
Combating propaganda texts using transfer learning
 
A Review of Intelligent Agent Systems in Animal Health Care
A Review of Intelligent Agent Systems in Animal Health CareA Review of Intelligent Agent Systems in Animal Health Care
A Review of Intelligent Agent Systems in Animal Health Care
 
Molecular data mining tool advances in hiv
Molecular data mining tool  advances in hivMolecular data mining tool  advances in hiv
Molecular data mining tool advances in hiv
 

More from Antonio Moreno

Dynamic learning of keyword-based preferences for news recommendation (WI-2014)
Dynamic learning of keyword-based preferences for news recommendation (WI-2014)Dynamic learning of keyword-based preferences for news recommendation (WI-2014)
Dynamic learning of keyword-based preferences for news recommendation (WI-2014)Antonio Moreno
 
Artificial Intelligence Master URV-UPC-UB
Artificial Intelligence Master URV-UPC-UBArtificial Intelligence Master URV-UPC-UB
Artificial Intelligence Master URV-UPC-UBAntonio Moreno
 
URV Master on Computer Engineering: Computer Security and Inteligent Systems
URV Master on Computer Engineering: Computer Security and Inteligent SystemsURV Master on Computer Engineering: Computer Security and Inteligent Systems
URV Master on Computer Engineering: Computer Security and Inteligent SystemsAntonio Moreno
 
EnoSigTur: recomanació personalitzada d'activitats enoturístiques
EnoSigTur: recomanació personalitzada d'activitats enoturístiquesEnoSigTur: recomanació personalitzada d'activitats enoturístiques
EnoSigTur: recomanació personalitzada d'activitats enoturístiquesAntonio Moreno
 
Artificial Intelligence techniques in Tourism at URV
Artificial Intelligence techniques in Tourism at URVArtificial Intelligence techniques in Tourism at URV
Artificial Intelligence techniques in Tourism at URVAntonio Moreno
 
MAS course Lect13 industrial applications
MAS course Lect13 industrial applicationsMAS course Lect13 industrial applications
MAS course Lect13 industrial applicationsAntonio Moreno
 
MAS course - Lect12 - URV health care applications
MAS course - Lect12 - URV health care applicationsMAS course - Lect12 - URV health care applications
MAS course - Lect12 - URV health care applicationsAntonio Moreno
 
MAS course - Lect11 - URV applications
MAS course - Lect11 - URV applicationsMAS course - Lect11 - URV applications
MAS course - Lect11 - URV applicationsAntonio Moreno
 
MAS Course - Lect10 - coordination
MAS Course - Lect10 - coordinationMAS Course - Lect10 - coordination
MAS Course - Lect10 - coordinationAntonio Moreno
 
Lect6-An introduction to ontologies and ontology development
Lect6-An introduction to ontologies and ontology developmentLect6-An introduction to ontologies and ontology development
Lect6-An introduction to ontologies and ontology developmentAntonio Moreno
 
Lecture 5 - Agent communication
Lecture 5 - Agent communicationLecture 5 - Agent communication
Lecture 5 - Agent communicationAntonio Moreno
 
Lecture 4- Agent types
Lecture 4- Agent typesLecture 4- Agent types
Lecture 4- Agent typesAntonio Moreno
 
Introduction to agents and multi-agent systems
Introduction to agents and multi-agent systemsIntroduction to agents and multi-agent systems
Introduction to agents and multi-agent systemsAntonio Moreno
 

More from Antonio Moreno (18)

Dynamic learning of keyword-based preferences for news recommendation (WI-2014)
Dynamic learning of keyword-based preferences for news recommendation (WI-2014)Dynamic learning of keyword-based preferences for news recommendation (WI-2014)
Dynamic learning of keyword-based preferences for news recommendation (WI-2014)
 
Artificial Intelligence Master URV-UPC-UB
Artificial Intelligence Master URV-UPC-UBArtificial Intelligence Master URV-UPC-UB
Artificial Intelligence Master URV-UPC-UB
 
URV Master on Computer Engineering: Computer Security and Inteligent Systems
URV Master on Computer Engineering: Computer Security and Inteligent SystemsURV Master on Computer Engineering: Computer Security and Inteligent Systems
URV Master on Computer Engineering: Computer Security and Inteligent Systems
 
EnoSigTur: recomanació personalitzada d'activitats enoturístiques
EnoSigTur: recomanació personalitzada d'activitats enoturístiquesEnoSigTur: recomanació personalitzada d'activitats enoturístiques
EnoSigTur: recomanació personalitzada d'activitats enoturístiques
 
Artificial Intelligence techniques in Tourism at URV
Artificial Intelligence techniques in Tourism at URVArtificial Intelligence techniques in Tourism at URV
Artificial Intelligence techniques in Tourism at URV
 
MAS course Lect13 industrial applications
MAS course Lect13 industrial applicationsMAS course Lect13 industrial applications
MAS course Lect13 industrial applications
 
MAS course - Lect12 - URV health care applications
MAS course - Lect12 - URV health care applicationsMAS course - Lect12 - URV health care applications
MAS course - Lect12 - URV health care applications
 
MAS course - Lect11 - URV applications
MAS course - Lect11 - URV applicationsMAS course - Lect11 - URV applications
MAS course - Lect11 - URV applications
 
MAS Course - Lect10 - coordination
MAS Course - Lect10 - coordinationMAS Course - Lect10 - coordination
MAS Course - Lect10 - coordination
 
MAS course - Lect 9
MAS course - Lect 9 MAS course - Lect 9
MAS course - Lect 9
 
Lect 8-auctions
Lect 8-auctionsLect 8-auctions
Lect 8-auctions
 
Lect7MAS-Coordination
Lect7MAS-CoordinationLect7MAS-Coordination
Lect7MAS-Coordination
 
Lect6-An introduction to ontologies and ontology development
Lect6-An introduction to ontologies and ontology developmentLect6-An introduction to ontologies and ontology development
Lect6-An introduction to ontologies and ontology development
 
Lecture 5 - Agent communication
Lecture 5 - Agent communicationLecture 5 - Agent communication
Lecture 5 - Agent communication
 
Lecture 4- Agent types
Lecture 4- Agent typesLecture 4- Agent types
Lecture 4- Agent types
 
Agent properties
Agent propertiesAgent properties
Agent properties
 
Agent architectures
Agent architecturesAgent architectures
Agent architectures
 
Introduction to agents and multi-agent systems
Introduction to agents and multi-agent systemsIntroduction to agents and multi-agent systems
Introduction to agents and multi-agent systems
 

Recently uploaded

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 

Recently uploaded (20)

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 

ECAI 2014 poster - Unsupervised semantic clustering of Twitter hashtags

  • 1. Unsupervised semantic clustering of Twitter hashtags Introduction Carlos Vicient, Antonio Moreno Micro-blogging services such as Twitter constitute one of the most successful kinds of applications in the current Social Web. Every day more than 500 million tweets are sent. Growing interest in the design and development of tools that allow users to analyse large unstructured repositories of user-tagged data. Problems of data visualisation. Semantic information retrieval & information extraction. Hashtag recommendation,etc. Our research is focused in the automated clustering of the hashtags present in a set of tweets, which may lead to a straightforward discovery of its main topics. Methodology Universitat Rovira i Virgili ITAKA research group - Intelligent Technologies for Advanced Knowledge Acquisition Dept. of Computer Science and Mathematics Avda. Paisos Catalans, 26. 43007 Tarragona, Spain. {carlos.vicient, antonio.moreno}@urv.cat Goal: Group hashtags in clusters and select the most relevant ones. A bottom-up analysis from maxK clusters to minK clusters is performed. Filtering thresholds: T1: establish the minimum inter-cluster homogeneity. T2: establish the minimum number of elements per cluster. Fig.2. Filtering process (t1=0.6, t2=3) -- 536 relevant medical hashtags classified in 16 manually labelled classes. 2530 793 769 371 Table 1. Results of Oncology tweet set 1) - 293 This unsupervised domain-independent methodology allows a Future work organization [ , Companies, Hospital, Award, Care, ] Winner, 0.4 0.6 0.9 0.8 Using this matrix, the similarity between concepts is calculated using the Wu-Palmer [2] distance. 21st European Conference on Artificial Intelligence. ECAI 2014. Prague. August 18th-22nd. [1] Fellbaum, C. (Ed.). (1998). WordNet: An Electronic Lexical Database (Language, Speech, and Communication). MIT Press. Hashtags are unstructured and unlimited: problems like synonymy, polysemy, lexically similar hashtags, acronyms,a combination of several words, or just invented expressions. Current hashtag clustering methods are mostly based on a syntactic analysis of their co-ocurrence. Goal: The aim of this stage is to fill the gap between hashtags and concepts in order to be able to compare two different hashtags at the semantic level. semantic clustering of a set of hashtags and the identification of its most relevant topics, filtering the large proportion of noise inherent to these sets. -- 2) -- Oncology dataset: 5000 tweets (www.symplur.com) 1086 hashtags (930 annotated) Automatic results (minK=5, MaxK=200, t1=0.70 and t2=10): 13 of the 16 target manual classes were identified 15 relevant clusters (with a total number of 266 hashtags) Fig.1. Semantic annotation process 1) 2) 3) Analysis of the full content of the tweets. Improve the treatment of polysemic hashtags. Change the bottom-up analysis of the hierarchical tree by a top-down procedure, so that general clusters are considered before the specific ones. Semantic Annotation WordNet [1] Entity illness tumor cancer sign of zodiac cancer (crab) medical institution clinic Clinic Organization Semantic Similarity Goal: Establish a mechanism to compare two different hashtags. Build a similarity matrix that contains all the hashtags. Clustering & filtering Evaluation Id Centroid Size Prec. Rec. Manual class 1 Woody_Plant 11 64% 41% Substances 2 Day 10 50% 24% Temporal 3 Therapy 20 75% 37% Medical tests 4 Medicine 17 76% 23% Medications 5 Cancer 46 80% 62% Cancer 6 Court 14 43% 43% Hospitals 7 Biotechnology 10 60% 15% Biological 8 Health 23 43% 45% Health Care 9 Medicine 43 60% 63% Medical Fields 10 System 10 70% 21% Body Parts 11 Area 11 73% 18% Geographical Locations 12 Teaching 10 40% 14% Academic, Research 13 Person 16 38% 12% Medical Jobs 14 Center 10 40% 12% Body Parts 15 Doctor 15 60% 18% Medical Jobs 16 – 31 - - - - Noise 129 52 30 2 3 24 1 1 2 3 4 5 6 7 8 9 10 11 12 # t w e e t s #hashtags [2] Wu, Z. and Palmer, M.: Verb semantics and lexical selection. In Proceedings of the 32nd annual Meeting of the Association for Computational Linguistics, New Mexico, USA. 1994. 133-138 Hashtag1: “#Mayo” { clinic, organization } Hashtag2: “#AustinCancerCenter” { center, medical institution } *c = LeastCommonSubsumer(a,b)