Entity Linking in Social Media
Project Number : 10
Group Number : 51
- Abhishek Mittal, 201101192
- Mohit Aggarwal, 201101164
- Vishrut Mehta, 201102128
- Himanshu Ghadiya, 201305620
Overview
● The main aim of our project is to link entities in social media,
i.e. to extract the context and meaning of a tweet and link it to
a Wikipedia page for further understanding.
● Semantic understanding of sentences is increasingly important.
Much of our conversation now happens through social media, and
it is important to understand the meaning of those conversations,
which is possible by linking named entities to their context. We
therefore use tweets as the basis for evaluating and testing our
method of entity linking by extracting context from tweets.
Approach
● We extract named entities from the tweets using the CMU-ARK
tagger.
● The named entities are then mapped to relevant news feeds
published within a particular time interval.
● We then extract the named entities from these news feeds and
obtain a final collection of related entities that contains
sufficient information about the tweet.
● For each entity, we find the corresponding Wikipedia pages.
● We then find the label of each wiki page to determine its
context, and finally map the tweet to that context. The
classification is done using an SVM.
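
The first three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's code: `NewsItem`, `extract_entities`, and `related_entities` are hypothetical names, the capitalised-token heuristic is a stand-in for the real taggers, and the 24-hour window and the "shares at least one entity" rule for relevance are assumed for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class NewsItem:
    """Hypothetical container for one news-feed entry."""
    published: datetime
    text: str


def extract_entities(text):
    """Placeholder mention detector: keeps capitalised tokens.

    A real system would call a tagger here; this heuristic only
    exists so the sketch runs on its own.
    """
    tokens = (tok.strip(".,!?:;#@\"'") for tok in text.split())
    return {tok for tok in tokens if tok[:1].isupper()}


def related_entities(tweet_text, tweet_time, news, window=timedelta(hours=24)):
    """Pool the tweet's entities with those of relevant news items.

    A news item counts as relevant (an assumption of this sketch) when
    it falls inside the time window and shares an entity with the tweet.
    """
    seeds = extract_entities(tweet_text)
    pooled = set(seeds)
    for item in news:
        if abs(item.published - tweet_time) > window:
            continue  # outside the time interval
        found = extract_entities(item.text)
        if seeds & found:
            pooled |= found
    return pooled


# Made-up example data, for illustration only.
news = [
    NewsItem(datetime(2014, 1, 10, 9, 0), "Obama meets Congress over budget"),
    NewsItem(datetime(2014, 1, 5, 9, 0), "Obama visits Chicago"),
    NewsItem(datetime(2014, 1, 10, 12, 0), "Google buys Nest"),
]
tweet_time = datetime(2014, 1, 10, 20, 0)
print(sorted(related_entities("so excited to watch Obama tonight", tweet_time, news)))
# prints ['Congress', 'Obama']
```

The pooled entity set is what would then be looked up on Wikipedia and classified.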
Design
Datasets
We have used the following datasets for the project -
● A dataset of tweets.
● A dataset of news feeds from news websites. We have used the
CBS News dataset.
● A 40 GB Wikipedia dump as the training set for the SVM. So far, we
have trained the SVM on only 5 GB of Wikipedia data.
● A predefined set of about 15 labels to which the tweets are mapped.
Tools
We have used the following tools for the project -
● CMU-ARK tagger - To find named entities in tweets using mention
detection.
● Stanford parser - To find named entities in news feeds using mention
detection (news text is well structured).
● Wikipedia Search API - To find wiki pages for a keyword.
● SVM (LIBSVM) - To find the context of a wiki page.
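
As an illustration of the keyword lookup, the sketch below builds a search request and reads page titles from a response. It assumes the "Wikipedia Search API" refers to the MediaWiki Action API search module (`action=query`, `list=search`); the slides do not say which endpoint was used. The helper names are hypothetical, and `SAMPLE` is a hand-written response in the shape that module returns, not real output.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: the English Wikipedia Action API.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(keyword, limit=5):
    """Return a full-text search URL for the given keyword."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": keyword,
        "srlimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def titles_from_response(body):
    """Extract page titles from a JSON search response body."""
    data = json.loads(body)
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]


# Hand-written sample with made-up page ids, for illustration only.
SAMPLE = (
    '{"query": {"search": ['
    '{"title": "Barack Obama", "pageid": 1}, '
    '{"title": "Michelle Obama", "pageid": 2}]}}'
)

print(build_search_url("Barack Obama"))
print(titles_from_response(SAMPLE))
```

The sketch makes no network call; fetching the URL (for example with `urllib.request.urlopen`) and passing the body to `titles_from_response` would complete the lookup.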
Results
● We evaluated our system on a small dataset: about 200 tweets
dated 10 January 2014 and the news feeds from all 24 hours of that
day.
● We then ran our algorithm to find the context of each tweet. Comparing
the results with the labels we had assigned manually, we found the
accuracy to be around 37 percent.
● The low accuracy is mostly due to the small training and testing
datasets used for classification. Once the SVM is trained on the full
40 GB Wikipedia dataset, we are confident that accuracy will improve.
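
For reference, accuracy here is the fraction of tweets whose predicted label matches the manually assigned one, so 37 percent of about 200 tweets corresponds to roughly 74 correct predictions. A minimal sketch follows; the function names and toy labels are made up for illustration, and the chance baseline assumes all labels are equally frequent, which the slides do not state.

```python
def accuracy(predicted, gold):
    """Fraction of predictions that match the manual labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def uniform_chance(num_labels):
    """Expected accuracy of random guessing over equally likely labels."""
    return 1 / num_labels


# Toy labels, not project data.
print(accuracy(["sports", "politics", "tech", "sports"],
               ["sports", "politics", "sports", "sports"]))  # prints 0.75
print(round(uniform_chance(15), 3))  # prints 0.067
```

Under that uniform-label assumption, random guessing over 15 labels would score about 6.7 percent, which gives some context for the reported figure.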
Challenges and Issues
● Feature selection for the SVM was a major challenge: we have to
choose feature vectors that give maximum accuracy during
classification.
● Training the SVM on 40 GB of Wikipedia data is a major challenge.
● So far, we have used only 15 labels for classifying the tweets.
Increasing the number of labels would make the algorithm more
computationally intensive. Scaling the system to bigger datasets and more
contexts would require further optimizations.
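
To make the feature-vector and label-count points concrete, here is a small sketch. It assumes plain bag-of-words counts as features, which is only one possible choice and not necessarily what the project used; the helper names are hypothetical. The output line follows LIBSVM's sparse input format (`label index:value ...` with ascending indices), and the classifier count reflects LIBSVM's one-against-one multiclass scheme, which trains k(k-1)/2 binary classifiers for k labels.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign each distinct lowercase token a 1-based feature index."""
    vocab = {}
    for doc in documents:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def to_libsvm_line(label_id, text, vocab):
    """Encode a text as 'label idx:count ...' with ascending indices.

    Tokens missing from the vocabulary are ignored.
    """
    counts = Counter(vocab[t] for t in text.lower().split() if t in vocab)
    features = " ".join(f"{idx}:{counts[idx]}" for idx in sorted(counts))
    return f"{label_id} {features}"


def one_vs_one_classifiers(num_labels):
    """Number of binary classifiers in a one-against-one scheme."""
    return num_labels * (num_labels - 1) // 2


# Toy documents, for illustration only.
vocab = build_vocabulary(["the match was a great match", "the election results"])
print(to_libsvm_line(3, "great match match today", vocab))  # prints 3 2:2 5:1
print(one_vs_one_classifiers(15))  # prints 105
print(one_vs_one_classifiers(30))  # prints 435
```

Doubling the label set from 15 to 30 roughly quadruples the number of pairwise classifiers, which is one reason adding labels raises the computational cost.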
Thank You

Ire project - Entity Linking in Social Media
