SlideShare a Scribd company logo
1 of 13
Topic Modeling in Social
Media
Faculty Mentor: Dr. Vasudeva Varma
Student Mentor: Sandeep Panem
Kashyap Murthy Krishnakant Vishwakarma Sajal Sharma Vinil Narang
Topic Modeling
 A Topic Model is a statistical model for
discovering the abstract "topics" that
occur in a collection of documents.
LDA(Latent Dirichlet Allocation)
 Topic Modeling represents documents as mixtures of topics that spit out words with
certain probabilities.
 LDA is the most common topic model currently in use.
 LDA is used in social platforms to come up with a model which can predict on which
topics related document a user prefers to read and write.
LDA …
 LDA is a method which considers documents
beyond mere just bag of words.
 It describes a generative process whereby,
given a Dirichlet conditioned bag filled with
topic-distributions for each document, we
draw a topic mixture from this bag. Then, we
repeatedly draw both a topic and then a
word from that topic to generate the words
in that document.
 we have a generative model that represents
the process by which a corpora of
documents were created in terms of Topics.
Approaches for topic modeling
 Approach1: Finding LDA model for each user in the network.
 Approach2: Finding top K influential users and applying the LDA model on these users only.
 Approach3: Finding communities present in the network and approximating users topics
by applying LDA model of its corresponding community.
Comparing different approaches.
 Approach1:
 Calculating LDA for each user is very costly since the data set is very large.
 So this approach is not feasible for practical purposes.
Comparing different approaches.
 Approach 2:
 Since LDA can’t be applied to each user, therefore we have selected top k users on which we
have applied LDA.
 The top k users were found out using Page Rank algorithm.
 Then we extracted the tweets for these k users and applied LDA on them.
 For this project we followed approach 2.
Approach 2…
 1. Step 1:
 Input : A file containing the follower-followee
relationship in the form of edge list. It contains
~6.25 crore nodes(users) and ~146 crore
edges(follower-followee relationship)
 Output : A file containing the userid and
pageranks of all the nodes(users) in the graph.
 Approach : Used GraphChi’s pagerank
algorithm to perform the above mentioned
task.
 
 2. Step 2:
 Input : A file containing the pageranks and
userids of all the nodes(users) in the graph.
 Output : A file containing the pageranks and
userids of all the users sorted on the basis of
pagerank.
 Approach : External merge sort is used to
perform this kind of sorting because the size of
the file to be sorted was 30GB. 
Approach 2…
 3. Step 3:
 Input : A file containing the pageranks and userids of all the nodes sorted by pagerank.
 Output : A file containing the top 50 users(userids and pagerank)
 4. Step 4:
 Input : A file containing the top 50 users.
 Output : The tweets of these top 50 users.
 Approach : We used the tweet crawler to crawl the tweets of these users.
 5. Step 5:
 Input : 50 files containing tweets of top 50 users.
 Output : LDA models of these 50 users.
 Approach : Used Mallet to perform this task.
Approach 2…
Problems:
 We encountered a case where all the influential elements could belong to a single
community.
 Then LDA on the top k influential elements would not result in a generic model, hence this
technique is not robust and effective for this scenarios.
 Hence we had to move on to the next approach of community detection before applying
the LDA.
Approach 3… and future works :
 Since Approach 2 was not giving expected results, so we tried Community Detection
approach.
 The GraphChi Community Detection code is still in early phase of development and is not
giving expected results.
 There is another API named Infomap which tends to work well for small graphs but the code
burst for large graphs.
 We are still working on approach 3 and aiming for better accuracy and results.
Thank you 

More Related Content

What's hot

Tweet sentiment analysis (Data mining)
Tweet sentiment analysis (Data mining)Tweet sentiment analysis (Data mining)
Tweet sentiment analysis (Data mining)Anil Shrestha
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics DomainDrjabez
 
Ijmer 46067276
Ijmer 46067276Ijmer 46067276
Ijmer 46067276IJMER
 
Sentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmSentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmKhushboo Gupta
 
Classification of Health Forum Messages using Deep Learning
Classification of Health Forum Messages using Deep LearningClassification of Health Forum Messages using Deep Learning
Classification of Health Forum Messages using Deep LearningSejal Naidu
 
Sentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataSentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataIRJET Journal
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using mlPravin Katiyar
 
DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..butest
 
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATIONTHE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATIONijscai
 
Sentiment analysis of tweets using Neural Networks
Sentiment analysis of tweets using Neural NetworksSentiment analysis of tweets using Neural Networks
Sentiment analysis of tweets using Neural NetworksAdrián Palacios Corella
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
IRJET- Personality Recognition using Multi-Label Classification
IRJET- Personality Recognition using Multi-Label ClassificationIRJET- Personality Recognition using Multi-Label Classification
IRJET- Personality Recognition using Multi-Label ClassificationIRJET Journal
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATAParvathy Devaraj
 
IRJET- Suspicious Email Detection System
IRJET- Suspicious Email Detection SystemIRJET- Suspicious Email Detection System
IRJET- Suspicious Email Detection SystemIRJET Journal
 
IRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET Journal
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 
Paper id 71201913
Paper id 71201913Paper id 71201913
Paper id 71201913IJRAT
 

What's hot (20)

Tweet sentiment analysis (Data mining)
Tweet sentiment analysis (Data mining)Tweet sentiment analysis (Data mining)
Tweet sentiment analysis (Data mining)
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
Ijmer 46067276
Ijmer 46067276Ijmer 46067276
Ijmer 46067276
 
Sentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmSentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes Algorithm
 
Classification of Health Forum Messages using Deep Learning
Classification of Health Forum Messages using Deep LearningClassification of Health Forum Messages using Deep Learning
Classification of Health Forum Messages using Deep Learning
 
Sentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataSentiment Analysis on Twitter Data
Sentiment Analysis on Twitter Data
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using ml
 
Project report
Project reportProject report
Project report
 
DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..
 
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATIONTHE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
THE EFFECTS OF THE LDA TOPIC MODEL ON SENTIMENT CLASSIFICATION
 
Sentiment analysis of tweets using Neural Networks
Sentiment analysis of tweets using Neural NetworksSentiment analysis of tweets using Neural Networks
Sentiment analysis of tweets using Neural Networks
 
Q01741118123
Q01741118123Q01741118123
Q01741118123
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Aj35198205
Aj35198205Aj35198205
Aj35198205
 
IRJET- Personality Recognition using Multi-Label Classification
IRJET- Personality Recognition using Multi-Label ClassificationIRJET- Personality Recognition using Multi-Label Classification
IRJET- Personality Recognition using Multi-Label Classification
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATA
 
IRJET- Suspicious Email Detection System
IRJET- Suspicious Email Detection SystemIRJET- Suspicious Email Detection System
IRJET- Suspicious Email Detection System
 
IRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and Challenges
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
Paper id 71201913
Paper id 71201913Paper id 71201913
Paper id 71201913
 

Similar to Topic modeling

IOTA 2016 Social Recomender System Presentation.
IOTA 2016 Social Recomender System Presentation.IOTA 2016 Social Recomender System Presentation.
IOTA 2016 Social Recomender System Presentation.ASHISH JAGTAP
 
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...Pieter Heyvaert
 
Semantic Annotation of Documents
Semantic Annotation of DocumentsSemantic Annotation of Documents
Semantic Annotation of Documentssubash chandra
 
onlinelibrarymanagementsystem-160511065906.pdf
onlinelibrarymanagementsystem-160511065906.pdfonlinelibrarymanagementsystem-160511065906.pdf
onlinelibrarymanagementsystem-160511065906.pdfrohanthombre2
 
Online library management system
Online library management systemOnline library management system
Online library management systemBharat Kunwar
 
The “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedInThe “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedInKun Le
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInSam Shah
 
Doctoral seminar (DBIS RWTH Aachen)
Doctoral seminar  (DBIS RWTH Aachen)Doctoral seminar  (DBIS RWTH Aachen)
Doctoral seminar (DBIS RWTH Aachen)Zina Petrushyna
 
DevOps Support for an Ethical Software Development Life Cycle (SDLC)
DevOps Support for an Ethical Software Development Life Cycle (SDLC)DevOps Support for an Ethical Software Development Life Cycle (SDLC)
DevOps Support for an Ethical Software Development Life Cycle (SDLC)Mark Underwood
 
srd117.final.512Spring2016
srd117.final.512Spring2016srd117.final.512Spring2016
srd117.final.512Spring2016Saurabh Deochake
 
EffectiveCrowdSourcingForProductFeatureIdeation v18
EffectiveCrowdSourcingForProductFeatureIdeation v18EffectiveCrowdSourcingForProductFeatureIdeation v18
EffectiveCrowdSourcingForProductFeatureIdeation v18Karthikeyan Rajasekharan
 
Social Network Analysis based on MOOC's (Massive Open Online Classes)
Social Network Analysis based on MOOC's (Massive Open Online Classes)Social Network Analysis based on MOOC's (Massive Open Online Classes)
Social Network Analysis based on MOOC's (Massive Open Online Classes)ShankarPrasaadRajama
 
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...IEEEMEMTECHSTUDENTPROJECTS
 
Selection of Tags for Tag Clouds
Selection of Tags for Tag CloudsSelection of Tags for Tag Clouds
Selection of Tags for Tag CloudsAakash Gupta
 
Authorcontext:ire
Authorcontext:ireAuthorcontext:ire
Authorcontext:ireSoham Saha
 

Similar to Topic modeling (20)

SocialLda
SocialLda SocialLda
SocialLda
 
IOTA 2016 Social Recomender System Presentation.
IOTA 2016 Social Recomender System Presentation.IOTA 2016 Social Recomender System Presentation.
IOTA 2016 Social Recomender System Presentation.
 
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and ...
 
Semantic Annotation of Documents
Semantic Annotation of DocumentsSemantic Annotation of Documents
Semantic Annotation of Documents
 
onlinelibrarymanagementsystem-160511065906.pdf
onlinelibrarymanagementsystem-160511065906.pdfonlinelibrarymanagementsystem-160511065906.pdf
onlinelibrarymanagementsystem-160511065906.pdf
 
Online library management system
Online library management systemOnline library management system
Online library management system
 
The “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedInThe “Big Data” Ecosystem at LinkedIn
The “Big Data” Ecosystem at LinkedIn
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedIn
 
Doctoral seminar (DBIS RWTH Aachen)
Doctoral seminar  (DBIS RWTH Aachen)Doctoral seminar  (DBIS RWTH Aachen)
Doctoral seminar (DBIS RWTH Aachen)
 
KDD Cup Research Paper
KDD Cup Research PaperKDD Cup Research Paper
KDD Cup Research Paper
 
DevOps Support for an Ethical Software Development Life Cycle (SDLC)
DevOps Support for an Ethical Software Development Life Cycle (SDLC)DevOps Support for an Ethical Software Development Life Cycle (SDLC)
DevOps Support for an Ethical Software Development Life Cycle (SDLC)
 
srd117.final.512Spring2016
srd117.final.512Spring2016srd117.final.512Spring2016
srd117.final.512Spring2016
 
EffectiveCrowdSourcingForProductFeatureIdeation v18
EffectiveCrowdSourcingForProductFeatureIdeation v18EffectiveCrowdSourcingForProductFeatureIdeation v18
EffectiveCrowdSourcingForProductFeatureIdeation v18
 
Social Network Analysis based on MOOC's (Massive Open Online Classes)
Social Network Analysis based on MOOC's (Massive Open Online Classes)Social Network Analysis based on MOOC's (Massive Open Online Classes)
Social Network Analysis based on MOOC's (Massive Open Online Classes)
 
50120140505004
5012014050500450120140505004
50120140505004
 
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c...
 
STACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSISSTACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSIS
 
Selection of Tags for Tag Clouds
Selection of Tags for Tag CloudsSelection of Tags for Tag Clouds
Selection of Tags for Tag Clouds
 
Authorcontext:ire
Authorcontext:ireAuthorcontext:ire
Authorcontext:ire
 
Computers
ComputersComputers
Computers
 

Recently uploaded

Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxannathomasp01
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 

Recently uploaded (20)

Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 

Topic modeling

  • 1. Topic Modeling in Social Media Faculty Mentor: Dr. Vasudeva Varma Student Mentor: Sandeep Panem Kashyap Murthy Krishnakant Vishwakarma Sajal Sharma Vinil Narang
  • 2. Topic Modeling  A Topic Model is a statistical model for discovering the abstract "topics" that occur in a collection of documents.
  • 3. LDA(Latent Dirichlet Allocation)  Topic Modeling represents documents as mixtures of topics that spit out words with certain probabilities.  LDA is the most common topic model currently in use.  LDA is used in social platforms to come up with a model which can predict on which topics related document a user prefers to read and write.
  • 4. LDA …  LDA is a method which considers documents beyond mere just bag of words.  It describes a generative process whereby, given a Dirichlet conditioned bag filled with topic-distributions for each document, we draw a topic mixture from this bag. Then, we repeatedly draw both a topic and then a word from that topic to generate the words in that document.  we have a generative model that represents the process by which a corpora of documents were created in terms of Topics.
  • 5. Approaches for topic modeling  Approach1: Finding LDA model for each user in the network.  Approach2: Finding top K influential users and applying the LDA model on these users only.  Approach3: Finding communities present in the network and approximating users topics by applying LDA model of its corresponding community.
  • 6. Comparing different approaches.  Approach1:  Calculating LDA for each user is very costly since the data set is very large.  So this approach is not feasible for practical purposes.
  • 7. Comparing different approaches.  Approach 2:  Since LDA can’t be applied to each user, therefore we have selected top k users on which we have applied LDA.  The top k users were found out using Page Rank algorithm.  Then we extracted the tweets for these k users and applied LDA on them.  For this project we followed approach 2.
  • 8. Approach 2…  1. Step 1:  Input : A file containing the follower-followee relationship in the form of edge list. It contains ~6.25 crore nodes(users) and ~146 crore edges(follower-followee relationship)  Output : A file containing the userid and pageranks of all the nodes(users) in the graph.  Approach : Used GraphChi’s pagerank algorithm to perform the above mentioned task.    2. Step 2:  Input : A file containing the pageranks and userids of all the nodes(users) in the graph.  Output : A file containing the pageranks and userids of all the users sorted on the basis of pagerank.  Approach : External merge sort is used to perform this kind of sorting because the size of the file to be sorted was 30GB. 
  • 9. Approach 2…  3. Step 3:  Input : A file containing the pageranks and userids of all the nodes sorted by pagerank.  Output : A file containing the top 50 users(userids and pagerank)  4. Step 4:  Input : A file containing the top 50 users.  Output : The tweets of these top 50 users.  Approach : We used the tweet crawler to crawl the tweets of these users.  5. Step 5:  Input : 50 files containing tweets of top 50 users.  Output : LDA models of these 50 users.  Approach : Used Mallet to perform this task.
  • 11. Problems:  We encountered a case where all the influential elements could belong to a single community.  Then LDA on the top k influential elements would not result in a generic model, hence this technique is not robust and effective for this scenarios.  Hence we had to move on to the next approach of community detection before applying the LDA.
  • 12. Approach 3… and future works :  Since Approach 2 was not giving expected results, so we tried Community Detection approach.  The GraphChi Community Detection code is still in early phase of development and is not giving expected results.  There is another API named Infomap which tends to work well for small graphs but the code burst for large graphs.  We are still working on approach 3 and aiming for better accuracy and results.