Harnessing Twitter to Support Serendipitous Learning of Developers
Abhishek Sharma¹, Yuan Tian¹, Agus Sulistya¹, David Lo¹ and Aiko Fallas Yamashita²
¹School of Information Systems, Singapore Management University
²Oslo and Akershus University, Norway
24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2017)
Why Twitter for Learning
• Keeping up to date is a big challenge for developers (Storey et al. TSE’16)
• Twitter is used by software developers to share important information (Tian et al. MSR’12)
• Twitter enables serendipitous (pleasant and undirected) learning for developers (Singer et al. ICSE’14)
Challenges
• Finding useful articles is not easy
• Developers need to (Singer et al. ICSE’14):
– identify many relevant Twitter users to follow
– sift through a large number of tweets/URLs
• Too much information can make learning via Twitter an unpleasant experience
This Study
• Can we automatically extract popular and relevant URLs from Twitter for developers?
• In this work, we:
– propose 14 features to characterize a URL
– evaluate a supervised and an unsupervised approach to recommend URLs harvested from Twitter
Methodology (1): Collecting Seed Data
• Get a list of seed Twitter users (source: http://www.noop.nl/2009/02/twitter-top-100-for-softwaredevelopers.htm)
• Expand to a larger set of people who follow (or are followed by) >= 5 seed users
– this results in 85,171 Twitter users
• Collect tweets generated by these users over a 1-month period (Nov ’15)
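The expansion step above can be sketched as follows. This is only an illustration under stated assumptions: the follow graph is taken to be available as (follower, followee) pairs, follows and followed-by links are counted together toward the threshold, and seed users themselves are left out of the result. The function name `expand_seed_users` and the input format are hypothetical, not taken from the study.

```python
from collections import defaultdict

def expand_seed_users(follow_edges, seeds, min_links=5):
    """Return non-seed users connected to at least `min_links` seed users.

    follow_edges: iterable of (follower, followee) pairs (assumed input format).
    A connection counts in either direction (follows or is followed by).
    """
    seeds = set(seeds)
    linked_seeds = defaultdict(set)  # user -> set of seed users linked to them
    for follower, followee in follow_edges:
        if followee in seeds and follower not in seeds:
            linked_seeds[follower].add(followee)
        if follower in seeds and followee not in seeds:
            linked_seeds[followee].add(follower)
    return {user for user, linked in linked_seeds.items()
            if len(linked) >= min_links}
```

Using a set per user means a seed that both follows and is followed by the same user is counted once, which is one plausible reading of the ">= 5 seed users" criterion.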
Methodology (2): URL Extraction
• Find tweets which contain the keyword “java” (2,104 tweets)
• Of these, keep tweets which contain a URL (1,606 tweets)
• Extract URLs, e.g. http://ow.ly/UIxwS, http://bit.ly/1OFsZSj, http://goo.gl/IGxGlo, https://t.co/ryPI3
• Expand short URLs (770 expanded URLs)
• Resolve duplicate/broken URLs (577 URLs remain)
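The extraction steps above can be sketched roughly as below. This is a minimal sketch, not the study's pipeline: the regular expression, the function name `extract_urls`, and the `expand` hook are all assumptions. In particular, `expand` stands in for a real short-URL resolver (which would follow HTTP redirects) and is expected to return None for broken links.

```python
import re

# Simple pattern for http/https links; real tweets may need stricter parsing.
URL_PATTERN = re.compile(r"https?://\S+")

def extract_urls(tweets, keyword="java", expand=lambda url: url):
    """Filter tweets by keyword, extract URLs, expand them, drop duplicates.

    expand: hypothetical callable mapping a short URL to its final URL,
            or to None when the link is broken.
    Note: substring matching means "java" also matches "javascript".
    """
    seen, result = set(), []
    for text in tweets:
        if keyword.lower() not in text.lower():
            continue  # tweet does not mention the keyword
        for short_url in URL_PATTERN.findall(text):
            # Strip trailing punctuation that is not part of the link.
            full_url = expand(short_url.rstrip(".,;)"))
            if full_url is None or full_url in seen:
                continue  # broken link or duplicate after expansion
            seen.add(full_url)
            result.append(full_url)
    return result
```

Passing the resolver in as a parameter keeps the sketch runnable offline; a dictionary's `get` method is enough to exercise it with made-up data.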
Methodology (3): Feature Extraction
• 14 features extracted
– Content
– Popularity
– Network
Methodology (3): Feature Extraction
• Content
– cosine similarity between the keyword and:
• tweet text (CosSimT)
• user profile text (CosSimP)
• webpage text (CosSimW)
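The content features above all rest on one operation, cosine similarity between the keyword and a piece of text. A minimal sketch follows, assuming raw term counts as the vector weights; the slides do not say whether the study used raw counts, TF-IDF, or another weighting, and the tokenizer here is a simplification.

```python
import math
import re
from collections import Counter

def cosine_similarity(keyword, text):
    """Cosine similarity between bag-of-words vectors of keyword and text.

    Assumption: raw term counts are used as weights (not TF-IDF).
    """
    a = Counter(re.findall(r"\w+", keyword.lower()))
    b = Counter(re.findall(r"\w+", text.lower()))
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_a * norm_b)
```

The same function would serve for CosSimT, CosSimP, and CosSimW by passing the tweet text, the user profile text, or the webpage text as `text`.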
Methodology (3): Feature Extraction
• Network
– estimate the importance of users through:
• centrality scores
• PageRank
• Popularity
– number of times the tweets containing the URL were:
• retweeted
• liked
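As an illustration of the network features, PageRank over a follow graph can be sketched with plain power iteration. This is a generic textbook version, not the study's implementation: the edge direction (follower to followee), the damping factor of 0.85, and the fixed iteration count are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges.

    Assumption: an edge points from follower to followee, so users with
    many (important) followers receive higher scores.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank
```

The popularity features need no graph computation: they are sums of retweet and like counts over the tweets that contain a given URL.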
Methodology (4): Labelling the URLs
• Labelled independently by
– 2 persons, each having more than 4 years of professional programming experience in Java
– one a PhD student and the other a Research Engineer
• Both persons sat together to resolve disagreements
• URLs were assigned relevance scores from 0 to 3
Methodology (5): Recommendation
• Unsupervised – Borda Count
– assigns ranking points for each feature score of a URL and then combines the points
• Supervised – Learning to Rank
– learns a ranking function based on the weighted sum of the features of a URL
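The unsupervised Borda count described above can be sketched as follows. This is a simplified version under assumptions: every feature is treated as "higher is better", points per feature run from 0 (worst) to n-1 (best), and ties are broken by input order rather than sharing points. The function name `borda_rank` and the input layout are hypothetical.

```python
def borda_rank(feature_scores):
    """Rank URLs by Borda count over their per-feature scores.

    feature_scores: dict mapping url -> list of feature values, where a
    higher value is assumed to be better for every feature.
    Returns the URLs ordered from most to least recommended.
    """
    urls = list(feature_scores)
    n_features = len(next(iter(feature_scores.values()), []))
    points = {url: 0 for url in urls}
    for f in range(n_features):
        # Sort worst-first so that list position equals the points awarded.
        ordered = sorted(urls, key=lambda url: feature_scores[url][f])
        for position, url in enumerate(ordered):
            points[url] += position
    return sorted(urls, key=lambda url: points[url], reverse=True)
```

Because only rank positions are combined, features on very different scales (a cosine similarity in [0, 1] versus a retweet count) contribute equally, which is the usual reason for choosing Borda count when no training labels are used.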
RQ1: Effectiveness of Our Approach
• NDCG (Normalized Discounted Cumulative Gain)
– measures the capability to recommend more relevant URLs at the top ranks
– scores range from 0 to 1; a score closer to 1 indicates better performance
[Figure: bar chart of NDCG score (y-axis, 0 to 1) by recommendation approach (x-axis). Supervised: 0.832; Unsupervised: 0.719.]
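NDCG as described above can be computed along these lines. The sketch uses the common exponential gain (2^rel - 1) with a log2 position discount; the slides do not state which NDCG variant the study used, so treat the exact formula as an assumption.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevance labels in ranked order."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0
```

Here `relevances` would hold the 0 to 3 labels of the recommended URLs in the order the recommender ranked them; a ranking that places the most relevant URLs first scores 1.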
RQ2: Sensitivity of Supervised Approach to Training Data
[Figure: horizontal bar chart of NDCG score (x-axis, 0 to 1) against k, the number of folds used (y-axis, 10 down to 2). Bar labels in extracted order: 0.832, 0.825, 0.833, 0.845, 0.834, 0.842, 0.837, 0.847, 0.843.]
Threats to Validity
• Subjectivity in the labelling process
– asked 2 persons to label independently
• Only 1 domain
– evaluate more domains in future work
• Suitability of evaluation metric
– used NDCG, which is a standard metric
Conclusion and Future Work
• Supervised and unsupervised approaches show promise in recommending URLs
• Future work:
– Automatically categorize the recommended URLs
– Build an automated system to recommend relevant URLs
Feedback/Advice
• What additional resources can we consider for mining URLs?
• How can developer interests be inferred automatically?
Thank you!
