SlideShare una empresa de Scribd logo
1 de 30
Descargar para leer sin conexión
What StackOverflow Tells Us
About Programming Languages
Rahul Thankachan Nada Aldarrab
Prathmesh Gat
University of Southern California
1
2
Agenda
Agenda
1. Introduction
2. Problem Statement
3. Temporal Based Trend Analysis
4. Topic Analysis
5. Predicting Time To Answer
6. Summary and Q&A
3
4
Introduction
Introduction
• Dataset Used : Stackoverflow - Internet Archive.
• Why? It is one of the largest developer focused
open collaborative platform currently.
• Through our study we intend to answers some
interesting questions
• Study the rise and fall of popular programming
languages
• Can be used to predict future enhancements
• Study effectiveness of Stack Overflow model
5
Problem Definition
• Basic Analysis: What are the most popular
programming languages?
• What are the trends in programming languages?
• What are the most popular topics discussed in a
programming language?
• Can we accurately predict the time it takes until a
questioner gets an answer?
6
Related Work
Miltiadis Allamanis and Charles Sutton. 2013. Why, When and What: Analyzing Stack Overflow Questions by
Topic, Type & Code In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 53-
56.
• Topic modeling analysis
• Used Latent Dirichlet Allocation (LDA)
• Modeled Java Topics of Questions
• Can evaluate the orthogonality of different languages
• Stack Overflow questions are about the code and are not application domain specific
7
Related Work
V. Bhat, A. Gokhale, R. Jadhav, J. Pudipeddi, and L. Akoglu. Min (e) d your tags: Analysis of question response time in
stackoverflow. In Proceedings of ASONAM 2014, pages 328–335. IEEE, 2014.
• Two linear classifiers: logistic regression and SVM with linear kernel
• Two non- linear classifiers: decision tree (DT) and SVM with radial basis function kernel
8
Related Work
Prediction Accuracy:
9
10
Basic Analysis &
Temporal Trends
StackOverFlow Activity
11
Approx. 45% of all programming languages in world are
discussed on StackOverflow!
Post Type
12
Top Ten Languages
13
Question Count
14
Answer Count
15
Answer Fraction
16
Question Fraction
17
Questioners/Answerers Distribution
18
19
Topic Analysis
Topic Analysis
20
21
Predicting time until
Answer
Approach
Following attributes selected for study:
1. Tag (Only top 10 programming languages)
2. Creation Month
3. Body Length
4. Tag Length
5. Introduced new Nominal class - Time_Answer
6. (less6, bet6and20, 20andmore)
22
Approach
Tools
Weka - Weka is a collection of machine learning
algorithms for data mining tasks.
Data Preprocessing:
Challenging!
23
Approach(Data Pre -processing)
1. Parse all the answers and link first answer’s creation
time to creation time of question. We called this
field delta-answer.
2. Remove all the Questions which had delta answer
negative or zero
3. We developed a Python script which develops .arff
file On the fly (Wish to contribute this file)
24
Evaluation
• Subset Size: 4490947 - Subset - 449000
• Classify response time into 3 types: less than 6
minutes, between 6 and 20 minutes, 20 minutes
and more.
• 10-fold cross-validation
• Results are obtained using different feature
combinations and different classifiers
25
Evaluation
Results of classifier J48 (all Attributes)
26
Evaluation
27
Results of classifier (body_length/ tag_length)
Summary
• We were successfully able to find interesting
temporal trends for major programming languages
• Using tag based topic analysis we were able to find
major discussion topics and to some extent the
difficult topics in a programming language
• Using machine learning techniques we were
successfully able to predict - time to answer with
good accuracy
28
Future Scope of Work
• Contribute the .arff on the fly generator script.
• Adding Parts of speech as an attribute
• Showcasing the results on a website
29
Questions?
30

Más contenido relacionado

La actualidad más candente

Introduction to ChatGPT & how its implemented in UiPath
Introduction to ChatGPT & how its implemented in UiPathIntroduction to ChatGPT & how its implemented in UiPath
Introduction to ChatGPT & how its implemented in UiPathsharonP24
 
Ontologies and semantic web
Ontologies and semantic webOntologies and semantic web
Ontologies and semantic webStanley Wang
 
Introduction to software engineering
Introduction to software engineeringIntroduction to software engineering
Introduction to software engineeringHitesh Mohapatra
 
An Introduction to Generative AI - May 18, 2023
An Introduction  to Generative AI - May 18, 2023An Introduction  to Generative AI - May 18, 2023
An Introduction to Generative AI - May 18, 2023CoriFaklaris1
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labelsAlbert Bifet
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - IntroductionChristian Perone
 
How ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundlyHow ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundlyPekka Abrahamsson / Tampere University
 
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?Bernard Marr
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
Lecture Note-1: Algorithm and Its Properties
Lecture Note-1: Algorithm and Its PropertiesLecture Note-1: Algorithm and Its Properties
Lecture Note-1: Algorithm and Its PropertiesRajesh K Shukla
 
Compiler Construction | Lecture 1 | What is a compiler?
Compiler Construction | Lecture 1 | What is a compiler?Compiler Construction | Lecture 1 | What is a compiler?
Compiler Construction | Lecture 1 | What is a compiler?Eelco Visser
 
Software development slides
Software development slidesSoftware development slides
Software development slidesiarthur
 
Paraphrase Detection in NLP
Paraphrase Detection in NLPParaphrase Detection in NLP
Paraphrase Detection in NLPYuriy Guts
 

La actualidad más candente (20)

Introduction to ChatGPT & how its implemented in UiPath
Introduction to ChatGPT & how its implemented in UiPathIntroduction to ChatGPT & how its implemented in UiPath
Introduction to ChatGPT & how its implemented in UiPath
 
Ontologies and semantic web
Ontologies and semantic webOntologies and semantic web
Ontologies and semantic web
 
Introduction to software engineering
Introduction to software engineeringIntroduction to software engineering
Introduction to software engineering
 
Processing Arabic Text
Processing Arabic TextProcessing Arabic Text
Processing Arabic Text
 
embedded-static-&dynamic
embedded-static-&dynamicembedded-static-&dynamic
embedded-static-&dynamic
 
An Introduction to Generative AI - May 18, 2023
An Introduction  to Generative AI - May 18, 2023An Introduction  to Generative AI - May 18, 2023
An Introduction to Generative AI - May 18, 2023
 
Software Myths
Software MythsSoftware Myths
Software Myths
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labels
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - Introduction
 
How ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundlyHow ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundly
 
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Lecture Note-1: Algorithm and Its Properties
Lecture Note-1: Algorithm and Its PropertiesLecture Note-1: Algorithm and Its Properties
Lecture Note-1: Algorithm and Its Properties
 
Compiler Construction | Lecture 1 | What is a compiler?
Compiler Construction | Lecture 1 | What is a compiler?Compiler Construction | Lecture 1 | What is a compiler?
Compiler Construction | Lecture 1 | What is a compiler?
 
Software development slides
Software development slidesSoftware development slides
Software development slides
 
ODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AIODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AI
 
Text Classification
Text ClassificationText Classification
Text Classification
 
Paraphrase Detection in NLP
Paraphrase Detection in NLPParaphrase Detection in NLP
Paraphrase Detection in NLP
 
software engineering
software engineeringsoftware engineering
software engineering
 
Text MIning
Text MIningText MIning
Text MIning
 

Destacado

Stackoverflow Data Analysis-Homework3
Stackoverflow Data Analysis-Homework3Stackoverflow Data Analysis-Homework3
Stackoverflow Data Analysis-Homework3Ayush Tak
 
StackOverflow Architectural Overview
StackOverflow Architectural OverviewStackOverflow Architectural Overview
StackOverflow Architectural OverviewFolio3 Software
 
Analyzing Stack Overflow - Problem
Analyzing Stack Overflow - ProblemAnalyzing Stack Overflow - Problem
Analyzing Stack Overflow - ProblemAmrith Krishna
 
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)Ontico
 
Software Engineering and Social media
Software Engineering and Social mediaSoftware Engineering and Social media
Software Engineering and Social mediaJorge Melegati
 
Towards the Social Programmer (MSR 2012 Keynote by M. Storey)
Towards the Social Programmer (MSR 2012 Keynote by M. Storey)Towards the Social Programmer (MSR 2012 Keynote by M. Storey)
Towards the Social Programmer (MSR 2012 Keynote by M. Storey)Margaret-Anne Storey
 
Implementación Repositorio De Objetos De Aprendizajes Basado En
Implementación Repositorio De Objetos De Aprendizajes Basado EnImplementación Repositorio De Objetos De Aprendizajes Basado En
Implementación Repositorio De Objetos De Aprendizajes Basado Enf.cabrera1
 
Soluciones tecnológicas para REA
Soluciones tecnológicas para REASoluciones tecnológicas para REA
Soluciones tecnológicas para REARicardo Corai
 
What is Node.js used for: The 2015 Node.js Overview Report
What is Node.js used for: The 2015 Node.js Overview ReportWhat is Node.js used for: The 2015 Node.js Overview Report
What is Node.js used for: The 2015 Node.js Overview ReportGabor Nagy
 
Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...
Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...
Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...Paola Amadeo
 
Responsive Design
Responsive DesignResponsive Design
Responsive DesignMRMtech
 
Modern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & StructureModern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & StructureRaven Tools
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop OverviewShubhra Kar
 
What can Bioinformaticians learn from YouTube?
What can Bioinformaticians learn from YouTube?What can Bioinformaticians learn from YouTube?
What can Bioinformaticians learn from YouTube?Matt Wood
 
Collaborazione nelle comunità open source: tecniche e strumenti
Collaborazione nelle comunità open source: tecniche e strumentiCollaborazione nelle comunità open source: tecniche e strumenti
Collaborazione nelle comunità open source: tecniche e strumentiFilippo Lanubile
 

Destacado (20)

Understanding Stack Overflow
Understanding Stack OverflowUnderstanding Stack Overflow
Understanding Stack Overflow
 
Stackoverflow Data Analysis-Homework3
Stackoverflow Data Analysis-Homework3Stackoverflow Data Analysis-Homework3
Stackoverflow Data Analysis-Homework3
 
StackOverflow Architectural Overview
StackOverflow Architectural OverviewStackOverflow Architectural Overview
StackOverflow Architectural Overview
 
Analyzing Stack Overflow - Problem
Analyzing Stack Overflow - ProblemAnalyzing Stack Overflow - Problem
Analyzing Stack Overflow - Problem
 
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
Stack Overflow - It's all about performance / Marco Cecconi (Stack Overflow)
 
Software Engineering and Social media
Software Engineering and Social mediaSoftware Engineering and Social media
Software Engineering and Social media
 
Towards the Social Programmer (MSR 2012 Keynote by M. Storey)
Towards the Social Programmer (MSR 2012 Keynote by M. Storey)Towards the Social Programmer (MSR 2012 Keynote by M. Storey)
Towards the Social Programmer (MSR 2012 Keynote by M. Storey)
 
Repositorio Institucional para el manejo de Investigaciones de la UNAN-Manag...
 Repositorio Institucional para el manejo de Investigaciones de la UNAN-Manag... Repositorio Institucional para el manejo de Investigaciones de la UNAN-Manag...
Repositorio Institucional para el manejo de Investigaciones de la UNAN-Manag...
 
groovy & grails - lecture 13
groovy & grails - lecture 13groovy & grails - lecture 13
groovy & grails - lecture 13
 
Implementación Repositorio De Objetos De Aprendizajes Basado En
Implementación Repositorio De Objetos De Aprendizajes Basado EnImplementación Repositorio De Objetos De Aprendizajes Basado En
Implementación Repositorio De Objetos De Aprendizajes Basado En
 
Soluciones tecnológicas para REA
Soluciones tecnológicas para REASoluciones tecnológicas para REA
Soluciones tecnológicas para REA
 
What is Node.js used for: The 2015 Node.js Overview Report
What is Node.js used for: The 2015 Node.js Overview ReportWhat is Node.js used for: The 2015 Node.js Overview Report
What is Node.js used for: The 2015 Node.js Overview Report
 
Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...
Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...
Presentacion MoodleMoot 2014 Colombia - Integración Moodle con un Repositorio...
 
Responsive Design
Responsive DesignResponsive Design
Responsive Design
 
Stack_Overflow-Network_Graph
Stack_Overflow-Network_GraphStack_Overflow-Network_Graph
Stack_Overflow-Network_Graph
 
Modern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & StructureModern HTML & CSS Coding: Speed, Semantics & Structure
Modern HTML & CSS Coding: Speed, Semantics & Structure
 
StrongLoop Overview
StrongLoop OverviewStrongLoop Overview
StrongLoop Overview
 
What can Bioinformaticians learn from YouTube?
What can Bioinformaticians learn from YouTube?What can Bioinformaticians learn from YouTube?
What can Bioinformaticians learn from YouTube?
 
Kaggle's WISE 2014 challenge
Kaggle's WISE 2014 challenge Kaggle's WISE 2014 challenge
Kaggle's WISE 2014 challenge
 
Collaborazione nelle comunità open source: tecniche e strumenti
Collaborazione nelle comunità open source: tecniche e strumentiCollaborazione nelle comunità open source: tecniche e strumenti
Collaborazione nelle comunità open source: tecniche e strumenti
 

Similar a Stack Overflow slides Data Analytics

1. course introduction
1. course introduction1. course introduction
1. course introductionSaeed Parsa
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesTao Xie
 
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...Preetha Chatterjee
 
Automatic Identification of Informative Code in Stack Overflow Posts
Automatic Identification of Informative Code in Stack Overflow PostsAutomatic Identification of Informative Code in Stack Overflow Posts
Automatic Identification of Informative Code in Stack Overflow PostsPreetha Chatterjee
 
Introduction.pptx
Introduction.pptxIntroduction.pptx
Introduction.pptxSamar954063
 
Information-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic DataInformation-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic DataSteffen Staab
 
Asms app planning
Asms app planningAsms app planning
Asms app planningmajapamaya
 
Devry CIS 355A Full Course Latest
Devry CIS 355A Full Course LatestDevry CIS 355A Full Course Latest
Devry CIS 355A Full Course LatestAtifkhilji
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaopenseesdays
 
Are High Level Programming Languages for Multicore and Safety Critical Conver...
Are High Level Programming Languages for Multicore and Safety Critical Conver...Are High Level Programming Languages for Multicore and Safety Critical Conver...
Are High Level Programming Languages for Multicore and Safety Critical Conver...InfinIT - Innovationsnetværket for it
 
Introduction to course
Introduction to courseIntroduction to course
Introduction to coursenikit meshram
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...alessio_ferrari
 
Seamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuitySeamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuitySteffen Staab
 
Texas.gov Presents: Battle of Programming Languages
Texas.gov Presents:  Battle of Programming LanguagesTexas.gov Presents:  Battle of Programming Languages
Texas.gov Presents: Battle of Programming LanguagesTexas.gov
 
Visualising the world of competitive programming with Python (Codeforces)
Visualising the world of competitive programming with Python (Codeforces)Visualising the world of competitive programming with Python (Codeforces)
Visualising the world of competitive programming with Python (Codeforces)Anuj Menta
 

Similar a Stack Overflow slides Data Analytics (20)

1. course introduction
1. course introduction1. course introduction
1. course introduction
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
 
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineer...
 
Automatic Identification of Informative Code in Stack Overflow Posts
Automatic Identification of Informative Code in Stack Overflow PostsAutomatic Identification of Informative Code in Stack Overflow Posts
Automatic Identification of Informative Code in Stack Overflow Posts
 
01.intro
01.intro01.intro
01.intro
 
Programming for Problem Solving
Programming for Problem SolvingProgramming for Problem Solving
Programming for Problem Solving
 
Introduction.pptx
Introduction.pptxIntroduction.pptx
Introduction.pptx
 
Information-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic DataInformation-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic Data
 
Asms app planning
Asms app planningAsms app planning
Asms app planning
 
Devry CIS 355A Full Course Latest
Devry CIS 355A Full Course LatestDevry CIS 355A Full Course Latest
Devry CIS 355A Full Course Latest
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKenna
 
Are High Level Programming Languages for Multicore and Safety Critical Conver...
Are High Level Programming Languages for Multicore and Safety Critical Conver...Are High Level Programming Languages for Multicore and Safety Critical Conver...
Are High Level Programming Languages for Multicore and Safety Critical Conver...
 
Lec 01 introduction
Lec 01   introductionLec 01   introduction
Lec 01 introduction
 
Introduction to course
Introduction to courseIntroduction to course
Introduction to course
 
6th sem
6th sem6th sem
6th sem
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 
Seamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuitySeamless semantics - avoiding semantic discontinuity
Seamless semantics - avoiding semantic discontinuity
 
A New Hiring Paradigm
A New Hiring ParadigmA New Hiring Paradigm
A New Hiring Paradigm
 
Texas.gov Presents: Battle of Programming Languages
Texas.gov Presents:  Battle of Programming LanguagesTexas.gov Presents:  Battle of Programming Languages
Texas.gov Presents: Battle of Programming Languages
 
Visualising the world of competitive programming with Python (Codeforces)
Visualising the world of competitive programming with Python (Codeforces)Visualising the world of competitive programming with Python (Codeforces)
Visualising the world of competitive programming with Python (Codeforces)
 

Último

Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxStephen Sitton
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTSneha Padhiar
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.elesangwon
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSneha Padhiar
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Coursebim.edu.pl
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...IJAEMSJORNAL
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier Fernández Muñoz
 
Forming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptForming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptNoman khan
 
TEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACHTEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACHSneha Padhiar
 
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical trainingGladiatorsKasper
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfShreyas Pandit
 
The Satellite applications in telecommunication
The Satellite applications in telecommunicationThe Satellite applications in telecommunication
The Satellite applications in telecommunicationnovrain7111
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProRay Yuan Liu
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESkarthi keyan
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdfsahilsajad201
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmDeepika Walanjkar
 

Último (20)

Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptx
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptx
 
Forming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).pptForming section troubleshooting checklist for improving wire life (1).ppt
Forming section troubleshooting checklist for improving wire life (1).ppt
 
TEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACHTEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACH
 
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdf
 
The Satellite applications in telecommunication
The Satellite applications in telecommunicationThe Satellite applications in telecommunication
The Satellite applications in telecommunication
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision Pro
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdf
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
 

Stack Overflow slides Data Analytics

  • 1. What StackOverflow Tells Us About Programming Languages Rahul Thankachan Nada Aldarrab Prathmesh Gat University of Southern California 1
  • 3. Agenda 1. Introduction 2. Problem Statement 3. Temporal Based Trend Analysis 4. Topic Analysis 5. Predicting Time To Answer 6. Summary and Q&A 3
  • 5. Introduction • Dataset Used : Stackoverflow - Internet Archive. • Why? It is one of the largest developer focused open collaborative platform currently. • Through our study we intend to answers some interesting questions • Study the rise and fall of popular programming languages • Can be used to predict future enhancements • Study effectiveness of Stack Overflow model 5
  • 6. Problem Definition • Basic Analysis: What are the most popular programming languages? • What are the trends in programming languages? • What are the most popular topics discussed in a programming language? • Can we accurately predict the time it takes until a questioner gets an answer? 6
  • 7. Related Work Miltiadis Allamanis and Charles Sutton. 2013. Why, When and What: Analyzing Stack Overflow Questions by Topic, Type & Code In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 53- 56. • Topic modeling analysis • Used Latent Dirichlet Allocation (LDA) • Modeled Java Topics of Questions • Can evaluate the orthogonality of different languages • Stack Overflow questions are about the code and are not application domain specific 7
  • 8. Related Work V. Bhat, A. Gokhale, R. Jadhav, J. Pudipeddi, and L. Akoglu. Min (e) d your tags: Analysis of question response time in stackoverflow. In Proceedings of ASONAM 2014, pages 328–335. IEEE, 2014. • Two linear classifiers: logistic regression and SVM with linear kernel • Two non- linear classifiers: decision tree (DT) and SVM with radial basis function kernel 8
  • 11. StackOverFlow Activity 11 Approx. 45% of all programming languages in world are discussed on StackOverflow!
  • 22. Approach Following attributes selected for study: 1. Tag (Only top 10 programming languages) 2. Creation Month 3. Body Length 4. Tag Length 5. Introduced new Nominal class - Time_Answer 6. (less6, bet6and20, 20andmore) 22
  • 23. Approach Tools Weka - Weka is a collection of machine learning algorithms for data mining tasks. Data Preprocessing: Challenging! 23
  • 24. Approach(Data Pre -processing) 1. Parse all the answers and link first answer’s creation time to creation time of question. We called this field delta-answer. 2. Remove all the Questions which had delta answer negative or zero 3. We developed a Python script which develops .arff file On the fly (Wish to contribute this file) 24
  • 25. Evaluation • Subset Size: 4490947 - Subset - 449000 • Classify response time into 3 types: less than 6 minutes, between 6 and 20 minutes, 20 minutes and more. • 10-fold cross-validation • Results are obtained using different feature combinations and different classifiers 25
  • 26. Evaluation Results of classifier J48 (all Attributes) 26
  • 27. Evaluation 27 Results of classifier (body_length/ tag_length)
  • 28. Summary • We were successfully able to find interesting temporal trends for major programming languages • Using tag based topic analysis we were able to find major discussion topics and to some extent the difficult topics in a programming language • Using machine learning techniques we were successfully able to predict - time to answer with good accuracy 28
  • 29. Future Scope of Work • Contribute the .arff on the fly generator script. • Adding Parts of speech as an attribute • Showcasing the results on a website 29