Crowdsourcing Documentation in Software Engineering
1. Crowdsourcing Documentation
in Software Engineering
Margaret-Anne (Peggy) Storey
ICSE 2014 1st International Workshop on
Crowdsourcing in Software Engineering
2. Acknowledgements
Christoph Treude
Brendan Cleary
Fernando Figueira Filho
Jamie Starke
Gargi Bougie
Peter Rigby
Lars Grammel
Leif Singer
Laura MacLeod
Daniel German
Alexey Zagalsky
Chris Parnin, Georgia Tech
Ohad Barzilay, Tel-Aviv University, Israel
Arie van Deursen, TU Delft, the Netherlands
Li-Te Cheng, IBM Research
Ian Bull, Eclipsesource
3. “Documentation is the castor oil of
software development”
Gerald Weinberg, Psychology of Computer
Programming 1975
6. Documentation rationale…
To replace communication
To specify a contract with partners
To provide organizational memory
To reflect
To seek feedback
For the public good! [Wasko et al.]
9. Crowdsourcing…
“…obtaining needed services, ideas, or content by soliciting
contributions from a large group of people, and especially from
an online community, rather than from traditional employees or
suppliers… the work comes from an undefined public rather
than being commissioned from a specific, named group…
Explicit crowdsourcing lets users work together to evaluate, share
and build different specific tasks, while implicit crowdsourcing
means that users solve a problem as a side effect of something
else they are doing.” [Wikipedia, June 1, 2014]
10. Community versus crowd
contributions?
Individual or team contributions
(e.g. design documents, podcasts)
Community contributions: created by a few
(e.g. translation efforts)
Crowdsourcing contributions: many small
contributions that add value
(e.g. views, likes, comments, tags, votes)
11. Social production [Yochai Benkler]
Industrial revolution, high costs to access broadcast media
Low cost distributed small contributions at scale
Not just turning levers but adding wisdom, creativity
Not a fad!
Critical long term shift caused by the internet
12. Social media as a disruptive force:
an enabler for crowdsourcing
Enhancing the participatory culture in
software development and in software
documentation
Storey, M.-A., L. Singer, F. Figueira Filho, B. Cleary and A. Zagalsky,
The (R)evolutionary Role of Social Media in Software Engineering,
ICSE 2014 (Future of Software Engineering Track), Hyderabad, 2014.
13. Social Media Channels for
Software Documentation
Community
Portal
Tagging
Microblogging
Question &
Answer Websites
Videos,
podcasts
Blogging
Wikis
14. Outline of the rest of this talk
Some insights on how social media channels
can support “crowdsourced”
documentation in software development
Discussion
17. Wikis and software documentation
Used extensively (requirements, design,
planning), integrated with many tools
Some shortcomings:
lack of authoritativeness
[Dagenais and Robillard FSE 2010]
Designed by Ward Cunningham in 1994
21. TagSEA: Tagging Waypoints
in source code and gathering into Tours
M.-A. Storey, J. Ryall, J. Singer, D. Myers, L.-T. Cheng, M. Muller.
How Software Developers Use Tagging to Support Reminding and Refinding. IEEE
Transactions on Software Engineering (TSE), 2009.
22. Tagging in
Studied introduction and adoption of tags by
several teams for work items
C. Treude and M.-A. Storey. Work Item Tagging: Communicating Concerns in
Collaborative Software Development. In IEEE Transactions on Software Engineering 38, 1
(January/February 2012). pp. 19-34.
27. Microblogging
Software engineers actively tweet (share) facts about
software engineering topics and technology
G. Bougie, J. Starke, M.-A. Storey and D. German. Towards Understanding Twitter Use in Software
Engineering: Preliminary Findings, Ongoing Challenges and Future Questions. In Proceedings of the
2nd International Workshop on Web 2.0 for Software Engineering. 2011.
28. Survey/Interviews/Survey
Findings:
– Awareness
– Learning
– Relationships
“It was evolving way faster than I was
able to keep up with it. And the only
way to keep up was to follow some
Node.js people on Twitter.”
Leif Singer, Fernando Figueira Filho, Margaret-Anne Storey.
Software Engineering at the Speed of Light: How Developers Stay Current Using Twitter. ICSE 2014.
31. Blogging
Determining requirements through blogs
[Park and Maurer, CHASE 2009]
How developers blog: high-level concept
discussion and requirements
[Pagano and Maalej, MSR 2011]
Blogs play a role in documenting APIs
[Treude and Parnin, Web2SE 2011]
Is there potential to increase the size of the
blogging crowd for software documentation?
35. Over 92% of the questions on
Stack Overflow are answered, and for those
92% the median answer time is 11 minutes
L. Mamykina, B. Manoim, M. Mittal, G. Hripcsak, and B. Hartmann.
Design lessons from the fastest q&a site in the west. CHI 2011.
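To make the two statistics on this slide concrete, here is a minimal sketch of how an answered fraction and a median time to first answer could be computed. The record layout (`asked`, `first_answer`) and the sample values are hypothetical and are not taken from the cited study.

```python
from datetime import datetime
from statistics import median

def answer_stats(questions):
    """Return (fraction of questions answered, median minutes to first answer).

    Each question is a dict with an 'asked' datetime and a 'first_answer'
    datetime, or None when nobody has answered yet (hypothetical layout).
    """
    delays_min = [
        (q["first_answer"] - q["asked"]).total_seconds() / 60
        for q in questions
        if q["first_answer"] is not None
    ]
    fraction = len(delays_min) / len(questions)
    # The median is taken over answered questions only, as on the slide.
    return fraction, (median(delays_min) if delays_min else None)

# Made-up sample: four questions, three of them answered.
t0 = datetime(2014, 6, 1, 12, 0)
sample_questions = [
    {"asked": t0, "first_answer": datetime(2014, 6, 1, 12, 5)},
    {"asked": t0, "first_answer": datetime(2014, 6, 1, 12, 11)},
    {"asked": t0, "first_answer": datetime(2014, 6, 1, 12, 30)},
    {"asked": t0, "first_answer": None},
]
fraction_answered, median_minutes = answer_stats(sample_questions)
```

Using the median rather than the mean keeps a handful of questions that wait days for an answer from dominating the figure.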
36. Stack Overflow
How-to questions prevalent, and used frequently
by novices
C. Treude, O. Barzilay and M.-A. Storey. How do Programmers Ask and Answer
Questions on the Web? NIER/ICSE 2011.
37. Linking Stack Overflow data with
API usage
C. Parnin, C. Treude, L. Grammel and M.-A. Storey.
Crowd Documentation: Exploring the Coverage and the Dynamics of API Discussions on Stack
Overflow. Under submission, blogged (50,000 hits) at
http://blog.ninlabs.com/2012/05/crowd-documentation/ May 2012.
38. Stack Overflow as Crowd Documentation
Coverage of API documentation: 77% of the
Java API classes & 87% of Android API classes
Speed of coverage:
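As a rough illustration of the coverage measure on this slide (not of the original study's linking method, which is not described here), coverage can be read as the share of API classes that are mentioned in at least one discussion thread. The whole-word matching rule, the function name, and the sample data below are assumptions made for the sketch.

```python
import re

def api_coverage(api_classes, threads):
    """Return (coverage fraction, set of covered class names).

    A class counts as covered when its simple name appears as a whole
    identifier in at least one thread text. This is a simplifying
    assumption: real linking would also need to handle ambiguous names.
    """
    wanted = set(api_classes)
    covered = set()
    for text in threads:
        identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))
        covered |= identifiers & wanted
    return len(covered) / len(wanted), covered

# Made-up sample data for illustration only.
classes = ["ArrayList", "HashMap", "StringBuilder", "BitSet"]
threads = [
    "How do I sort an ArrayList of strings?",
    "HashMap vs Hashtable: which should I use?",
    "Appending in a loop with StringBuilder",
]
coverage, covered_classes = api_coverage(classes, threads)
```

Recording the date of the first thread that mentions each class would extend the same idea to a speed-of-coverage measure.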
44. Developer motivations?
Documentation! But also …
Reputation: Improves their online persona
Dedication to helping others
“What I wish I had known when I started”
Efficiency
“Throw it up on the internet and forget about it”
http://lmacleod.com/
45. Implications
Many projects use videos to support documentation
and onboarding (e.g. MSDN) so…
How can they be improved for the recipient?
How effective are videos at sharing tacit knowledge?
Tool enhancements? Integration with IDE?
[e.g. Tours]
Cheng, L.-T., M. Desmond and M.-A. Storey, “Presentations by Programmers for
Programmers”, ICSE 2007, IEEE 29th International Conference on Software Engineering.
46. Is this crowdsourcing?
Are code walkthroughs on YouTube effective?
How much do the social features matter?
A social platform for crowd input for video
documentation?
49. Community portals
Stores code and project resources
Provides version control
Hosts web pages
Connects people
Links to communication tools
Records interactions
50. C. Treude and M.-A. Storey. Effective Communication of Software Development
Knowledge Through Community Portals. ESEC/FSE ’11.
52. Implications of different media
Content on wikis is often stale, but useful for
posting information quickly
Blog posts create more buzz or fanfare
Official product documentation is trusted
(review it carefully or rely on the crowd?)
Have an updating process (or crowdsource it?)
Have mechanisms to solicit feedback
(e.g. commenting, blog posts, voting)
53. Social Media Channels to
support Software Documentation
Community
Portal
Tagging
Microblogging
Question &
Answer Websites
Videos,
podcasts
Blogging
Wikis
55. Documentation challenges revisited
Recommenders to aid in discoverability
Keeping up: leverage the crowd
Incentive: participatory culture
Video and podcasts for tacit knowledge
Mining of social media can point to code
examples (implicit mechanism)
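The last point can be sketched as follows: a minimal example, assuming posts are available as HTML with code examples wrapped in `<pre>` elements (the class and function names and the sample post are hypothetical). It collects the text of each `<pre>` block and ignores inline `<code>` fragments.

```python
from html.parser import HTMLParser

class CodeBlockExtractor(HTMLParser):
    """Collect the text content of every <pre> element in a post."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # nesting depth inside <pre>
        self._buffer = []    # text pieces of the block being read
        self.blocks = []     # finished code examples

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)

def extract_code_examples(post_html):
    """Return the list of code examples found in one post body."""
    parser = CodeBlockExtractor()
    parser.feed(post_html)
    parser.close()
    return parser.blocks

# Made-up post body for illustration only.
post = (
    "<p>Use a builder:</p>"
    "<pre><code>StringBuilder sb = new StringBuilder();\n"
    'sb.append("x");</code></pre>'
    "<p>Inline <code>sb</code> is ignored.</p>"
)
examples = extract_code_examples(post)
```

The mechanism is implicit in the sense used earlier in the talk: the authors of the posts were answering questions, and the code examples fall out as a side effect.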
56. Discussion points
When does a community become a crowd?
Gaps and nichification?
Incentives? Dynamics?
Study other portals, hubs?
Do these mechanisms translate to industry?
What do you see as challenges, opportunities for
involving the crowd?